Unfortunately not:

Remount the FS, then access the test file from the second client:

[root@srv02 ~]# umount /mnt
[root@srv02 ~]# mount -t glusterfs srv01:/test01 /mnt
[root@srv02 ~]# ls -l /mnt/passwd 
-rw-r--r--. 1 root root 1505 Aug 16 19:59 /mnt/passwd
[root@srv02 ~]# ls -l /R1/test01/
total 4
-rw-r--r--. 2 root root 1505 Aug 16 19:59 passwd
[root@srv02 ~]# 
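
For reference, the command I would use to check whether AFR queued the entry for
heal (a sketch, not captured in the session above):

gluster volume heal test01 info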

Then remount the FS on the first node and check whether accessing the file from 
the second node triggered self-heal on the first node:

[root@srv01 ~]# umount /mnt
[root@srv01 ~]# mount -t glusterfs srv01:/test01 /mnt
[root@srv01 ~]# ls -l /mnt
total 0
[root@srv01 ~]# ls -l /R1/test01/
total 0
[root@srv01 ~]#

Nothing appeared, neither on the mount point nor on the first node's brick.
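
In case it helps with debugging, this is how I plan to check whether AFR recorded 
pending-heal markers on the surviving copy (a sketch; the exact trusted.afr.* names 
are my assumption based on the test01-client-0 naming in glustershd.log):

# on srv02, against the brick path, not the mount point
getfattr -d -m . -e hex /R1/test01/passwd
# non-zero trusted.afr.test01-client-* values would mean a heal is pending for the other brick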

[root@srv01 ~]# gluster volume info test01
 
Volume Name: test01
Type: Replicate
Volume ID: 2c227085-0b06-4804-805c-ea9c1bb11d8b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: srv01:/R1/test01
Brick2: srv02:/R1/test01
Options Reconfigured:
features.scrub-freq: hourly
features.scrub: Active
features.bitrot: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on
[root@srv01 ~]# 

[root@srv01 ~]# gluster volume get test01 all | grep heal
cluster.background-self-heal-count      8                                       
cluster.metadata-self-heal              on                                      
cluster.data-self-heal                  on                                      
cluster.entry-self-heal                 on                                      
cluster.self-heal-daemon                on                                      
cluster.heal-timeout                    600                                     
cluster.self-heal-window-size           1                                       
cluster.data-self-heal-algorithm        (null)                                  
cluster.self-heal-readdir-size          1KB                                     
cluster.heal-wait-queue-length          128                                     
features.lock-heal                      off                                     
features.lock-heal                      off                                     
storage.health-check-interval           30                                      
features.ctr_lookupheal_link_timeout    300                                     
features.ctr_lookupheal_inode_timeout   300                                     
cluster.disperse-self-heal-daemon       enable                                  
disperse.background-heals               8                                       
disperse.heal-wait-qlength              128                                     
cluster.heal-timeout                    600                                     
cluster.granular-entry-heal             no                                      
[root@srv01 ~]#
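
The checks I intend to run next, as a sketch (output omitted because I have not 
captured it yet):

# confirm the self-heal daemon is online on both nodes
gluster volume status test01
# trigger an index heal instead of a full sweep
gluster volume heal test01
# see how many entries each brick reports as needing heal
gluster volume heal test01 statistics heal-count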

--
Dmitry Glushenok
Jet Infosystems

> On 17 Aug 2016, at 11:30, Ravishankar N <ravishan...@redhat.com> wrote:
> 
> On 08/17/2016 01:48 PM, Дмитрий Глушенок wrote:
>> Hello Ravi,
>> 
>> Thank you for the reply. I found the bug number (for those who will google this 
>> email): https://bugzilla.redhat.com/show_bug.cgi?id=1112158
>> 
>> Accessing the removed file from the mount point does not always work, because 
>> we have to find a particular client whose DHT lookup goes to the brick with the 
>> removed file. Otherwise the file is served from the good brick and self-healing 
>> does not happen (just verified). Or by accessing did you mean something like 
>> touch?
> 
> Sorry, I should have been more explicit. I meant triggering a lookup on that 
> file with `stat filename`. I don't think you need a special client. DHT sends 
> the lookup to AFR, which in turn sends it to all its children. When one of them 
> returns ENOENT (because you removed it from the brick), AFR will 
> automatically trigger heal. I'm guessing it is not always working in your 
> case due to caching at various levels and the lookup not reaching AFR. 
> If you do it from a fresh mount, it should always work.
> -Ravi
> 
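
(For the archives: the sequence I understand you are suggesting, as a sketch 
against this test volume:

umount /mnt
mount -t glusterfs srv01:/test01 /mnt
stat /mnt/passwd   # a plain lookup by full name; AFR should see ENOENT from the bad brick and queue the heal

That is what the remount attempts above were meant to reproduce.)
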
>> Dmitry Glushenok
>> Jet Infosystems
>> 
>>> On 17 Aug 2016, at 4:24, Ravishankar N <ravishan...@redhat.com> wrote:
>>> 
>>> On 08/16/2016 10:44 PM, Дмитрий Глушенок wrote:
>>>> Hello,
>>>> 
>>>> While testing healing after a bitrot error, it was found that self-healing 
>>>> cannot heal files which were manually deleted from a brick. Gluster 3.8.1:
>>>> 
>>>> - Create volume, mount it locally and copy test file to it
>>>> [root@srv01 ~]# gluster volume create test01 replica 2  srv01:/R1/test01 
>>>> srv02:/R1/test01
>>>> volume create: test01: success: please start the volume to access data
>>>> [root@srv01 ~]# gluster volume start test01
>>>> volume start: test01: success
>>>> [root@srv01 ~]# mount -t glusterfs srv01:/test01 /mnt
>>>> [root@srv01 ~]# cp /etc/passwd /mnt
>>>> [root@srv01 ~]# ls -l /mnt
>>>> total 2
>>>> -rw-r--r--. 1 root root 1505 Aug 16 19:59 passwd
>>>> 
>>>> - Then remove the test file from the first brick, as we have to do in case 
>>>> of a bitrot error in the file
>>> 
>>> You also need to remove all hard-links to the corrupted file from the 
>>> brick, including the one in the .glusterfs folder.
>>> There is a bug in heal-full that prevents it from crawling all bricks of 
>>> the replica. The right way to heal the corrupted files as of now is to 
>>> access them from the mount-point like you did after removing the 
>>> hard-links. The list of files that are corrupted can be obtained with the 
>>> scrub status command.
>>> 
>>> Hope this helps,
>>> Ravi
>>> 
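
(Noting for the archives, a sketch of how to find every hard-link of a corrupted 
file on the brick, including the one under .glusterfs, with paths from this test 
volume:

# on the affected brick
find /R1/test01/.glusterfs -samefile /R1/test01/passwd
# remove the named file plus every link listed above, then trigger a lookup from a fresh mount

The list of corrupted files should be available via "gluster volume bitrot test01 
scrub status".)
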
>>>> [root@srv01 ~]# rm /R1/test01/passwd
>>>> [root@srv01 ~]# ls -l /mnt
>>>> total 0
>>>> [root@srv01 ~]#
>>>> 
>>>> - Issue full self heal
>>>> [root@srv01 ~]# gluster volume heal test01 full
>>>> Launching heal operation to perform full self heal on volume test01 has 
>>>> been successful
>>>> Use heal info commands to check status
>>>> [root@srv01 ~]# tail -2 /var/log/glusterfs/glustershd.log
>>>> [2016-08-16 16:59:56.483767] I [MSGID: 108026] 
>>>> [afr-self-heald.c:611:afr_shd_full_healer] 0-test01-replicate-0: starting 
>>>> full sweep on subvol test01-client-0
>>>> [2016-08-16 16:59:56.486560] I [MSGID: 108026] 
>>>> [afr-self-heald.c:621:afr_shd_full_healer] 0-test01-replicate-0: finished 
>>>> full sweep on subvol test01-client-0
>>>> 
>>>> - Now we still see no files in the mount point (it became empty right after 
>>>> removing the file from the brick)
>>>> [root@srv01 ~]# ls -l /mnt
>>>> total 0
>>>> [root@srv01 ~]#
>>>> 
>>>> - Then try to access the file by its full name (lookup-optimize and 
>>>> readdir-optimize are off by default). Now glusterfs shows the file!
>>>> [root@srv01 ~]# ls -l /mnt/passwd
>>>> -rw-r--r--. 1 root root 1505 Aug 16 19:59 /mnt/passwd
>>>> 
>>>> - And it reappeared in the brick
>>>> [root@srv01 ~]# ls -l /R1/test01/
>>>> total 4
>>>> -rw-r--r--. 2 root root 1505 Aug 16 19:59 passwd
>>>> [root@srv01 ~]#
>>>> 
>>>> Is it a bug, or can we tell self-heal to scan all files on all bricks in 
>>>> the volume?
>>>> 
>>>> --
>>>> Dmitry Glushenok
>>>> Jet Infosystems
>>>> 
> 

_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users
