Thanks Ravi for your analysis. So as far as I understand there is nothing to worry about, 
but my question now would be: how do I get rid of this file from the heal info output?

> -------- Original Message --------
> Subject: Re: [Gluster-users] self-heal not working
> Local Time: August 27, 2017 3:45 PM
> UTC Time: August 27, 2017 1:45 PM
> From: [email protected]
> To: mabi <[email protected]>
> Ben Turner <[email protected]>, Gluster Users <[email protected]>
>
> Yes, the shds did pick up the file for healing (I saw messages like " got 
> entry: 1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea") but no error afterwards.
>
> Anyway, I reproduced it by manually setting the afr.dirty bit for a zero-byte 
> file on all 3 bricks. Since there are no afr pending xattrs indicating 
> good/bad copies and all files are zero bytes, the data self-heal algorithm 
> just picks the file with the latest ctime as the source. In your case that was 
> the arbiter brick. In the code, there is a check to prevent data heals if the 
> arbiter is the source. So the heal was not happening and the entries were not 
> removed from the heal-info output.
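>
> For reference, marking a file dirty by hand looks roughly like this on each of 
> the 3 bricks (the brick path and file name here are only placeholders for the sketch):
>
> # for a zero-byte file that already exists on the brick:
> setfattr -n trusted.afr.dirty -v 0x000000010000000000000000 /path/to/brick/file.png
>
> That 12-byte value is the same one getfattr shows base64-encoded as 
> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA in the output further down this thread.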
>
> Perhaps we should add a check in the code to just remove the entries from 
> heal-info if the size is zero bytes on all bricks.
>
> -Ravi
>
> On 08/25/2017 06:33 PM, mabi wrote:
>
>> Hi Ravi,
>>
>> Did you get a chance to have a look at the log files I have attached in my 
>> last mail?
>>
>> Best,
>> Mabi
>>
>>> -------- Original Message --------
>>> Subject: Re: [Gluster-users] self-heal not working
>>> Local Time: August 24, 2017 12:08 PM
>>> UTC Time: August 24, 2017 10:08 AM
>>> From: [email protected]
>>> To: Ravishankar N <[email protected]>
>>> Ben Turner <[email protected]>, Gluster Users <[email protected]>
>>>
>>> Thanks for confirming the command. I have now enabled DEBUG 
>>> client-log-level, run a heal and then attached the glustershd log files of 
>>> all 3 nodes in this mail.
>>>
>>> The volume concerned is called myvol-pro; the other 3 volumes have no 
>>> problems so far.
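>>>
>>> For completeness, enabling the debug logging and triggering the heal was done 
>>> along these lines:
>>>
>>> gluster volume set myvol-pro diagnostics.client-log-level DEBUG
>>> gluster volume heal myvol-pro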
>>>
>>> Also note that in the meantime it looks like the file has been deleted by 
>>> the user, and as such the heal info command does not show the file name 
>>> anymore but just its GFID, which is:
>>>
>>> gfid:1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea
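>>>
>>> (For the record, the GFID can be mapped back to a path on a brick through its 
>>> .glusterfs hard link, with something like:
>>>
>>> ls -l <brick-root>/.glusterfs/19/85/1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea
>>> find <brick-root> -samefile <brick-root>/.glusterfs/19/85/1985e233-d5ee-4e3e-a51a-cf0b5f9f2aea
>>>
>>> where <brick-root> stands in for the brick path on each node.)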
>>>
>>> Hope that helps for debugging this issue.
>>>
>>>> -------- Original Message --------
>>>> Subject: Re: [Gluster-users] self-heal not working
>>>> Local Time: August 24, 2017 5:58 AM
>>>> UTC Time: August 24, 2017 3:58 AM
>>>> From: [email protected]
>>>> To: mabi <[email protected]>
>>>> Ben Turner <[email protected]>, Gluster Users <[email protected]>
>>>>
>>>> Unlikely. In your case only the afr.dirty is set, not the 
>>>> afr.volname-client-xx xattr.
>>>>
>>>> `gluster volume set myvolume diagnostics.client-log-level DEBUG` is right.
>>>>
>>>> On 08/23/2017 10:31 PM, mabi wrote:
>>>>
>>>>> I just saw the following bug which was fixed in 3.8.15:
>>>>>
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1471613
>>>>>
>>>>> Is it possible that the problem I described in this post is related to 
>>>>> that bug?
>>>>>
>>>>>> -------- Original Message --------
>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>> Local Time: August 22, 2017 11:51 AM
>>>>>> UTC Time: August 22, 2017 9:51 AM
>>>>>> From: [email protected]
>>>>>> To: mabi <[email protected]>
>>>>>> Ben Turner <[email protected]>, Gluster Users <[email protected]>
>>>>>>
>>>>>> On 08/22/2017 02:30 PM, mabi wrote:
>>>>>>
>>>>>>> Thanks for the additional hints. I have the following 2 questions first:
>>>>>>>
>>>>>>> - In order to launch the index heal, is the following command correct:
>>>>>>> gluster volume heal myvolume
>>>>>>
>>>>>> Yes
>>>>>>
>>>>>>> - If I run a "volume start force", will it cause any short disruptions on 
>>>>>>> my clients which mount the volume through FUSE? If yes, for how long? This 
>>>>>>> is a production system, which is why I am asking.
>>>>>>
>>>>>> No. You can actually create a test volume on your personal Linux box to 
>>>>>> try these kinds of things without needing multiple machines. This is how 
>>>>>> we develop and test our patches :)
>>>>>> `gluster volume create testvol replica 3 /home/mabi/bricks/brick{1..3} force` 
>>>>>> and so on.
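>>>>>>
>>>>>> Spelled out a bit more (hostname and mount point below are just placeholders):
>>>>>>
>>>>>> mkdir -p /home/mabi/bricks/brick{1..3} /mnt/testvol
>>>>>> gluster volume create testvol replica 3 myhost:/home/mabi/bricks/brick1 \
>>>>>>     myhost:/home/mabi/bricks/brick2 myhost:/home/mabi/bricks/brick3 force
>>>>>> gluster volume start testvol
>>>>>> mount -t glusterfs localhost:/testvol /mnt/testvol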
>>>>>>
>>>>>> HTH,
>>>>>> Ravi
>>>>>>
>>>>>>>> -------- Original Message --------
>>>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>>>> Local Time: August 22, 2017 6:26 AM
>>>>>>>> UTC Time: August 22, 2017 4:26 AM
>>>>>>>> From: [email protected]
>>>>>>>> To: mabi <[email protected]>, Ben Turner <[email protected]>
>>>>>>>> Gluster Users <[email protected]>
>>>>>>>>
>>>>>>>> Explore the following (rough commands for each step are sketched after the list):
>>>>>>>>
>>>>>>>> - Launch index heal and look at the glustershd logs of all bricks for 
>>>>>>>> possible errors
>>>>>>>>
>>>>>>>> - See if the glustershd in each node is connected to all bricks.
>>>>>>>>
>>>>>>>> - If not, try to restart shd by `volume start force`
>>>>>>>>
>>>>>>>> - Launch index heal again and try.
>>>>>>>>
>>>>>>>> - Try debugging the shd log by setting client-log-level to DEBUG 
>>>>>>>> temporarily.
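>>>>>>>>
>>>>>>>> Roughly, the commands behind those steps would be something like this, 
>>>>>>>> assuming the volume is still called myvolume:
>>>>>>>>
>>>>>>>> gluster volume heal myvolume
>>>>>>>> gluster volume status myvolume        # check that the Self-heal Daemon is online on every node
>>>>>>>> gluster volume start myvolume force   # restarts shd if it is down
>>>>>>>> gluster volume set myvolume diagnostics.client-log-level DEBUG
>>>>>>>>
>>>>>>>> The glustershd logs are normally under /var/log/glusterfs/glustershd.log on 
>>>>>>>> each node; remember to set the log level back to INFO once done.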
>>>>>>>>
>>>>>>>> On 08/22/2017 03:19 AM, mabi wrote:
>>>>>>>>
>>>>>>>>> Sure, it doesn't look like a split brain based on the output:
>>>>>>>>>
>>>>>>>>> Brick node1.domain.tld:/data/myvolume/brick
>>>>>>>>> Status: Connected
>>>>>>>>> Number of entries in split-brain: 0
>>>>>>>>>
>>>>>>>>> Brick node2.domain.tld:/data/myvolume/brick
>>>>>>>>> Status: Connected
>>>>>>>>> Number of entries in split-brain: 0
>>>>>>>>>
>>>>>>>>> Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
>>>>>>>>> Status: Connected
>>>>>>>>> Number of entries in split-brain: 0
>>>>>>>>>
>>>>>>>>>> -------- Original Message --------
>>>>>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>>>>>> Local Time: August 21, 2017 11:35 PM
>>>>>>>>>> UTC Time: August 21, 2017 9:35 PM
>>>>>>>>>> From: [email protected]
>>>>>>>>>> To: mabi <[email protected]>
>>>>>>>>>> Gluster Users <[email protected]>
>>>>>>>>>>
>>>>>>>>>> Can you also provide:
>>>>>>>>>>
>>>>>>>>>> gluster v heal <my vol> info split-brain
>>>>>>>>>>
>>>>>>>>>> If it is split brain just delete the incorrect file from the brick 
>>>>>>>>>> and run heal again. I haven't tried this with arbiter but I assume 
>>>>>>>>>> the process is the same.
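>>>>>>>>>>
>>>>>>>>>> Roughly (paths and volume name are placeholders, and don't forget the gfid 
>>>>>>>>>> hard link under .glusterfs on the same brick):
>>>>>>>>>>
>>>>>>>>>> rm <brick-root>/path/to/bad/file
>>>>>>>>>> rm <brick-root>/.glusterfs/<first-2-hex>/<next-2-hex>/<full-gfid>
>>>>>>>>>> gluster volume heal <volname>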
>>>>>>>>>>
>>>>>>>>>> -b
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>> From: "mabi" [<[email protected]>](mailto:[email protected])
>>>>>>>>>>> To: "Ben Turner" [<[email protected]>](mailto:[email protected])
>>>>>>>>>>> Cc: "Gluster Users" 
>>>>>>>>>>> [<[email protected]>](mailto:[email protected])
>>>>>>>>>>> Sent: Monday, August 21, 2017 4:55:59 PM
>>>>>>>>>>> Subject: Re: [Gluster-users] self-heal not working
>>>>>>>>>>>
>>>>>>>>>>> Hi Ben,
>>>>>>>>>>>
>>>>>>>>>>> So it is really a 0 kByte file everywhere (on all nodes, including the arbiter, 
>>>>>>>>>>> and from the client).
>>>>>>>>>>> Below you will find the output you requested. Hopefully that will help 
>>>>>>>>>>> to find out why this specific file is not healing... Let me know if you need 
>>>>>>>>>>> any more information. Btw node3 is my arbiter node.
>>>>>>>>>>>
>>>>>>>>>>> NODE1:
>>>>>>>>>>>
>>>>>>>>>>> STAT:
>>>>>>>>>>> File:
>>>>>>>>>>> ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
>>>>>>>>>>> Size: 0 Blocks: 38 IO Block: 131072 regular empty file
>>>>>>>>>>> Device: 24h/36d Inode: 10033884 Links: 2
>>>>>>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>>>> Modify: 2017-08-14 17:11:46.407404779 +0200
>>>>>>>>>>> Change: 2017-08-14 17:11:46.407404779 +0200
>>>>>>>>>>> Birth: -
>>>>>>>>>>>
>>>>>>>>>>> GETFATTR:
>>>>>>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
>>>>>>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZhuknAAlJAg==
>>>>>>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
>>>>>>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOyo=
>>>>>>>>>>>
>>>>>>>>>>> NODE2:
>>>>>>>>>>>
>>>>>>>>>>> STAT:
>>>>>>>>>>> File:
>>>>>>>>>>> ‘/data/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png’
>>>>>>>>>>> Size: 0 Blocks: 38 IO Block: 131072 regular empty file
>>>>>>>>>>> Device: 26h/38d Inode: 10031330 Links: 2
>>>>>>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>>>> Modify: 2017-08-14 17:11:46.403704181 +0200
>>>>>>>>>>> Change: 2017-08-14 17:11:46.403704181 +0200
>>>>>>>>>>> Birth: -
>>>>>>>>>>>
>>>>>>>>>>> GETFATTR:
>>>>>>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
>>>>>>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZhu6wAA8Hpw==
>>>>>>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
>>>>>>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOVE=
>>>>>>>>>>>
>>>>>>>>>>> NODE3:
>>>>>>>>>>> STAT:
>>>>>>>>>>> File:
>>>>>>>>>>> /srv/glusterfs/myvolume/brick/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>>>>>>> Size: 0 Blocks: 0 IO Block: 4096 regular empty file
>>>>>>>>>>> Device: ca11h/51729d Inode: 405208959 Links: 2
>>>>>>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>>>> Modify: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>>>> Change: 2017-08-14 17:11:46.604380051 +0200
>>>>>>>>>>> Birth: -
>>>>>>>>>>>
>>>>>>>>>>> GETFATTR:
>>>>>>>>>>> trusted.afr.dirty=0sAAAAAQAAAAAAAAAA
>>>>>>>>>>> trusted.bit-rot.version=0sAgAAAAAAAABZe6ejAAKPAg==
>>>>>>>>>>> trusted.gfid=0sGYXiM9XuTj6lGs8LX58q6g==
>>>>>>>>>>> trusted.glusterfs.d99af2fa-439b-4a21-bf3a-38f3849f87ec.xtime=0sWZG9sgAGOc4=
>>>>>>>>>>>
>>>>>>>>>>> CLIENT GLUSTER MOUNT:
>>>>>>>>>>> STAT:
>>>>>>>>>>> File:
>>>>>>>>>>> "/mnt/myvolume/data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png"
>>>>>>>>>>> Size: 0 Blocks: 0 IO Block: 131072 regular empty file
>>>>>>>>>>> Device: 1eh/30d Inode: 11897049013408443114 Links: 1
>>>>>>>>>>> Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
>>>>>>>>>>> Access: 2017-08-14 17:04:55.530681000 +0200
>>>>>>>>>>> Modify: 2017-08-14 17:11:46.407404779 +0200
>>>>>>>>>>> Change: 2017-08-14 17:11:46.407404779 +0200
>>>>>>>>>>> Birth: -
>>>>>>>>>>>
>>>>>>>>>>> > -------- Original Message --------
>>>>>>>>>>> > Subject: Re: [Gluster-users] self-heal not working
>>>>>>>>>>> > Local Time: August 21, 2017 9:34 PM
>>>>>>>>>>> > UTC Time: August 21, 2017 7:34 PM
>>>>>>>>>>> > From: [email protected]
>>>>>>>>>>> > To: mabi <[email protected]>
>>>>>>>>>>> > Gluster Users <[email protected]>
>>>>>>>>>>> >
>>>>>>>>>>> > ----- Original Message -----
>>>>>>>>>>> >> From: "mabi" <[email protected]>
>>>>>>>>>>> >> To: "Gluster Users" <[email protected]>
>>>>>>>>>>> >> Sent: Monday, August 21, 2017 9:28:24 AM
>>>>>>>>>>> >> Subject: [Gluster-users] self-heal not working
>>>>>>>>>>> >>
>>>>>>>>>>> >> Hi,
>>>>>>>>>>> >>
>>>>>>>>>>> >> I have a replica 2 with arbiter GlusterFS 3.8.11 cluster, and there is 
>>>>>>>>>>> >> currently one file listed to be healed, as you can see below, but it never 
>>>>>>>>>>> >> gets healed by the self-heal daemon:
>>>>>>>>>>> >>
>>>>>>>>>>> >> Brick node1.domain.tld:/data/myvolume/brick
>>>>>>>>>>> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>>>>>>> >> Status: Connected
>>>>>>>>>>> >> Number of entries: 1
>>>>>>>>>>> >>
>>>>>>>>>>> >> Brick node2.domain.tld:/data/myvolume/brick
>>>>>>>>>>> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>>>>>>> >> Status: Connected
>>>>>>>>>>> >> Number of entries: 1
>>>>>>>>>>> >>
>>>>>>>>>>> >> Brick node3.domain.tld:/srv/glusterfs/myvolume/brick
>>>>>>>>>>> >> /data/appdata_ocpom4nckwru/preview/1344699/64-64-crop.png
>>>>>>>>>>> >> Status: Connected
>>>>>>>>>>> >> Number of entries: 1
>>>>>>>>>>> >>
>>>>>>>>>>> >> As once recommended on this mailing list, I have mounted that glusterfs 
>>>>>>>>>>> >> volume temporarily through fuse/glusterfs and ran a "stat" on the file 
>>>>>>>>>>> >> listed above, but nothing happened.
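>>>>>>>>>>> >>
>>>>>>>>>>> >> (Roughly: mount -t glusterfs <one-of-the-nodes>:/myvolume /mnt/myvolume 
>>>>>>>>>>> >> and then stat the file through that mount.)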
>>>>>>>>>>> >>
>>>>>>>>>>> >> The file itself is available on all 3 nodes/bricks, but on the last node it 
>>>>>>>>>>> >> has a different date. By the way, this file is 0 kBytes big. Could that be 
>>>>>>>>>>> >> the reason why the self-heal does not work?
>>>>>>>>>>> >
>>>>>>>>>>> > Is the file actually 0 bytes or is it just 0 bytes on the arbiter 
>>>>>>>>>>> > (0 bytes are expected on the arbiter, since it only stores metadata)? Can you 
>>>>>>>>>>> > send us the output from stat on all 3 nodes:
>>>>>>>>>>> >
>>>>>>>>>>> > $ stat <file on back end brick>
>>>>>>>>>>> > $ getfattr -d -m - <file on back end brick>
>>>>>>>>>>> > $ stat <file from gluster mount>
>>>>>>>>>>> >
>>>>>>>>>>> > Let's see what things look like on the back end; it should tell us why 
>>>>>>>>>>> > healing is failing.
>>>>>>>>>>> >
>>>>>>>>>>> > -b
>>>>>>>>>>> >
>>>>>>>>>>> >>
>>>>>>>>>>> >> And how can I now make this file to heal?
>>>>>>>>>>> >>
>>>>>>>>>>> >> Thanks,
>>>>>>>>>>> >> Mabi
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>>> >>
>>>>>>>>>
_______________________________________________
Gluster-users mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-users
