No, sorry, it's working fine. I may have missed a step, which would explain the problem I saw. /.shard is also healing fine now.
Let me know if it works for you.

-Krutika

On Wed, Aug 31, 2016 at 12:49 PM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
> OK, I just hit the other issue too, where .shard doesn't get healed. :)
>
> Investigating why that is the case. Give me some time.
>
> -Krutika
>
> On Wed, Aug 31, 2016 at 12:39 PM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
>
>> Just figured out that the steps Anuradha provided won't work if granular entry heal is on.
>> When you bring down a brick and create fake2 under / of the volume, the granular entry heal feature causes self-heal to remember only the fact that 'fake2' needs to be recreated on the offline brick (because the changelogs are granular).
>>
>> In this case, we need to indicate to the self-heal daemon that the entire directory tree from '/' needs to be repaired on the brick that contains no data.
>>
>> To fix this, I did the following (for users who use granular entry self-healing):
>>
>> 1. Kill the last brick process in the replica (/bricks/3).
>>
>> 2. [root@server-3 ~]# rm -rf /bricks/3
>>
>> 3. [root@server-3 ~]# mkdir /bricks/3
>>
>> 4. Create a new dir on the mount point:
>> [root@client-1 ~]# mkdir /mnt/fake
>>
>> 5. Set some fake xattr on the root of the volume, not on the 'fake' directory itself:
>> [root@client-1 ~]# setfattr -n "user.some-name" -v "some-value" /mnt
>>
>> 6. Make sure there is no I/O happening on your volume.
>>
>> 7. Check the pending xattrs on the brick directories of the two good copies (bricks 1 and 2); you should see the same trusted.afr.<VOLNAME>-client-2 value on both bricks.
>> (Note that the client-<num> xattr key will have the same last digit as the index of the brick that is down, counting from 0. So if the first brick is the one that is down, it would read trusted.afr.*-client-0; if the second brick is the one that is empty and down, it would read trusted.afr.*-client-1, and so on.)
>>
>> [root@server-1 ~]# getfattr -d -m . -e hex /bricks/1
>> # file: 1
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.rep-client-2=0x000000000000000100000001
>> trusted.gfid=0x00000000000000000000000000000001
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>
>> [root@server-2 ~]# getfattr -d -m . -e hex /bricks/2
>> # file: 2
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.rep-client-2=0x000000000000000100000001
>> trusted.gfid=0x00000000000000000000000000000001
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>
>> 8. Flip the 8th digit in trusted.afr.<VOLNAME>-client-2 to a 1:
>>
>> [root@server-1 ~]# setfattr -n trusted.afr.rep-client-2 -v 0x000000010000000100000001 /bricks/1
>> [root@server-2 ~]# setfattr -n trusted.afr.rep-client-2 -v 0x000000010000000100000001 /bricks/2
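>>
>> (If you want to double-check what that value encodes before setting it: these trusted.afr values pack three big-endian 32-bit counters, for data, metadata and entry pending heals, in that order, which is why flipping the 8th digit is what marks data heal as pending too. A quick bash sketch of the decoding; the counter layout is the assumption here:)
>>
>> [root@server-1 ~]# v=0x000000010000000100000001; v=${v#0x}
>> [root@server-1 ~]# echo "data=$((16#${v:0:8})) metadata=$((16#${v:8:8})) entry=$((16#${v:16:8}))"
>> data=1 metadata=1 entry=1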
>> 9. Get the xattrs again and check that they are now set properly:
>>
>> [root@server-1 ~]# getfattr -d -m . -e hex /bricks/1
>> # file: 1
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.rep-client-2=0x000000010000000100000001
>> trusted.gfid=0x00000000000000000000000000000001
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>
>> [root@server-2 ~]# getfattr -d -m . -e hex /bricks/2
>> # file: 2
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>> trusted.afr.dirty=0x000000000000000000000000
>> trusted.afr.rep-client-2=0x000000010000000100000001
>> trusted.gfid=0x00000000000000000000000000000001
>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>
>> 10. Force-start the volume:
>>
>> [root@server-1 ~]# gluster volume start rep force
>> volume start: rep: success
>>
>> 11. Monitor the heal-info command to ensure the number of entries keeps growing.
>>
>> 12. Keep monitoring as in step 11; eventually the number of entries needing heal must come down to 0.
>> Also, the checksums of the files on the previously empty brick should now match the copies on the other two bricks.
>>
>> Could you check whether the above steps work for you in your test environment?
>>
>> You caught a nice bug in the manual steps to follow when granular entry-heal is enabled and an empty brick needs heal. Thanks for reporting it. :) We will fix the documentation appropriately.
>>
>> -Krutika
>>
>> On Wed, Aug 31, 2016 at 11:29 AM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
>>
>>> Tried this.
>>>
>>> With me, only 'fake2' gets healed after I bring the 'empty' brick back up, and it stops there unless I do a 'heal full'.
>>>
>>> Is that what you're seeing as well?
>>>
>>> -Krutika
>>>
>>> On Wed, Aug 31, 2016 at 4:43 AM, David Gossage <dgoss...@carouselchecks.com> wrote:
>>>
>>>> Same issue: brought up glusterd on the problem node, heal count still stuck at 6330.
>>>>
>>>> Ran gluster v heal GLUSTER1 full
>>>>
>>>> glustershd on the problem node shows a sweep starting and finishing in seconds. The other 2 nodes show no activity in their logs. They should start a sweep too, shouldn't they?
>>>>
>>>> Tried starting from scratch:
>>>>
>>>> kill -15 brickpid
>>>> rm -Rf /brick
>>>> mkdir -p /brick
>>>> mkdir /gsmount/fake2
>>>> setfattr -n "user.some-name" -v "some-value" /gsmount/fake2
>>>>
>>>> Heals visible dirs instantly, then stops.
>>>>
>>>> gluster v heal GLUSTER1 full
>>>>
>>>> See the sweep start on the problem node and end almost instantly. No files added to the heal list, no files healed, no more logging.
>>>>
>>>> [2016-08-30 23:11:31.544331] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>> [2016-08-30 23:11:33.776235] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>
>>>> Same results no matter which node you run the command on. Still stuck with 6330 files showing as needing heal out of 19k, and the logs still show that no heals are occurring.
>>>>
>>>> Is there a way to forcibly reset any prior heal data? Could it be stuck on some past failed heal start?
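>>>>
>>>> (For reference, the closest thing I know of for inspecting the crawl state is the statistics output; a sketch, and whether any of this can be reset is exactly my question:)
>>>>
>>>> gluster volume heal GLUSTER1 statistics            # per-crawl history: crawl type, start/end time, entries healed/failed
>>>> gluster volume heal GLUSTER1 statistics heal-count # just the pending-entry counts per brick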
>>>>
>>>> David Gossage
>>>> Carousel Checks Inc. | System Administrator
>>>> Office 708.613.2284
>>>>
>>>> On Tue, Aug 30, 2016 at 10:03 AM, David Gossage <dgoss...@carouselchecks.com> wrote:
>>>>
>>>>> On Tue, Aug 30, 2016 at 10:02 AM, David Gossage <dgoss...@carouselchecks.com> wrote:
>>>>>
>>>>>> Updated the test server to 3.8.3.
>>>>>>
>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>>> Options Reconfigured:
>>>>>> cluster.granular-entry-heal: on
>>>>>> performance.readdir-ahead: on
>>>>>> performance.read-ahead: off
>>>>>> nfs.disable: on
>>>>>> nfs.addr-namelookup: off
>>>>>> nfs.enable-ino32: off
>>>>>> cluster.background-self-heal-count: 16
>>>>>> cluster.self-heal-window-size: 1024
>>>>>> performance.quick-read: off
>>>>>> performance.io-cache: off
>>>>>> performance.stat-prefetch: off
>>>>>> cluster.eager-lock: enable
>>>>>> network.remote-dio: on
>>>>>> cluster.quorum-type: auto
>>>>>> cluster.server-quorum-type: server
>>>>>> storage.owner-gid: 36
>>>>>> storage.owner-uid: 36
>>>>>> server.allow-insecure: on
>>>>>> features.shard: on
>>>>>> features.shard-block-size: 64MB
>>>>>> performance.strict-o-direct: off
>>>>>> cluster.locking-scheme: granular
>>>>>>
>>>>>> kill -15 brickpid
>>>>>> rm -Rf /gluster2/brick3
>>>>>> mkdir -p /gluster2/brick3/1
>>>>>> mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
>>>>>> setfattr -n "user.some-name" -v "some-value" /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
>>>>>> gluster v start glustershard force
>>>>>>
>>>>>> At this point the brick process starts, and all visible files including the new dir are made on the brick. A handful of shards are still in the heal statistics, but no .shard directory is created and there is no increase in the shard count.
>>>>>>
>>>>>> gluster v heal glustershard
>>>>>>
>>>>>> At this point still no increase in the count, no dir made, and no additional healing activity in the logs. Waited a few minutes tailing the logs to check if anything kicked in.
>>>>>>
>>>>>> gluster v heal glustershard full
>>>>>>
>>>>>> Gluster shards were added to the list and heal commenced. Logs show a full sweep starting on all 3 nodes, though this time it only shows as finishing on one, which looks to be the one that had its brick deleted.
>>>>>>
>>>>>> [2016-08-30 14:45:33.098589] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-0
>>>>>> [2016-08-30 14:45:33.099492] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-1
>>>>>> [2016-08-30 14:45:33.100093] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-2
>>>>>> [2016-08-30 14:52:29.760213] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-2
>>>>>
>>>>> Just realized it's still healing, so that may be why the sweeps on the 2 other bricks haven't been reported as finished.
>>>>>
>>>>>> My hope is that later tonight a full heal will work on production. Is it possible the self-heal daemon can get stale or stop listening but still show as active? Would stopping and starting the self-heal daemon from the gluster CLI before doing these heals be helpful?
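>>>>>>
>>>>>> (What I mean by bouncing it, roughly; a sketch, with toggling cluster.self-heal-daemon being the only way I know of to restart just glustershd without touching the bricks:)
>>>>>>
>>>>>> gluster volume status glustershard                            # 'Self-heal Daemon' should show Online Y on every node
>>>>>> gluster volume set glustershard cluster.self-heal-daemon off
>>>>>> gluster volume set glustershard cluster.self-heal-daemon on   # respawns the shd processes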
>>>>>>
>>>>>> On Tue, Aug 30, 2016 at 9:29 AM, David Gossage <dgoss...@carouselchecks.com> wrote:
>>>>>>
>>>>>>> On Tue, Aug 30, 2016 at 8:52 AM, David Gossage <dgoss...@carouselchecks.com> wrote:
>>>>>>>
>>>>>>>> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
>>>>>>>>
>>>>>>>>> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage <dgoss...@carouselchecks.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <kdhan...@redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Could you also share the glustershd logs?
>>>>>>>>>>>
>>>>>>>>>>> I'll get them when I get to work, sure.
>>>>>>>>>>>
>>>>>>>>>>>> I tried the same steps that you mentioned multiple times, but heal is running to completion without any issues.
>>>>>>>>>>>>
>>>>>>>>>>>> It must be said that 'heal full' traverses the files and directories in a depth-first order and does the heals in the same order. But if it gets interrupted in the middle (say because the self-heal daemon was either intentionally or unintentionally brought offline and then brought back up), self-heal will only pick up the entries that are so far marked as new entries needing heal, which it will find in the indices/xattrop directory. What this means is that the files and directories that were not visited during the crawl will remain untouched and unhealed in this second iteration of heal, unless you execute a 'heal full' again.
>>>>>>>>>>>
>>>>>>>>>>> So should it start healing shards as it crawls, or not until after it crawls the entire .shard directory? At the pace it was going, that could be a week, with one node appearing in the cluster but having no shard files if anything tries to access a file on that node. From my experience the other day, telling it to heal full again did nothing, regardless of the node used.
>>>>>>>>>
>>>>>>>>> The crawl is started from '/' of the volume. Whenever self-heal detects during the crawl that a file or directory is present on some brick(s) and absent on others, it creates the file on the bricks where it is absent and marks the fact that the file or directory might need data/entry and metadata heal too (this also means that an index is created under .glusterfs/indices/xattrop of the src bricks). The data/entry and metadata heals are then picked up and done in the background with the help of these indices.
>>>>>>>>
>>>>>>>> Looking at my 3rd node as an example, I find nearly the exact same number of files in the xattrop dir as reported by the heal count at the time I brought down node 2 to try to alleviate the read I/O errors that seemed to occur from what I was guessing were attempts to use the node with no shards for reads.
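>>>>>>>>
>>>>>>>> (What I compared, roughly, as a sketch; the entries in that index directory are gfid-named link files, so I skip the base 'xattrop-...' file when counting:)
>>>>>>>>
>>>>>>>> ls /gluster1/BRICK1/1/.glusterfs/indices/xattrop | grep -cv '^xattrop' # pending-heal entries on this brick
>>>>>>>> gluster volume heal GLUSTER1 statistics heal-count                     # compare against the heal count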
>>>>>>>>
>>>>>>>> Also attached are the glustershd logs from the 3 nodes, along with the test node I tried yesterday with the same results.
>>>>>>>
>>>>>>> Looking at my own logs, I notice that a full sweep was only ever recorded in glustershd.log on the 2nd node, the one with the missing directory. I believe I should have found a sweep begun on every node, correct?
>>>>>>>
>>>>>>> On my test dev, when it did work, I do see that:
>>>>>>>
>>>>>>> [2016-08-30 13:56:25.223333] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-0
>>>>>>> [2016-08-30 13:56:25.223522] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-1
>>>>>>> [2016-08-30 13:56:25.224616] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0: starting full sweep on subvol glustershard-client-2
>>>>>>> [2016-08-30 14:18:48.333740] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-2
>>>>>>> [2016-08-30 14:18:48.356008] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-1
>>>>>>> [2016-08-30 14:18:49.637811] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0: finished full sweep on subvol glustershard-client-0
>>>>>>>
>>>>>>> While when looking at the past few days on the 3 prod nodes, I only found the following, on my 2nd node:
>>>>>>>
>>>>>>> [2016-08-27 01:26:42.638772] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>> [2016-08-27 11:37:01.732366] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>> [2016-08-27 12:58:34.597228] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>> [2016-08-27 12:59:28.041173] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>> [2016-08-27 20:03:42.560188] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>> [2016-08-27 20:03:44.278274] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>> [2016-08-27 21:00:42.603315] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>> [2016-08-27 21:00:46.148674] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
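>>>>>>>
>>>>>>> (For anyone checking the same thing, the grep was more or less the following, assuming the default log location:)
>>>>>>>
>>>>>>> grep 'full sweep' /var/log/glusterfs/glustershd.log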
>>>>>>>>>>>>
>>>>>>>>>>>> My suspicion is that this is what happened on your setup. Could you confirm if that was the case?
>>>>>>>>>>>
>>>>>>>>>>> Brick was brought online with force start, then a full heal launched. Hours later, after it became evident that it was not adding new files to heal, I did try restarting the self-heal daemon and relaunching full heal again. But this was after the heal had basically already failed to work as intended.
>>>>>>>>>>
>>>>>>>>>> OK. How did you figure it was not adding any new files? I need to know what places you were monitoring to come to this conclusion.
>>>>>>>>>>
>>>>>>>>>> -Krutika
>>>>>>>>>>>
>>>>>>>>>>>> As for those logs, I did manage to do something that caused the warning messages you shared earlier to appear in my client and server logs. Although these logs are annoying, and a bit scary too, they didn't do any harm to the data in my volume. Why they appear just after a brick is replaced, and under no other circumstances, is something I'm still investigating.
>>>>>>>>>>>>
>>>>>>>>>>>> But for the future, it would be good to follow the steps Anuradha gave, as that would allow self-heal to at least detect that it has some repairing to do whenever it is restarted, whether intentionally or otherwise.
>>>>>>>>>>>
>>>>>>>>>>> I followed those steps as described on my test box and ended up with the exact same outcome: shards being added at an agonizingly slow pace, and no creation of the .shard directory or heals on the shard directory. Directories visible from the mount healed quickly. This was with one VM, so it has only 800 shards as well. After hours at work it had added a total of 33 shards to be healed. I sent those logs yesterday as well, though not the glustershd ones.
>>>>>>>>>>>
>>>>>>>>>>> Does the replace-brick command copy files in the same manner? For these purposes I am contemplating just skipping the heal route.
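>>>>>>>>>>>
>>>>>>>>>>> (If I do skip it, I assume that route would look something like the following, with the replacement brick on a fresh path; a sketch only, and my understanding is that replace-brick hands population of the new brick to the same self-heal machinery anyway:)
>>>>>>>>>>>
>>>>>>>>>>> gluster volume replace-brick GLUSTER1 ccgl2.gl.local:/gluster1/BRICK1/1 ccgl2.gl.local:/gluster1/BRICK1/new commit force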
>>>>>>>>>>>>
>>>>>>>>>>>> -Krutika
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage <dgoss...@carouselchecks.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Attached brick and client logs from the test machine where the same behavior occurred; not sure if anything new is there. It's still on 3.8.2.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>>>> Bricks:
>>>>>>>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>>>>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>>>>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>>>> cluster.locking-scheme: granular
>>>>>>>>>>>>> performance.strict-o-direct: off
>>>>>>>>>>>>> features.shard-block-size: 64MB
>>>>>>>>>>>>> features.shard: on
>>>>>>>>>>>>> server.allow-insecure: on
>>>>>>>>>>>>> storage.owner-uid: 36
>>>>>>>>>>>>> storage.owner-gid: 36
>>>>>>>>>>>>> cluster.server-quorum-type: server
>>>>>>>>>>>>> cluster.quorum-type: auto
>>>>>>>>>>>>> network.remote-dio: on
>>>>>>>>>>>>> cluster.eager-lock: enable
>>>>>>>>>>>>> performance.stat-prefetch: off
>>>>>>>>>>>>> performance.io-cache: off
>>>>>>>>>>>>> performance.quick-read: off
>>>>>>>>>>>>> cluster.self-heal-window-size: 1024
>>>>>>>>>>>>> cluster.background-self-heal-count: 16
>>>>>>>>>>>>> nfs.enable-ino32: off
>>>>>>>>>>>>> nfs.addr-namelookup: off
>>>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>>>> performance.read-ahead: off
>>>>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>>>> cluster.granular-entry-heal: on
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage <dgoss...@carouselchecks.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <ata...@redhat.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>>> > From: "David Gossage" <dgoss...@carouselchecks.com>
>>>>>>>>>>>>>>> > To: "Anuradha Talur" <ata...@redhat.com>
>>>>>>>>>>>>>>> > Cc: "gluster-users@gluster.org List" <Gluster-users@gluster.org>, "Krutika Dhananjay" <kdhan...@redhat.com>
>>>>>>>>>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM
>>>>>>>>>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur <ata...@redhat.com> wrote:
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > > Response inline.
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > > ----- Original Message -----
>>>>>>>>>>>>>>> > > > From: "Krutika Dhananjay" <kdhan...@redhat.com>
>>>>>>>>>>>>>>> > > > To: "David Gossage" <dgoss...@carouselchecks.com>
>>>>>>>>>>>>>>> > > > Cc: "gluster-users@gluster.org List" <Gluster-users@gluster.org>
>>>>>>>>>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM
>>>>>>>>>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > Could you attach both client and brick logs? Meanwhile I will try these steps out on my machines and see if it is easily recreatable.
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > -Krutika
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage <dgoss...@carouselchecks.com> wrote:
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > Centos 7, Gluster 3.8.3
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>> > > > Options Reconfigured:
>>>>>>>>>>>>>>> > > > cluster.data-self-heal-algorithm: full
>>>>>>>>>>>>>>> > > > cluster.self-heal-daemon: on
>>>>>>>>>>>>>>> > > > cluster.locking-scheme: granular
>>>>>>>>>>>>>>> > > > features.shard-block-size: 64MB
>>>>>>>>>>>>>>> > > > features.shard: on
>>>>>>>>>>>>>>> > > > performance.readdir-ahead: on
>>>>>>>>>>>>>>> > > > storage.owner-uid: 36
>>>>>>>>>>>>>>> > > > storage.owner-gid: 36
>>>>>>>>>>>>>>> > > > performance.quick-read: off
>>>>>>>>>>>>>>> > > > performance.read-ahead: off
>>>>>>>>>>>>>>> > > > performance.io-cache: off
>>>>>>>>>>>>>>> > > > performance.stat-prefetch: on
>>>>>>>>>>>>>>> > > > cluster.eager-lock: enable
>>>>>>>>>>>>>>> > > > network.remote-dio: enable
>>>>>>>>>>>>>>> > > > cluster.quorum-type: auto
>>>>>>>>>>>>>>> > > > cluster.server-quorum-type: server
>>>>>>>>>>>>>>> > > > server.allow-insecure: on
>>>>>>>>>>>>>>> > > > cluster.self-heal-window-size: 1024
>>>>>>>>>>>>>>> > > > cluster.background-self-heal-count: 16
>>>>>>>>>>>>>>> > > > performance.strict-write-ordering: off
>>>>>>>>>>>>>>> > > > nfs.disable: on
>>>>>>>>>>>>>>> > > > nfs.addr-namelookup: off
>>>>>>>>>>>>>>> > > > nfs.enable-ino32: off
>>>>>>>>>>>>>>> > > > cluster.granular-entry-heal: on
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > Friday did a rolling upgrade to 3.8.3 with no issues.
>>>>>>>>>>>>>>> > > > Following the steps detailed in previous recommendations, began the process of replacing and healing bricks one node at a time:
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > 1) kill pid of brick
>>>>>>>>>>>>>>> > > > 2) reconfigure brick from raid6 to raid10
>>>>>>>>>>>>>>> > > > 3) recreate directory of brick
>>>>>>>>>>>>>>> > > > 4) gluster volume start <> force
>>>>>>>>>>>>>>> > > > 5) gluster volume heal <> full
>>>>>>>>>>>>>>> > > Hi,
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > > I'd suggest that full heal not be used; there are a few bugs in full heal. Better safe than sorry ;)
>>>>>>>>>>>>>>> > > Instead I'd suggest the following steps:
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > Currently I brought the node down by systemctl stop glusterd, as I was getting sporadic I/O issues and a few VMs paused, so I'm hoping that will help. I may wait to do this till around 4 PM when most work is done, in case it shoots the load up.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > > 1) kill pid of brick
>>>>>>>>>>>>>>> > > 2) do whatever reconfiguring of the brick you need
>>>>>>>>>>>>>>> > > 3) recreate brick dir
>>>>>>>>>>>>>>> > > 4) while the brick is still down, from the mount point:
>>>>>>>>>>>>>>> > >    a) create a dummy non-existent dir under / of the mount
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > So if node 2 is the down brick, do I pick a node, for example 3, and make a test dir under its brick directory that doesn't exist on 2, or should I be doing this over a gluster mount?
>>>>>>>>>>>>>>> You should be doing this over the gluster mount.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > >    b) set a non-existent extended attribute on / of the mount
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Could you give me an example of an attribute to set? I've read a tad on this, and looked up attributes, but haven't set any myself yet.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value" <path-to-mount>
>>>>>>>>>>>>>>> > > Doing these steps will ensure that heal happens only from the updated brick to the down brick.
>>>>>>>>>>>>>>> > > 5) gluster v start <> force
>>>>>>>>>>>>>>> > > 6) gluster v heal <>
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> > Will it matter if somewhere in gluster the full heal command was run the other day? Not sure if it eventually stops or times out.
>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>> Full heal will stop once the crawl is done. So if you want to trigger heal again, run gluster v heal <>. Actually, even brick up or volume start force should trigger the heal.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Did this on the test bed today. It's one server with 3 bricks on the same machine, so take that for what it's worth. Also, it still runs 3.8.2; maybe I'll update and re-run the test.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> killed brick
>>>>>>>>>>>>>> deleted brick dir
>>>>>>>>>>>>>> recreated brick dir
>>>>>>>>>>>>>> created fake dir on gluster mount
>>>>>>>>>>>>>> set suggested fake attribute on it
>>>>>>>>>>>>>> ran volume start <> force
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Looked at the files it said needed healing, and it was just 8 shards that were modified during the few minutes I was running through the steps.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Gave it a few minutes and it stayed the same. Ran gluster volume <> heal.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It healed all the directories and files you can see over the mount, including fakedir.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Same issue for shards though: it adds more shards to heal at a glacial pace. Slight jump in speed if I stat every file and dir in a running VM, but not all shards.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It started with 8 shards to heal and is now only at 33 out of 800, and probably won't finish adding for a few days at the rate it's going.
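>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (As commands, that run was roughly the following; a sketch with placeholder names, and with the xattr set on the fake dir as I actually did it, rather than on / of the mount as suggested:)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> kill -15 <brick-pid>                    # pid taken from 'gluster volume status'
>>>>>>>>>>>>>> rm -rf <brick-dir>
>>>>>>>>>>>>>> mkdir -p <brick-dir>
>>>>>>>>>>>>>> mkdir <mountpoint>/fake
>>>>>>>>>>>>>> setfattr -n "user.some-name" -v "some-value" <mountpoint>/fake
>>>>>>>>>>>>>> gluster v start <volname> force
>>>>>>>>>>>>>> gluster v heal <volname>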
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > > > 1st node worked as expected; took 12 hours to heal 1TB of data. Load was a little heavy but nothing shocking.
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > About an hour after node 1 finished, I began the same process on node 2. The heal process kicked in as before, and the files in directories visible from the mount and in .glusterfs healed in a short time. Then it began the crawl of .shard, adding those files to the heal count, at which point the entire process basically ground to a halt. After 48 hours, out of 19k shards it has added 5900 to the heal list. Load on all 3 machines is negligible. It was suggested to change cluster.data-self-heal-algorithm to full and restart the volume, which I did. No effect. Tried relaunching the heal; no effect, regardless of which node was picked. I started each VM and performed a stat of all files from within it, or a full virus scan, and that seemed to cause short small spikes in shards added, but not by much. The logs show no real messages indicating anything is going on. I get hits in the brick log on occasion for null lookups, making me think it's not really crawling the shards directory but waiting for a shard lookup to add it. I'll get the following in the brick log, but not constantly, and sometimes multiple entries for the same shard:
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009] [server-resolve.c:569:server_resolve] 0-GLUSTER1-server: no resolution type for (null) (LOOKUP)
>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050] [server-rpc-fops.c:156:server_lookup_cbk] 0-GLUSTER1-server: 12591783: LOOKUP (null) (00000000-0000-0000-0000-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221) ==> (Invalid argument) [Invalid argument]
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > This one repeated about 30 times in a row, then nothing for 10 minutes, then one hit for one different shard by itself.
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > How can I determine if heal is actually running? How can I kill it or force a restart? Does the node I start it from determine which directory gets crawled to determine heals?
>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>> > > > David Gossage
>>>>>>>>>>>>>>> > > > Carousel Checks Inc. | System Administrator
>>>>>>>>>>>>>>> > > > Office 708.613.2284
>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>> > > --
>>>>>>>>>>>>>>> > > Thanks,
>>>>>>>>>>>>>>> > > Anuradha.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Anuradha.
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users