On Wed, Aug 31, 2016 at 3:50 AM, Krutika Dhananjay <[email protected]> wrote:
> No, sorry, it's working fine. I may have missed some step, which is why I
> saw that problem. /.shard is also healing fine now.
>
> Let me know if it works for you.
>
> -Krutika
>
> On Wed, Aug 31, 2016 at 12:49 PM, Krutika Dhananjay <[email protected]> wrote:
>
>> OK, I just hit the other issue too, where .shard doesn't get healed. :)
>>
>> Investigating why that is the case. Give me some time.
>>
>> -Krutika
>>
>> On Wed, Aug 31, 2016 at 12:39 PM, Krutika Dhananjay <[email protected]> wrote:
>>
>>> Just figured out that the steps Anuradha provided won't work if granular
>>> entry heal is on. When you bring down a brick and create fake2 under /
>>> of the volume, the granular entry heal feature causes self-heal to
>>> remember only the fact that 'fake2' needs to be recreated on the
>>> offline brick (because the changelogs are granular).
>>>
>>> In this case, we need to indicate to the self-heal daemon that the
>>> entire directory tree from '/' needs to be repaired on the brick that
>>> contains no data.
>>>
>>> To fix this, I did the following (for users who use granular entry
>>> self-healing):
>>>
>>> 1. Kill the last brick process in the replica (/bricks/3).
>>>
>>> 2. [root@server-3 ~]# rm -rf /bricks/3
>>>
>>> 3. [root@server-3 ~]# mkdir /bricks/3
>>>
>>> 4. Create a new dir on the mount point:
>>>    [root@client-1 ~]# mkdir /mnt/fake
>>>
>>> 5. Set a fake xattr on the root of the volume, not on the 'fake'
>>>    directory itself:
>>>    [root@client-1 ~]# setfattr -n "user.some-name" -v "some-value" /mnt
>>>
>>> 6. Make sure there's no IO happening on your volume.

I'll test this on dev today. But for my case in production, does this mean
I'll need to shut down every VM after work for this heal? Will the fact
that I have 6k files already listed as needing heal affect anything?

>>> 7. Check the pending xattrs on the brick directories of the two good
>>>    copies (on bricks 1 and 2); you should see the same value as the one
>>>    highlighted below on both bricks.
>>>    (Note that the client-<num> xattr key will have the same last digit
>>>    as the index of the brick that is down, counting from 0. So if the
>>>    first brick is the one that is down, it would read
>>>    trusted.afr.*-client-0; if the second brick is the one that is empty
>>>    and down, it would read trusted.afr.*-client-1; and so on.)
>>>
>>> [root@server-1 ~]# getfattr -d -m . -e hex /bricks/1
>>> # file: 1
>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>>> trusted.afr.dirty=0x000000000000000000000000
>>> *trusted.afr.rep-client-2=0x000000000000000100000001*
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>
>>> [root@server-2 ~]# getfattr -d -m . -e hex /bricks/2
>>> # file: 2
>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>>> trusted.afr.dirty=0x000000000000000000000000
>>> *trusted.afr.rep-client-2=0x000000000000000100000001*
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>
>>> 8. Flip the 8th digit of trusted.afr.<VOLNAME>-client-2 to a 1.
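(Context for step 8: the value of a trusted.afr.<VOLNAME>-client-<n> xattr
is three big-endian 32-bit counters, <data><metadata><entry>, so the 8th
hex digit is the low digit of the data counter. Below is a minimal sketch
of the decode and flip, reusing the volume 'rep' and brick /bricks/1 from
the example above; treat it as illustrative, not an official tool.)

    #!/bin/bash
    # Decode the three AFR changelog counters, then set the data counter
    # to 1 while keeping the metadata and entry counters as they are.
    BRICK=/bricks/1
    KEY=trusted.afr.rep-client-2
    cur=$(getfattr -n "$KEY" -e hex "$BRICK" 2>/dev/null | awk -F= '/=/ {print $2}')
    hex=${cur#0x}                       # e.g. 000000000000000100000001
    echo "data=0x${hex:0:8} metadata=0x${hex:8:8} entry=0x${hex:16:8}"
    setfattr -n "$KEY" -v "0x00000001${hex:8:16}" "$BRICK"
    getfattr -n "$KEY" -e hex "$BRICK"  # verify the new value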
>>> [root@server-1 ~]# setfattr -n trusted.afr.rep-client-2 -v 0x000000010000000100000001 /bricks/1
>>> [root@server-2 ~]# setfattr -n trusted.afr.rep-client-2 -v 0x000000010000000100000001 /bricks/2
>>>
>>> 9. Get the xattrs again and check that the xattrs are set properly now:
>>>
>>> [root@server-1 ~]# getfattr -d -m . -e hex /bricks/1
>>> # file: 1
>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>>> trusted.afr.dirty=0x000000000000000000000000
>>> *trusted.afr.rep-client-2=0x000000010000000100000001*
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>
>>> [root@server-2 ~]# getfattr -d -m . -e hex /bricks/2
>>> # file: 2
>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>>> trusted.afr.dirty=0x000000000000000000000000
>>> *trusted.afr.rep-client-2=0x000000010000000100000001*
>>> trusted.gfid=0x00000000000000000000000000000001
>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>
>>> 10. Force-start the volume:
>>>
>>> [root@server-1 ~]# gluster volume start rep force
>>> volume start: rep: success
>>>
>>> 11. Monitor the heal-info command to ensure the number of entries keeps
>>>     growing.
>>>
>>> 12. Keep monitoring with step 11; eventually the number of entries
>>>     needing heal must come down to 0. Also, the checksums of the files
>>>     on the previously empty brick should now match the copies on the
>>>     other two bricks.
>>>
>>> Could you check if the above steps work for you, in your test
>>> environment?
>>>
>>> You caught a nice bug in the manual steps to follow when granular
>>> entry-heal is enabled and an empty brick needs heal. Thanks for
>>> reporting it. :) We will fix the documentation appropriately.
>>>
>>> -Krutika
>>>
>>> On Wed, Aug 31, 2016 at 11:29 AM, Krutika Dhananjay <[email protected]> wrote:
>>>
>>>> Tried this.
>>>>
>>>> With me, only 'fake2' gets healed after I bring the 'empty' brick back
>>>> up, and it stops there unless I do a 'heal-full'.
>>>>
>>>> Is that what you're seeing as well?
>>>>
>>>> -Krutika
>>>>
>>>> On Wed, Aug 31, 2016 at 4:43 AM, David Gossage <[email protected]> wrote:
>>>>
>>>>> Same issue: brought up glusterd on the problem node, heal count still
>>>>> stuck at 6330.
>>>>>
>>>>> Ran gluster v heal GLUSTER1 full
>>>>>
>>>>> glustershd on the problem node shows a sweep starting and finishing
>>>>> in seconds. The other 2 nodes show no activity in the log. They
>>>>> should start a sweep too, shouldn't they?
>>>>>
>>>>> Tried starting from scratch:
>>>>>
>>>>> kill -15 brickpid
>>>>> rm -Rf /brick
>>>>> mkdir -p /brick
>>>>> mkdir /gsmount/fake2
>>>>> setfattr -n "user.some-name" -v "some-value" /gsmount/fake2
>>>>>
>>>>> Heals visible dirs instantly, then stops.
>>>>>
>>>>> gluster v heal GLUSTER1 full
>>>>>
>>>>> See sweep start on problem node and end almost instantly.
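(To confirm whether the other nodes ever begin a sweep, each node's
self-heal daemon logs its crawl. A hedged sketch; node1/node2/node3 are
placeholder hostnames and the log path is the usual CentOS 7 default.)

    # Check every peer's glustershd log for full-sweep activity.
    for h in node1 node2 node3; do
        echo "== $h =="
        ssh "$h" "grep -E 'starting full sweep|finished full sweep' \
            /var/log/glusterfs/glustershd.log | tail -n 4"
    done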
>>>>> No files added to the heal list, no files healed, no more logging:
>>>>>
>>>>> [2016-08-30 23:11:31.544331] I [MSGID: 108026]
>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>> starting full sweep on subvol GLUSTER1-client-1
>>>>> [2016-08-30 23:11:33.776235] I [MSGID: 108026]
>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>> finished full sweep on subvol GLUSTER1-client-1
>>>>>
>>>>> Same results no matter which node you run the command on. Still stuck
>>>>> with 6330 files showing as needing heal out of 19k, and the logs
>>>>> still show no heals are occurring.
>>>>>
>>>>> Is there a way to forcibly reset any prior heal data? Could it be
>>>>> stuck on some past failed heal start?
>>>>>
>>>>> *David Gossage*
>>>>> *Carousel Checks Inc. | System Administrator*
>>>>> *Office* 708.613.2284
>>>>>
>>>>> On Tue, Aug 30, 2016 at 10:03 AM, David Gossage <[email protected]> wrote:
>>>>>
>>>>>> On Tue, Aug 30, 2016 at 10:02 AM, David Gossage <[email protected]> wrote:
>>>>>>
>>>>>>> Updated test server to 3.8.3.
>>>>>>>
>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>>>> Options Reconfigured:
>>>>>>> cluster.granular-entry-heal: on
>>>>>>> performance.readdir-ahead: on
>>>>>>> performance.read-ahead: off
>>>>>>> nfs.disable: on
>>>>>>> nfs.addr-namelookup: off
>>>>>>> nfs.enable-ino32: off
>>>>>>> cluster.background-self-heal-count: 16
>>>>>>> cluster.self-heal-window-size: 1024
>>>>>>> performance.quick-read: off
>>>>>>> performance.io-cache: off
>>>>>>> performance.stat-prefetch: off
>>>>>>> cluster.eager-lock: enable
>>>>>>> network.remote-dio: on
>>>>>>> cluster.quorum-type: auto
>>>>>>> cluster.server-quorum-type: server
>>>>>>> storage.owner-gid: 36
>>>>>>> storage.owner-uid: 36
>>>>>>> server.allow-insecure: on
>>>>>>> features.shard: on
>>>>>>> features.shard-block-size: 64MB
>>>>>>> performance.strict-o-direct: off
>>>>>>> cluster.locking-scheme: granular
>>>>>>>
>>>>>>> kill -15 brickpid
>>>>>>> rm -Rf /gluster2/brick3
>>>>>>> mkdir -p /gluster2/brick3/1
>>>>>>> mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
>>>>>>> setfattr -n "user.some-name" -v "some-value" /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
>>>>>>> gluster v start glustershard force
>>>>>>>
>>>>>>> At this point the brick process starts and all visible files,
>>>>>>> including the new dir, are made on the brick. A handful of shards
>>>>>>> are still in heal statistics, but no .shard directory is created
>>>>>>> and there is no increase in shard count.
>>>>>>>
>>>>>>> gluster v heal glustershard
>>>>>>>
>>>>>>> At this point still no increase in count, no dir made, and no
>>>>>>> additional healing activity generated in the logs. Waited a few
>>>>>>> minutes tailing logs to check if anything kicked in.
>>>>>>>
>>>>>>> gluster v heal glustershard full
>>>>>>>
>>>>>>> Gluster shards were added to the list and heal commenced. Logs show
>>>>>>> a full sweep starting on all 3 nodes, though this time it only
>>>>>>> shows as finishing on the one that had the brick deleted.
>>>>>>>
>>>>>>> [2016-08-30 14:45:33.098589] I [MSGID: 108026]
>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
>>>>>>> starting full sweep on subvol glustershard-client-0
>>>>>>> [2016-08-30 14:45:33.099492] I [MSGID: 108026]
>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
>>>>>>> starting full sweep on subvol glustershard-client-1
>>>>>>> [2016-08-30 14:45:33.100093] I [MSGID: 108026]
>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
>>>>>>> starting full sweep on subvol glustershard-client-2
>>>>>>> [2016-08-30 14:52:29.760213] I [MSGID: 108026]
>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0:
>>>>>>> finished full sweep on subvol glustershard-client-2
>>>>>>
>>>>>> Just realized it's still healing, so that may be why the sweep on
>>>>>> the 2 other bricks hasn't been reported as finished.
>>>>>>
>>>>>>> My hope is that later tonight a full heal will work on production.
>>>>>>> Is it possible the self-heal daemon can get stale or stop listening
>>>>>>> but still show as active? Would stopping and starting the self-heal
>>>>>>> daemon from the gluster CLI before doing these heals be helpful?
>>>>>>>
>>>>>>> On Tue, Aug 30, 2016 at 9:29 AM, David Gossage <[email protected]> wrote:
>>>>>>>
>>>>>>>> On Tue, Aug 30, 2016 at 8:52 AM, David Gossage <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Could you also share the glustershd logs?
>>>>>>>>>>>>
>>>>>>>>>>>> I'll get them when I get to work, sure.
>>>>>>>>>>>>
>>>>>>>>>>>>> I tried the same steps that you mentioned multiple times, but
>>>>>>>>>>>>> heal is running to completion without any issues.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It must be said that 'heal full' traverses the files and
>>>>>>>>>>>>> directories in a depth-first order and does heals also in the
>>>>>>>>>>>>> same order. But if it gets interrupted in the middle (say
>>>>>>>>>>>>> because the self-heal daemon was either intentionally or
>>>>>>>>>>>>> unintentionally brought offline and then brought back up),
>>>>>>>>>>>>> self-heal will only pick up the entries that are so far
>>>>>>>>>>>>> marked as new entries that need heal, which it will find in
>>>>>>>>>>>>> the indices/xattrop directory. What this means is that those
>>>>>>>>>>>>> files and directories that were not visited during the crawl
>>>>>>>>>>>>> will remain untouched and unhealed in this second iteration
>>>>>>>>>>>>> of heal, unless you execute a 'heal-full' again.
>>>>>>>>>>>>
>>>>>>>>>>>> So should it start healing shards as it crawls, or not until
>>>>>>>>>>>> after it crawls the entire .shard directory? At the pace it
>>>>>>>>>>>> was going, that could be a week, with one node appearing in
>>>>>>>>>>>> the cluster but with no shard files if anything tries to
>>>>>>>>>>>> access a file on that node.
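(One way to watch that pace is to poll heal-info and log the pending-entry
total over time. A rough sketch; GLUSTER1 is the production volume name
used earlier in the thread, so substitute your own.)

    # Print a timestamped total of pending heal entries once a minute.
    while true; do
        n=$(gluster volume heal GLUSTER1 info | awk '/Number of entries/ {s += $NF} END {print s}')
        echo "$(date '+%H:%M:%S') pending entries: $n"
        sleep 60
    done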
>>>>>>>>>>>> From my experience the other day, telling it to heal full
>>>>>>>>>>>> again did nothing, regardless of the node used.
>>>>>>>>>>>
>>>>>>>>>> The crawl is started from '/' of the volume. Whenever self-heal
>>>>>>>>>> detects during the crawl that a file or directory is present on
>>>>>>>>>> some brick(s) and absent on others, it creates the file on the
>>>>>>>>>> bricks where it is absent and marks the fact that the file or
>>>>>>>>>> directory might need data/entry and metadata heal too (this also
>>>>>>>>>> means that an index is created under .glusterfs/indices/xattrop
>>>>>>>>>> of the src bricks). The data/entry and metadata heal are then
>>>>>>>>>> picked up and done in the background with the help of these
>>>>>>>>>> indices.
>>>>>>>>>
>>>>>>>>> Looking at my 3rd node as an example, I find nearly the exact
>>>>>>>>> same number of files in the xattrop dir as reported by heal count
>>>>>>>>> at the time I brought down node 2 to try to alleviate the read IO
>>>>>>>>> errors, which seemed to occur from what I was guessing were
>>>>>>>>> attempts to use the node with no shards for reads.
>>>>>>>>>
>>>>>>>>> Also attached are the glustershd logs from the 3 nodes, along
>>>>>>>>> with the test node I tried yesterday with the same results.
>>>>>>>>
>>>>>>>> Looking at my own logs, I notice that a full sweep was only ever
>>>>>>>> recorded in glustershd.log on the 2nd node, the one with the
>>>>>>>> missing directory. I believe I should have found a sweep begun on
>>>>>>>> every node, correct?
>>>>>>>>
>>>>>>>> On my test dev, when it did work, I do see that:
>>>>>>>>
>>>>>>>> [2016-08-30 13:56:25.223333] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
>>>>>>>> starting full sweep on subvol glustershard-client-0
>>>>>>>> [2016-08-30 13:56:25.223522] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
>>>>>>>> starting full sweep on subvol glustershard-client-1
>>>>>>>> [2016-08-30 13:56:25.224616] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-glustershard-replicate-0:
>>>>>>>> starting full sweep on subvol glustershard-client-2
>>>>>>>> [2016-08-30 14:18:48.333740] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0:
>>>>>>>> finished full sweep on subvol glustershard-client-2
>>>>>>>> [2016-08-30 14:18:48.356008] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0:
>>>>>>>> finished full sweep on subvol glustershard-client-1
>>>>>>>> [2016-08-30 14:18:49.637811] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-glustershard-replicate-0:
>>>>>>>> finished full sweep on subvol glustershard-client-0
>>>>>>>>
>>>>>>>> While looking at the past few days on the 3 prod nodes, I only
>>>>>>>> found the following, on my 2nd node:
>>>>>>>>
>>>>>>>> [2016-08-27 01:26:42.638772] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>>>>> starting full sweep on subvol GLUSTER1-client-1
>>>>>>>> [2016-08-27 11:37:01.732366] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>>>>> finished full sweep on subvol GLUSTER1-client-1
>>>>>>>> [2016-08-27 12:58:34.597228] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>>>>> starting full sweep on subvol
>>>>>>>> GLUSTER1-client-1
>>>>>>>> [2016-08-27 12:59:28.041173] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>>>>> finished full sweep on subvol GLUSTER1-client-1
>>>>>>>> [2016-08-27 20:03:42.560188] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>>>>> starting full sweep on subvol GLUSTER1-client-1
>>>>>>>> [2016-08-27 20:03:44.278274] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>>>>> finished full sweep on subvol GLUSTER1-client-1
>>>>>>>> [2016-08-27 21:00:42.603315] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>>>>> starting full sweep on subvol GLUSTER1-client-1
>>>>>>>> [2016-08-27 21:00:46.148674] I [MSGID: 108026]
>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0:
>>>>>>>> finished full sweep on subvol GLUSTER1-client-1
>>>>>>>>
>>>>>>>>>>>>> My suspicion is that this is what happened on your setup.
>>>>>>>>>>>>> Could you confirm if that was the case?
>>>>>>>>>>>>
>>>>>>>>>>>> The brick was brought online with force start, then a full
>>>>>>>>>>>> heal launched. Hours later, after it became evident that it
>>>>>>>>>>>> was not adding new files to heal, I did try restarting the
>>>>>>>>>>>> self-heal daemon and relaunching a full heal again. But this
>>>>>>>>>>>> was after the heal had basically already failed to work as
>>>>>>>>>>>> intended.
>>>>>>>>>>>
>>>>>>>>>>> OK. How did you figure it was not adding any new files? I need
>>>>>>>>>>> to know what places you were monitoring to come to this
>>>>>>>>>>> conclusion.
>>>>>>>>>>>
>>>>>>>>>>> -Krutika
>>>>>>>>>>>
>>>>>>>>>>>>> As for those logs, I did manage to do something that caused
>>>>>>>>>>>>> the warning messages you shared earlier to appear in my
>>>>>>>>>>>>> client and server logs. Although these logs are annoying and
>>>>>>>>>>>>> a bit scary too, they didn't do any harm to the data in my
>>>>>>>>>>>>> volume. Why they appear just after a brick is replaced, and
>>>>>>>>>>>>> under no other circumstances, is something I'm still
>>>>>>>>>>>>> investigating.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But for the future, it would be good to follow the steps
>>>>>>>>>>>>> Anuradha gave, as that would allow self-heal to at least
>>>>>>>>>>>>> detect that it has some repairing to do whenever it is
>>>>>>>>>>>>> restarted, whether intentionally or otherwise.
>>>>>>>>>>>>
>>>>>>>>>>>> I followed those steps as described on my test box and ended
>>>>>>>>>>>> up with the exact same outcome: shards added at an agonizingly
>>>>>>>>>>>> slow pace, and no creation of the .shard directory or heals on
>>>>>>>>>>>> the shard directory. Directories visible from the mount healed
>>>>>>>>>>>> quickly. This was with one VM, so it has only 800 shards as
>>>>>>>>>>>> well. After hours at work it had added a total of 33 shards to
>>>>>>>>>>>> be healed. I sent those logs yesterday as well, though not the
>>>>>>>>>>>> glustershd.
>>>>>>>>>>>>
>>>>>>>>>>>> Does the replace-brick command copy files in the same manner?
>>>>>>>>>>>> For these purposes I am contemplating just skipping the heal
>>>>>>>>>>>> route.
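(On the replace-brick question: as far as I know, in 3.8 replace-brick
supports only 'commit force', and the new brick is then repopulated by the
same self-heal machinery, so it may not avoid the slow shard crawl. A
hedged sketch using the test volume's names from this thread, with a
hypothetical new brick path.)

    # Swap the emptied brick for a fresh one; self-heal then repopulates it.
    gluster volume replace-brick glustershard \
        192.168.71.12:/gluster2/brick3/1 \
        192.168.71.12:/gluster2/brick3new/1 \
        commit force
    gluster volume heal glustershard info   # watch entries queue up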
>>>>>>>>>>>>
>>>>>>>>>>>>> -Krutika
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Attached brick and client logs from the test machine where
>>>>>>>>>>>>>> the same behavior occurred; not sure if anything new is
>>>>>>>>>>>>>> there. It's still on 3.8.2.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Number of Bricks: 1 x 3 = 3
>>>>>>>>>>>>>> Transport-type: tcp
>>>>>>>>>>>>>> Bricks:
>>>>>>>>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>>>>>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>>>>>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>>>>> cluster.locking-scheme: granular
>>>>>>>>>>>>>> performance.strict-o-direct: off
>>>>>>>>>>>>>> features.shard-block-size: 64MB
>>>>>>>>>>>>>> features.shard: on
>>>>>>>>>>>>>> server.allow-insecure: on
>>>>>>>>>>>>>> storage.owner-uid: 36
>>>>>>>>>>>>>> storage.owner-gid: 36
>>>>>>>>>>>>>> cluster.server-quorum-type: server
>>>>>>>>>>>>>> cluster.quorum-type: auto
>>>>>>>>>>>>>> network.remote-dio: on
>>>>>>>>>>>>>> cluster.eager-lock: enable
>>>>>>>>>>>>>> performance.stat-prefetch: off
>>>>>>>>>>>>>> performance.io-cache: off
>>>>>>>>>>>>>> performance.quick-read: off
>>>>>>>>>>>>>> cluster.self-heal-window-size: 1024
>>>>>>>>>>>>>> cluster.background-self-heal-count: 16
>>>>>>>>>>>>>> nfs.enable-ino32: off
>>>>>>>>>>>>>> nfs.addr-namelookup: off
>>>>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>>>>> performance.read-ahead: off
>>>>>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>>>>> cluster.granular-entry-heal: on
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>>>> > From: "David Gossage" <[email protected]>
>>>>>>>>>>>>>>>> > To: "Anuradha Talur" <[email protected]>
>>>>>>>>>>>>>>>> > Cc: "[email protected] List" <[email protected]>, "Krutika Dhananjay" <[email protected]>
>>>>>>>>>>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM
>>>>>>>>>>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur <[email protected]> wrote:
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > > Response inline.
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > > ----- Original Message -----
>>>>>>>>>>>>>>>> > > > From: "Krutika Dhananjay" <[email protected]>
>>>>>>>>>>>>>>>> > > > To: "David Gossage" <[email protected]>
>>>>>>>>>>>>>>>> > > > Cc: "[email protected] List" <[email protected]>
>>>>>>>>>>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM
>>>>>>>>>>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > Could you attach both client and brick logs?
>>>>>>>>>>>>>>>> > > > Meanwhile I will try these steps out on my machines
>>>>>>>>>>>>>>>> > > > and see if it is easily recreatable.
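(A hedged example of bundling the requested client and brick logs on each
node; the paths are CentOS 7 defaults, and the FUSE client log name is
derived from the mount path, so the last glob may need adjusting.)

    # Collect self-heal daemon, brick, and client mount logs for attaching.
    tar czf "gluster-logs-$(hostname).tar.gz" \
        /var/log/glusterfs/glustershd.log \
        /var/log/glusterfs/bricks/*.log \
        /var/log/glusterfs/*glusterSD*.log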
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > -Krutika
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage <[email protected]> wrote:
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > CentOS 7, Gluster 3.8.3
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1
>>>>>>>>>>>>>>>> > > > Options Reconfigured:
>>>>>>>>>>>>>>>> > > > cluster.data-self-heal-algorithm: full
>>>>>>>>>>>>>>>> > > > cluster.self-heal-daemon: on
>>>>>>>>>>>>>>>> > > > cluster.locking-scheme: granular
>>>>>>>>>>>>>>>> > > > features.shard-block-size: 64MB
>>>>>>>>>>>>>>>> > > > features.shard: on
>>>>>>>>>>>>>>>> > > > performance.readdir-ahead: on
>>>>>>>>>>>>>>>> > > > storage.owner-uid: 36
>>>>>>>>>>>>>>>> > > > storage.owner-gid: 36
>>>>>>>>>>>>>>>> > > > performance.quick-read: off
>>>>>>>>>>>>>>>> > > > performance.read-ahead: off
>>>>>>>>>>>>>>>> > > > performance.io-cache: off
>>>>>>>>>>>>>>>> > > > performance.stat-prefetch: on
>>>>>>>>>>>>>>>> > > > cluster.eager-lock: enable
>>>>>>>>>>>>>>>> > > > network.remote-dio: enable
>>>>>>>>>>>>>>>> > > > cluster.quorum-type: auto
>>>>>>>>>>>>>>>> > > > cluster.server-quorum-type: server
>>>>>>>>>>>>>>>> > > > server.allow-insecure: on
>>>>>>>>>>>>>>>> > > > cluster.self-heal-window-size: 1024
>>>>>>>>>>>>>>>> > > > cluster.background-self-heal-count: 16
>>>>>>>>>>>>>>>> > > > performance.strict-write-ordering: off
>>>>>>>>>>>>>>>> > > > nfs.disable: on
>>>>>>>>>>>>>>>> > > > nfs.addr-namelookup: off
>>>>>>>>>>>>>>>> > > > nfs.enable-ino32: off
>>>>>>>>>>>>>>>> > > > cluster.granular-entry-heal: on
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > Friday did a rolling upgrade to 3.8.3, no issues.
>>>>>>>>>>>>>>>> > > > Following the steps detailed in previous
>>>>>>>>>>>>>>>> > > > recommendations, I began the process of replacing
>>>>>>>>>>>>>>>> > > > and healing bricks one node at a time:
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > 1) kill pid of brick
>>>>>>>>>>>>>>>> > > > 2) reconfigure brick from raid6 to raid10
>>>>>>>>>>>>>>>> > > > 3) recreate directory of brick
>>>>>>>>>>>>>>>> > > > 4) gluster volume start <> force
>>>>>>>>>>>>>>>> > > > 5) gluster volume heal <> full
>>>>>>>>>>>>>>>> > > Hi,
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > > I'd suggest that full heal not be used. There are a
>>>>>>>>>>>>>>>> > > few bugs in full heal. Better safe than sorry ;)
>>>>>>>>>>>>>>>> > > Instead I'd suggest the following steps:
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > Currently I brought the node down by systemctl stop
>>>>>>>>>>>>>>>> > glusterd, as I was getting sporadic IO issues and a few
>>>>>>>>>>>>>>>> > VMs paused, so I'm hoping that will help. I may wait to
>>>>>>>>>>>>>>>> > do this till around 4 PM when most work is done, in case
>>>>>>>>>>>>>>>> > it shoots the load up.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > > 1) kill pid of brick
>>>>>>>>>>>>>>>> > > 2) do the reconfiguring of the brick that you need
>>>>>>>>>>>>>>>> > > 3) recreate brick dir
>>>>>>>>>>>>>>>> > > 4) while the brick is still down, from the mount point:
>>>>>>>>>>>>>>>> > >    a) create a dummy non-existent dir under / of the mount
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > So if node 2 is the down brick, do I pick a node, for
>>>>>>>>>>>>>>>> > example 3, and make a test dir under its brick directory
>>>>>>>>>>>>>>>> > that doesn't exist on 2, or should I be doing this over
>>>>>>>>>>>>>>>> > a gluster mount?
>>>>>>>>>>>>>>>> You should be doing this over the gluster mount.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > >    b) set a non-existent extended attribute on / of
>>>>>>>>>>>>>>>> > >       the mount
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Could you give me an example of an attribute to set?
>>>>>>>>>>>>>>>> > I've read a tad on this and looked up attributes, but
>>>>>>>>>>>>>>>> > haven't set any yet myself.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value" <path-to-mount>
>>>>>>>>>>>>>>>> > > Doing these steps will ensure that heal happens only
>>>>>>>>>>>>>>>> > > from the updated brick to the down brick.
>>>>>>>>>>>>>>>> > > 5) gluster v start <> force
>>>>>>>>>>>>>>>> > > 6) gluster v heal <>
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> > Will it matter if somewhere in gluster the full heal
>>>>>>>>>>>>>>>> > command was run the other day? Not sure if it eventually
>>>>>>>>>>>>>>>> > stops or times out.
>>>>>>>>>>>>>>>> >
>>>>>>>>>>>>>>>> Full heal will stop once the crawl is done. So if you want
>>>>>>>>>>>>>>>> to trigger heal again, run gluster v heal <>. Actually,
>>>>>>>>>>>>>>>> even brick up or volume start force should trigger the heal.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Did this on the test bed today. It's one server with 3
>>>>>>>>>>>>>>> bricks on the same machine, so take that for what it's
>>>>>>>>>>>>>>> worth. Also it still runs 3.8.2. Maybe I'll update and
>>>>>>>>>>>>>>> re-run the test.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> killed brick
>>>>>>>>>>>>>>> deleted brick dir
>>>>>>>>>>>>>>> recreated brick dir
>>>>>>>>>>>>>>> created fake dir on gluster mount
>>>>>>>>>>>>>>> set suggested fake attribute on it
>>>>>>>>>>>>>>> ran volume start <> force
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Looked at the files it said needed healing, and it was just
>>>>>>>>>>>>>>> 8 shards that were modified during the few minutes I ran
>>>>>>>>>>>>>>> through the steps.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Gave it a few minutes and it stayed the same, so I ran
>>>>>>>>>>>>>>> gluster volume <> heal.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It healed all the directories and files you can see over
>>>>>>>>>>>>>>> the mount, including fakedir.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Same issue for shards, though: it adds more shards to heal
>>>>>>>>>>>>>>> at a glacial pace. Slight jump in speed if I stat every
>>>>>>>>>>>>>>> file and dir in the running VM, but not all shards.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It started with 8 shards to heal and is now only at 33 out
>>>>>>>>>>>>>>> of 800, and probably won't finish adding for a few days at
>>>>>>>>>>>>>>> the rate it goes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > > > 1st node worked as expected, took 12 hours to heal
>>>>>>>>>>>>>>>> > > > 1TB of data. Load was a little heavy, but nothing
>>>>>>>>>>>>>>>> > > > shocking.
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > About an hour after node 1 finished, I began the
>>>>>>>>>>>>>>>> > > > same process on node 2. The heal process kicked in
>>>>>>>>>>>>>>>> > > > as before, and the files in directories visible from
>>>>>>>>>>>>>>>> > > > the mount and .glusterfs healed in a short time.
>>>>>>>>>>>>>>>> > > > Then it began the
Then it began >>>>>>>>>>>>>>>> crawl of .shard adding >>>>>>>>>>>>>>>> > > > those files to heal count at which point the entire >>>>>>>>>>>>>>>> proces ground to a >>>>>>>>>>>>>>>> > > halt >>>>>>>>>>>>>>>> > > > basically. After 48 hours out of 19k shards it has >>>>>>>>>>>>>>>> added 5900 to heal >>>>>>>>>>>>>>>> > > list. >>>>>>>>>>>>>>>> > > > Load on all 3 machnes is negligible. It was suggested >>>>>>>>>>>>>>>> to change this >>>>>>>>>>>>>>>> > > value >>>>>>>>>>>>>>>> > > > to full cluster.data-self-heal-algorithm and restart >>>>>>>>>>>>>>>> volume which I >>>>>>>>>>>>>>>> > > did. No >>>>>>>>>>>>>>>> > > > efffect. Tried relaunching heal no effect, despite >>>>>>>>>>>>>>>> any node picked. I >>>>>>>>>>>>>>>> > > > started each VM and performed a stat of all files >>>>>>>>>>>>>>>> from within it, or a >>>>>>>>>>>>>>>> > > full >>>>>>>>>>>>>>>> > > > virus scan and that seemed to cause short small >>>>>>>>>>>>>>>> spikes in shards added, >>>>>>>>>>>>>>>> > > but >>>>>>>>>>>>>>>> > > > not by much. Logs are showing no real messages >>>>>>>>>>>>>>>> indicating anything is >>>>>>>>>>>>>>>> > > going >>>>>>>>>>>>>>>> > > > on. I get hits to brick log on occasion of null >>>>>>>>>>>>>>>> lookups making me think >>>>>>>>>>>>>>>> > > its >>>>>>>>>>>>>>>> > > > not really crawling shards directory but waiting for >>>>>>>>>>>>>>>> a shard lookup to >>>>>>>>>>>>>>>> > > add >>>>>>>>>>>>>>>> > > > it. I'll get following in brick log but not constant >>>>>>>>>>>>>>>> and sometime >>>>>>>>>>>>>>>> > > multiple >>>>>>>>>>>>>>>> > > > for same shard. >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009] >>>>>>>>>>>>>>>> > > > [server-resolve.c:569:server_resolve] >>>>>>>>>>>>>>>> 0-GLUSTER1-server: no resolution >>>>>>>>>>>>>>>> > > type >>>>>>>>>>>>>>>> > > > for (null) (LOOKUP) >>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050] >>>>>>>>>>>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk] >>>>>>>>>>>>>>>> 0-GLUSTER1-server: 12591783: >>>>>>>>>>>>>>>> > > > LOOKUP (null) (00000000-0000-0000-00 >>>>>>>>>>>>>>>> > > > 00-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221) >>>>>>>>>>>>>>>> ==> (Invalid >>>>>>>>>>>>>>>> > > > argument) [Invalid argument] >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > This one repeated about 30 times in row then nothing >>>>>>>>>>>>>>>> for 10 minutes then >>>>>>>>>>>>>>>> > > one >>>>>>>>>>>>>>>> > > > hit for one different shard by itself. >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > How can I determine if Heal is actually running? How >>>>>>>>>>>>>>>> can I kill it or >>>>>>>>>>>>>>>> > > force >>>>>>>>>>>>>>>> > > > restart? Does node I start it from determine which >>>>>>>>>>>>>>>> directory gets >>>>>>>>>>>>>>>> > > crawled to >>>>>>>>>>>>>>>> > > > determine heals? >>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>> > > > David Gossage >>>>>>>>>>>>>>>> > > > Carousel Checks Inc. 
>>>>>>>>>>>>>>>> > > > | System Administrator
>>>>>>>>>>>>>>>> > > > Office 708.613.2284
>>>>>>>>>>>>>>>> > > >
>>>>>>>>>>>>>>>> > > > _______________________________________________
>>>>>>>>>>>>>>>> > > > Gluster-users mailing list
>>>>>>>>>>>>>>>> > > > [email protected]
>>>>>>>>>>>>>>>> > > > http://www.gluster.org/mailman/listinfo/gluster-users
>>>>>>>>>>>>>>>> > >
>>>>>>>>>>>>>>>> > > --
>>>>>>>>>>>>>>>> > > Thanks,
>>>>>>>>>>>>>>>> > > Anuradha.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Anuradha.
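(On the final question of how to tell whether a heal is actually running,
a hedged checklist; GLUSTER1 and the brick path are the names used earlier
in the thread, so substitute your own.)

    # 1. Is the self-heal daemon shown online on every node?
    gluster volume status GLUSTER1 | grep -i 'self-heal'
    # 2. What does the heal machinery itself report?
    gluster volume heal GLUSTER1 statistics heal-count
    # 3. How many entries are queued in the index self-heal works from?
    #    (run on each node against its own brick path)
    ls /gluster1/BRICK1/1/.glusterfs/indices/xattrop | grep -cv '^xattrop'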
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
