On Tue, Sep 6, 2016 at 7:27 PM, David Gossage <dgoss...@carouselchecks.com> wrote:
> Going to top post with solution Krutika Dhananjay came up with. His steps
> were much less volatile and could be done with the volume still being
> actively used, and they were also much less prone to accidental destruction.
>
> My use case and issue were a desire to wipe a brick and recreate it with the
> same directory structure so as to change the underlying raid setup of the
> disks making up the brick. The problem was that getting the shards to heal
> was failing 99% of the time.
>

Hi,

Thank you for posting this before I could get around to it. Also thanks to
Pranith for suggesting the additional precautionary 'trusted.afr.dirty' step
(step 4 below) and reviewing the steps once.

IIUC the newly-introduced reset-brick command serves as an alternative to all
this lengthy process listed below.
@Pranith, Is the above statement correct? If so, do we know which releases
will have the reset-brick command/feature?

> These are the steps he provided, which have been working well.
>

Err.. she. :)
-Krutika

> 1) kill brick pid on server that you want to replace
> kill -15 <brickpid>
>
> 2) do brick maintenance, which in my case was:
> zpool destroy <ZFSPOOL>
> zpool create (options) yada yada disks
>
> 3) make sure the original path to the brick exists
> mkdir /path/to/brick
>
> 4) set extended attribute on the new brick path (not over the gluster mount)
> setfattr -n trusted.afr.dirty -v 0x000000000000000000000001 /path/to/brick
>
> 5) create a mount point to the volume
> mkdir /mnt-brick-test
> glusterfs --volfile-id=<VOLNAME> --volfile-server=<valid host or ip of an active gluster server> --client-pid=-6 /mnt-brick-test
>
> 6) set an extended attribute on the gluster network mount. VOLNAME is the
> gluster volume; KILLEDBRICK# is the index of the server needing heal. They
> start from 0 and gluster v info should display them in order.
> setfattr -n trusted.replace-brick -v VOLNAME-client-KILLEDBRICK# /mnt-brick-test
>
> 7) gluster heal should now show the / root of the gluster volume in its output
> gluster v heal VOLNAME info
>
> 8) force start the volume to bring up the killed brick
> gluster v start VOLNAME force
>
> 9) optionally watch heal progress and drink beer while you wait and hope
> nothing blows up
> watch -n 10 gluster v heal VOLNAME statistics heal-count
>
> 10) unmount the gluster network mount from the server
> umount /mnt-brick-test
>
> 11) Praise the developers for their efforts
>
> *David Gossage*
> *Carousel Checks Inc. | System Administrator*
> *Office* 708.613.2284
>
> On Thu, Sep 1, 2016 at 2:29 PM, David Gossage <dgoss...@carouselchecks.com>
> wrote:
>
>> On Thu, Sep 1, 2016 at 12:09 AM, Krutika Dhananjay <kdhan...@redhat.com>
>> wrote:
>>
>>> On Wed, Aug 31, 2016 at 8:13 PM, David Gossage
>>> <dgoss...@carouselchecks.com> wrote:
>>>
>>>> Just as a test I did not shut down the one VM on the cluster, as finding
>>>> a window before the weekend where I can shut down all VM's and fit in a
>>>> full heal is unlikely, so I wanted to see what occurs.
>>>>
>>>> kill -15 brick pid
>>>> rm -Rf /gluster2/brick1/1
>>>> mkdir /gluster2/brick1/1
>>>> mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake3
>>>> setfattr -n "user.some-name" -v "some-value" /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard
>>>>
>>>> getfattr -d -m .
-e hex /gluster2/brick2/1 >>>> # file: gluster2/brick2/1 >>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>>> 23a756e6c6162656c65645f743a733000 >>>> trusted.afr.dirty=0x000000000000000000000001 >>>> trusted.afr.glustershard-client-0=0x000000000000000200000000 >>>> >>> >>> This is unusual. The last digit ought to have been 1 on account of >>> "fake3" being created while hte first brick is offline. >>> >>> This discussion is becoming unnecessary lengthy. Mind if we discuss this >>> and sort it out on IRC today, at least the communication will be continuous >>> and in real-time. I'm kdhananjay on #gluster (Freenode). Ping me when >>> you're online. >>> >>> -Krutika >>> >> >> Thanks for assistance this morning. Looks like I lost connection in IRC >> and didn't realize it so sorry if you came back looking for me. Let me >> know when the steps you worked out have been reviewed and if it's found >> safe for production use and I'll give a try. >> >> >> >>> >>> >>>> trusted.afr.glustershard-client-2=0x000000000000000000000000 >>>> trusted.gfid=0x00000000000000000000000000000001 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 >>>> user.some-name=0x736f6d652d76616c7565 >>>> >>>> getfattr -d -m . -e hex /gluster2/brick3/1 >>>> # file: gluster2/brick3/1 >>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>>> 23a756e6c6162656c65645f743a733000 >>>> trusted.afr.dirty=0x000000000000000000000001 >>>> trusted.afr.glustershard-client-0=0x000000000000000200000000 >>>> trusted.gfid=0x00000000000000000000000000000001 >>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 >>>> user.some-name=0x736f6d652d76616c7565 >>>> >>>> setfattr -n trusted.afr.glustershard-client-0 -v >>>> 0x000000010000000200000000 /gluster2/brick2/1 >>>> setfattr -n trusted.afr.glustershard-client-0 -v >>>> 0x000000010000000200000000 /gluster2/brick3/1 >>>> >>>> getfattr -d -m . -e hex /gluster2/brick3/1/ >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: gluster2/brick3/1/ >>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>>> 23a756e6c6162656c65645f743a733000 >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> trusted.afr.glustershard-client-0=0x000000010000000200000000 >>>> trusted.gfid=0x00000000000000000000000000000001 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 >>>> user.some-name=0x736f6d652d76616c7565 >>>> >>>> getfattr -d -m . -e hex /gluster2/brick2/1/ >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: gluster2/brick2/1/ >>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>>> 23a756e6c6162656c65645f743a733000 >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> trusted.afr.glustershard-client-0=0x000000010000000200000000 >>>> trusted.afr.glustershard-client-2=0x000000000000000000000000 >>>> trusted.gfid=0x00000000000000000000000000000001 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 >>>> user.some-name=0x736f6d652d76616c7565 >>>> >>>> gluster v start glustershard force >>>> >>>> gluster heal counts climbed up and down a little as it healed >>>> everything in visible gluster mount and .glusterfs for visible mount files >>>> then stalled with around 15 shards and the fake3 directory still in list >>>> >>>> getfattr -d -m . 
-e hex /gluster2/brick2/1/ >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: gluster2/brick2/1/ >>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>>> 23a756e6c6162656c65645f743a733000 >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> trusted.afr.glustershard-client-0=0x000000010000000000000000 >>>> trusted.afr.glustershard-client-2=0x000000000000000000000000 >>>> trusted.gfid=0x00000000000000000000000000000001 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 >>>> user.some-name=0x736f6d652d76616c7565 >>>> >>>> getfattr -d -m . -e hex /gluster2/brick3/1/ >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: gluster2/brick3/1/ >>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>>> 23a756e6c6162656c65645f743a733000 >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> trusted.afr.glustershard-client-0=0x000000010000000000000000 >>>> trusted.gfid=0x00000000000000000000000000000001 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 >>>> user.some-name=0x736f6d652d76616c7565 >>>> >>>> getfattr -d -m . -e hex /gluster2/brick1/1/ >>>> getfattr: Removing leading '/' from absolute path names >>>> # file: gluster2/brick1/1/ >>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>>> 23a756e6c6162656c65645f743a733000 >>>> trusted.gfid=0x00000000000000000000000000000001 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 >>>> user.some-name=0x736f6d652d76616c7565 >>>> >>>> heal count stayed same for awhile then ran >>>> >>>> gluster v heal glustershard full >>>> >>>> heals jump up to 700 as shards actually get read in as needing heals. >>>> glustershd shows 3 sweeps started one per brick >>>> >>>> It heals shards things look ok heal <> info shows 0 files but >>>> statistics heal-info shows 1 left for brick 2 and 3. perhaps cause I didnt >>>> stop vm running? >>>> >>>> # file: gluster2/brick1/1/ >>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>>> 23a756e6c6162656c65645f743a733000 >>>> trusted.gfid=0x00000000000000000000000000000001 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 >>>> user.some-name=0x736f6d652d76616c7565 >>>> >>>> # file: gluster2/brick2/1/ >>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>>> 23a756e6c6162656c65645f743a733000 >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> trusted.afr.glustershard-client-0=0x000000010000000000000000 >>>> trusted.afr.glustershard-client-2=0x000000000000000000000000 >>>> trusted.gfid=0x00000000000000000000000000000001 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 >>>> user.some-name=0x736f6d652d76616c7565 >>>> >>>> # file: gluster2/brick3/1/ >>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>>> 23a756e6c6162656c65645f743a733000 >>>> trusted.afr.dirty=0x000000000000000000000000 >>>> trusted.afr.glustershard-client-0=0x000000010000000000000000 >>>> trusted.gfid=0x00000000000000000000000000000001 >>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15 >>>> user.some-name=0x736f6d652d76616c7565 >>>> >>>> meta-data split-brain? 
heal <> info split-brain shows no files or >>>> entries. If I had thought ahead I would have checked the values returned >>>> by getfattr before, although I do know heal-count was returning 0 at the >>>> time >>>> >>>> >>>> Assuming I need to shut down vm's and put volume in maintenance from >>>> ovirt to prevent any io. Does it need to occur for whole heal or can I >>>> re-activate at some point to bring VM's back up? >>>> >>>> >>>> >>>> >>>> *David Gossage* >>>> *Carousel Checks Inc. | System Administrator* >>>> *Office* 708.613.2284 >>>> >>>> On Wed, Aug 31, 2016 at 3:50 AM, Krutika Dhananjay <kdhan...@redhat.com >>>> > wrote: >>>> >>>>> No, sorry, it's working fine. I may have missed some step because of >>>>> which i saw that problem. /.shard is also healing fine now. >>>>> >>>>> Let me know if it works for you. >>>>> >>>>> -Krutika >>>>> >>>>> On Wed, Aug 31, 2016 at 12:49 PM, Krutika Dhananjay < >>>>> kdhan...@redhat.com> wrote: >>>>> >>>>>> OK I just hit the other issue too, where .shard doesn't get healed. :) >>>>>> >>>>>> Investigating as to why that is the case. Give me some time. >>>>>> >>>>>> -Krutika >>>>>> >>>>>> On Wed, Aug 31, 2016 at 12:39 PM, Krutika Dhananjay < >>>>>> kdhan...@redhat.com> wrote: >>>>>> >>>>>>> Just figured the steps Anuradha has provided won't work if granular >>>>>>> entry heal is on. >>>>>>> So when you bring down a brick and create fake2 under / of the >>>>>>> volume, granular entry heal feature causes >>>>>>> sh to remember only the fact that 'fake2' needs to be recreated on >>>>>>> the offline brick (because changelogs are granular). >>>>>>> >>>>>>> In this case, we would be required to indicate to self-heal-daemon >>>>>>> that the entire directory tree from '/' needs to be repaired on the >>>>>>> brick >>>>>>> that contains no data. >>>>>>> >>>>>>> To fix this, I did the following (for users who use granular entry >>>>>>> self-healing): >>>>>>> >>>>>>> 1. Kill the last brick process in the replica (/bricks/3) >>>>>>> >>>>>>> 2. [root@server-3 ~]# rm -rf /bricks/3 >>>>>>> >>>>>>> 3. [root@server-3 ~]# mkdir /bricks/3 >>>>>>> >>>>>>> 4. Create a new dir on the mount point: >>>>>>> [root@client-1 ~]# mkdir /mnt/fake >>>>>>> >>>>>>> 5. Set some fake xattr on the root of the volume, and not the 'fake' >>>>>>> directory itself. >>>>>>> [root@client-1 ~]# setfattr -n "user.some-name" -v "some-value" >>>>>>> /mnt >>>>>>> >>>>>>> 6. Make sure there's no io happening on your volume. >>>>>>> >>>>>>> 7. Check the pending xattrs on the brick directories of the two good >>>>>>> copies (on bricks 1 and 2), you should be seeing same values as the one >>>>>>> marked in red in both bricks. >>>>>>> (note that the client-<num> xattr key will have the same last digit >>>>>>> as the index of the brick that is down, when counting from 0. So if the >>>>>>> first brick is the one that is down, it would read >>>>>>> trusted.afr.*-client-0; >>>>>>> if the second brick is the one that is empty and down, it would read >>>>>>> trusted.afr.*-client-1 and so on). >>>>>>> >>>>>>> [root@server-1 ~]# getfattr -d -m . 
-e hex /bricks/1 >>>>>>> # file: 1 >>>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>>>>>> 23a6574635f72756e74696d655f743a733000 >>>>>>> trusted.afr.dirty=0x000000000000000000000000 >>>>>>> *trusted.afr.rep-client-2=0x000000000000000100000001* >>>>>>> trusted.gfid=0x00000000000000000000000000000001 >>>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b >>>>>>> >>>>>>> [root@server-2 ~]# getfattr -d -m . -e hex /bricks/2 >>>>>>> # file: 2 >>>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>>>>>> 23a6574635f72756e74696d655f743a733000 >>>>>>> trusted.afr.dirty=0x000000000000000000000000 >>>>>>> *trusted.afr.rep-client-2=0x000**000000000000100000001* >>>>>>> trusted.gfid=0x00000000000000000000000000000001 >>>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b >>>>>>> >>>>>>> 8. Flip the 8th digit in the trusted.afr.<VOLNAME>-client-2 to a 1. >>>>>>> >>>>>>> [root@server-1 ~]# setfattr -n trusted.afr.rep-client-2 -v >>>>>>> *0x000000010000000100000001* /bricks/1 >>>>>>> [root@server-2 ~]# setfattr -n trusted.afr.rep-client-2 -v >>>>>>> *0x000000010000000100000001* /bricks/2 >>>>>>> >>>>>>> 9. Get the xattrs again and check the xattrs are set properly now >>>>>>> >>>>>>> [root@server-1 ~]# getfattr -d -m . -e hex /bricks/1 >>>>>>> # file: 1 >>>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>>>>>> 23a6574635f72756e74696d655f743a733000 >>>>>>> trusted.afr.dirty=0x000000000000000000000000 >>>>>>> *trusted.afr.rep-client-2=0x000**000010000000100000001* >>>>>>> trusted.gfid=0x00000000000000000000000000000001 >>>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b >>>>>>> >>>>>>> [root@server-2 ~]# getfattr -d -m . -e hex /bricks/2 >>>>>>> # file: 2 >>>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7 >>>>>>> 23a6574635f72756e74696d655f743a733000 >>>>>>> trusted.afr.dirty=0x000000000000000000000000 >>>>>>> *trusted.afr.rep-client-2=0x000**000010000000100000001* >>>>>>> trusted.gfid=0x00000000000000000000000000000001 >>>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff >>>>>>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b >>>>>>> >>>>>>> 10. Force-start the volume. >>>>>>> >>>>>>> [root@server-1 ~]# gluster volume start rep force >>>>>>> volume start: rep: success >>>>>>> >>>>>>> 11. Monitor heal-info command to ensure the number of entries keeps >>>>>>> growing. >>>>>>> >>>>>>> 12. Keep monitoring with step 10 and eventually the number of >>>>>>> entries needing heal must come down to 0. >>>>>>> Also the checksums of the files on the previously empty brick should >>>>>>> now match with the copies on the other two bricks. >>>>>>> >>>>>>> Could you check if the above steps work for you, in your test >>>>>>> environment? >>>>>>> >>>>>>> You caught a nice bug in the manual steps to follow when granular >>>>>>> entry-heal is enabled and an empty brick needs heal. Thanks for >>>>>>> reporting >>>>>>> it. :) We will fix the documentation appropriately. >>>>>>> >>>>>>> -Krutika >>>>>>> >>>>>>> >>>>>>> On Wed, Aug 31, 2016 at 11:29 AM, Krutika Dhananjay < >>>>>>> kdhan...@redhat.com> wrote: >>>>>>> >>>>>>>> Tried this. >>>>>>>> >>>>>>>> With me, only 'fake2' gets healed after i bring the 'empty' brick >>>>>>>> back up and it stops there unless I do a 'heal-full'. 
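[For anyone puzzling over the "flip the 8th digit" instruction in step 8 above: as I understand it, the trusted.afr.<VOLNAME>-client-N value packs three big-endian 32-bit counters — pending data, metadata and entry operations, in that order — so setting the first group to 1 tells the good bricks that the empty brick also needs a data heal. A minimal sketch for splitting such a value into its three counters; the variable name and example value are illustrative only, not something to apply blindly:

val=0x000000010000000100000001   # example value from step 8 above
v=${val#0x}
# first 8 hex digits = data pending, next 8 = metadata pending, last 8 = entry pending
echo "data=0x${v:0:8} metadata=0x${v:8:8} entry=0x${v:16:8}"
]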
>>>>>>>> >>>>>>>> Is that what you're seeing as well? >>>>>>>> >>>>>>>> -Krutika >>>>>>>> >>>>>>>> On Wed, Aug 31, 2016 at 4:43 AM, David Gossage < >>>>>>>> dgoss...@carouselchecks.com> wrote: >>>>>>>> >>>>>>>>> Same issue brought up glusterd on problem node heal count still >>>>>>>>> stuck at 6330. >>>>>>>>> >>>>>>>>> Ran gluster v heal GUSTER1 full >>>>>>>>> >>>>>>>>> glustershd on problem node shows a sweep starting and finishing in >>>>>>>>> seconds. Other 2 nodes show no activity in log. They should start a >>>>>>>>> sweep >>>>>>>>> too shouldn't they? >>>>>>>>> >>>>>>>>> Tried starting from scratch >>>>>>>>> >>>>>>>>> kill -15 brickpid >>>>>>>>> rm -Rf /brick >>>>>>>>> mkdir -p /brick >>>>>>>>> mkdir mkdir /gsmount/fake2 >>>>>>>>> setfattr -n "user.some-name" -v "some-value" /gsmount/fake2 >>>>>>>>> >>>>>>>>> Heals visible dirs instantly then stops. >>>>>>>>> >>>>>>>>> gluster v heal GLUSTER1 full >>>>>>>>> >>>>>>>>> see sweep star on problem node and end almost instantly. no files >>>>>>>>> added t heal list no files healed no more logging >>>>>>>>> >>>>>>>>> [2016-08-30 23:11:31.544331] I [MSGID: 108026] >>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol >>>>>>>>> GLUSTER1-client-1 >>>>>>>>> [2016-08-30 23:11:33.776235] I [MSGID: 108026] >>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol >>>>>>>>> GLUSTER1-client-1 >>>>>>>>> >>>>>>>>> same results no matter which node you run command on. Still stuck >>>>>>>>> with 6330 files showing needing healed out of 19k. still showing in >>>>>>>>> logs >>>>>>>>> no heals are occuring. >>>>>>>>> >>>>>>>>> Is their a way to forcibly reset any prior heal data? Could it be >>>>>>>>> stuck on some past failed heal start? >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> *David Gossage* >>>>>>>>> *Carousel Checks Inc. 
| System Administrator* >>>>>>>>> *Office* 708.613.2284 >>>>>>>>> >>>>>>>>> On Tue, Aug 30, 2016 at 10:03 AM, David Gossage < >>>>>>>>> dgoss...@carouselchecks.com> wrote: >>>>>>>>> >>>>>>>>>> On Tue, Aug 30, 2016 at 10:02 AM, David Gossage < >>>>>>>>>> dgoss...@carouselchecks.com> wrote: >>>>>>>>>> >>>>>>>>>>> updated test server to 3.8.3 >>>>>>>>>>> >>>>>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1 >>>>>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1 >>>>>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1 >>>>>>>>>>> Options Reconfigured: >>>>>>>>>>> cluster.granular-entry-heal: on >>>>>>>>>>> performance.readdir-ahead: on >>>>>>>>>>> performance.read-ahead: off >>>>>>>>>>> nfs.disable: on >>>>>>>>>>> nfs.addr-namelookup: off >>>>>>>>>>> nfs.enable-ino32: off >>>>>>>>>>> cluster.background-self-heal-count: 16 >>>>>>>>>>> cluster.self-heal-window-size: 1024 >>>>>>>>>>> performance.quick-read: off >>>>>>>>>>> performance.io-cache: off >>>>>>>>>>> performance.stat-prefetch: off >>>>>>>>>>> cluster.eager-lock: enable >>>>>>>>>>> network.remote-dio: on >>>>>>>>>>> cluster.quorum-type: auto >>>>>>>>>>> cluster.server-quorum-type: server >>>>>>>>>>> storage.owner-gid: 36 >>>>>>>>>>> storage.owner-uid: 36 >>>>>>>>>>> server.allow-insecure: on >>>>>>>>>>> features.shard: on >>>>>>>>>>> features.shard-block-size: 64MB >>>>>>>>>>> performance.strict-o-direct: off >>>>>>>>>>> cluster.locking-scheme: granular >>>>>>>>>>> >>>>>>>>>>> kill -15 brickpid >>>>>>>>>>> rm -Rf /gluster2/brick3 >>>>>>>>>>> mkdir -p /gluster2/brick3/1 >>>>>>>>>>> mkdir mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10 >>>>>>>>>>> \:_glustershard/fake2 >>>>>>>>>>> setfattr -n "user.some-name" -v "some-value" >>>>>>>>>>> /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard >>>>>>>>>>> /fake2 >>>>>>>>>>> gluster v start glustershard force >>>>>>>>>>> >>>>>>>>>>> at this point brick process starts and all visible files >>>>>>>>>>> including new dir are made on brick >>>>>>>>>>> handful of shards are in heal statistics still but no .shard >>>>>>>>>>> directory created and no increase in shard count >>>>>>>>>>> >>>>>>>>>>> gluster v heal glustershard >>>>>>>>>>> >>>>>>>>>>> At this point still no increase in count or dir made no >>>>>>>>>>> additional activity in logs for healing generated. waited few >>>>>>>>>>> minutes >>>>>>>>>>> tailing logs to check if anything kicked in. >>>>>>>>>>> >>>>>>>>>>> gluster v heal glustershard full >>>>>>>>>>> >>>>>>>>>>> gluster shards added to list and heal commences. logs show full >>>>>>>>>>> sweep starting on all 3 nodes. though this time it only shows as >>>>>>>>>>> finishing >>>>>>>>>>> on one which looks to be the one that had brick deleted. 
>>>>>>>>>>> >>>>>>>>>>> [2016-08-30 14:45:33.098589] I [MSGID: 108026] >>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>>>>> glustershard-client-0 >>>>>>>>>>> [2016-08-30 14:45:33.099492] I [MSGID: 108026] >>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>>>>> glustershard-client-1 >>>>>>>>>>> [2016-08-30 14:45:33.100093] I [MSGID: 108026] >>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>>>>> glustershard-client-2 >>>>>>>>>>> [2016-08-30 14:52:29.760213] I [MSGID: 108026] >>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>>>>>>> glustershard-client-2 >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Just realized its still healing so that may be why sweep on 2 >>>>>>>>>> other bricks haven't replied as finished. >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> my hope is that later tonight a full heal will work on >>>>>>>>>>> production. Is it possible self-heal daemon can get stale or stop >>>>>>>>>>> listening but still show as active? Would stopping and starting >>>>>>>>>>> self-heal >>>>>>>>>>> daemon from gluster cli before doing these heals be helpful? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, Aug 30, 2016 at 9:29 AM, David Gossage < >>>>>>>>>>> dgoss...@carouselchecks.com> wrote: >>>>>>>>>>> >>>>>>>>>>>> On Tue, Aug 30, 2016 at 8:52 AM, David Gossage < >>>>>>>>>>>> dgoss...@carouselchecks.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay < >>>>>>>>>>>>> kdhan...@redhat.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay < >>>>>>>>>>>>>> kdhan...@redhat.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage < >>>>>>>>>>>>>>> dgoss...@carouselchecks.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay < >>>>>>>>>>>>>>>> kdhan...@redhat.com> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Could you also share the glustershd logs? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I'll get them when I get to work sure >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I tried the same steps that you mentioned multiple times, >>>>>>>>>>>>>>>>> but heal is running to completion without any issues. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> It must be said that 'heal full' traverses the files and >>>>>>>>>>>>>>>>> directories in a depth-first order and does heals also in the >>>>>>>>>>>>>>>>> same order. >>>>>>>>>>>>>>>>> But if it gets interrupted in the middle (say because >>>>>>>>>>>>>>>>> self-heal-daemon was >>>>>>>>>>>>>>>>> either intentionally or unintentionally brought offline and >>>>>>>>>>>>>>>>> then brought >>>>>>>>>>>>>>>>> back up), self-heal will only pick up the entries that are so >>>>>>>>>>>>>>>>> far marked as >>>>>>>>>>>>>>>>> new-entries that need heal which it will find in >>>>>>>>>>>>>>>>> indices/xattrop directory. >>>>>>>>>>>>>>>>> What this means is that those files and directories that were >>>>>>>>>>>>>>>>> not visited >>>>>>>>>>>>>>>>> during the crawl, will remain untouched and unhealed in this >>>>>>>>>>>>>>>>> second >>>>>>>>>>>>>>>>> iteration of heal, unless you execute a 'heal-full' again. 
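[In other words, if the full crawl was interrupted it has to be kicked off again, and it is worth confirming that a new FULL crawl actually started. A rough sketch using the same commands that already appear in this thread; VOLNAME is a placeholder and the exact output wording varies by release:

gluster v heal VOLNAME full
gluster v heal VOLNAME statistics            # per-brick crawl type (FULL vs INDEX) plus start/end times
gluster v heal VOLNAME statistics heal-count # entries still pending per brick
]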
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So should it start healing shards as it crawls or not until >>>>>>>>>>>>>>>> after it crawls the entire .shard directory? At the pace it >>>>>>>>>>>>>>>> was going that >>>>>>>>>>>>>>>> could be a week with one node appearing in the cluster but >>>>>>>>>>>>>>>> with no shard >>>>>>>>>>>>>>>> files if anything tries to access a file on that node. From >>>>>>>>>>>>>>>> my experience >>>>>>>>>>>>>>>> other day telling it to heal full again did nothing regardless >>>>>>>>>>>>>>>> of node used. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> Crawl is started from '/' of the volume. Whenever self-heal >>>>>>>>>>>>>> detects during the crawl that a file or directory is present in >>>>>>>>>>>>>> some >>>>>>>>>>>>>> brick(s) and absent in others, it creates the file on the bricks >>>>>>>>>>>>>> where it >>>>>>>>>>>>>> is absent and marks the fact that the file or directory might >>>>>>>>>>>>>> need >>>>>>>>>>>>>> data/entry and metadata heal too (this also means that an index >>>>>>>>>>>>>> is created >>>>>>>>>>>>>> under .glusterfs/indices/xattrop of the src bricks). And the >>>>>>>>>>>>>> data/entry and >>>>>>>>>>>>>> metadata heal are picked up and done in >>>>>>>>>>>>>> >>>>>>>>>>>>> the background with the help of these indices. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Looking at my 3rd node as example i find nearly an exact same >>>>>>>>>>>>> number of files in xattrop dir as reported by heal count at time >>>>>>>>>>>>> I brought >>>>>>>>>>>>> down node2 to try and alleviate read io errors that seemed to >>>>>>>>>>>>> occur from >>>>>>>>>>>>> what I was guessing as attempts to use the node with no shards >>>>>>>>>>>>> for reads. >>>>>>>>>>>>> >>>>>>>>>>>>> Also attached are the glustershd logs from the 3 nodes, along >>>>>>>>>>>>> with the test node i tried yesterday with same results. >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Looking at my own logs I notice that a full sweep was only ever >>>>>>>>>>>> recorded in glustershd.log on 2nd node with missing directory. I >>>>>>>>>>>> believe I >>>>>>>>>>>> should have found a sweep begun on every node correct? 
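[A quick way to compare the nodes on that point is to grep each node's self-heal daemon log for the sweep messages quoted above — a small sketch, assuming the default glustershd log location:

grep -E 'starting full sweep|finished full sweep' /var/log/glusterfs/glustershd.log | tail
# prints the most recent 'starting/finished full sweep on subvol ...' lines logged by this node's shd
]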
>>>>>>>>>>>> >>>>>>>>>>>> On my test dev when it did work I do see that >>>>>>>>>>>> >>>>>>>>>>>> [2016-08-30 13:56:25.223333] I [MSGID: 108026] >>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>>>>>> glustershard-client-0 >>>>>>>>>>>> [2016-08-30 13:56:25.223522] I [MSGID: 108026] >>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>>>>>> glustershard-client-1 >>>>>>>>>>>> [2016-08-30 13:56:25.224616] I [MSGID: 108026] >>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>>>>>> glustershard-client-2 >>>>>>>>>>>> [2016-08-30 14:18:48.333740] I [MSGID: 108026] >>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>>>>>>>> glustershard-client-2 >>>>>>>>>>>> [2016-08-30 14:18:48.356008] I [MSGID: 108026] >>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>>>>>>>> glustershard-client-1 >>>>>>>>>>>> [2016-08-30 14:18:49.637811] I [MSGID: 108026] >>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>>>>>>>> glustershard-client-0 >>>>>>>>>>>> >>>>>>>>>>>> While when looking at past few days of the 3 prod nodes i only >>>>>>>>>>>> found that on my 2nd node >>>>>>>>>>>> [2016-08-27 01:26:42.638772] I [MSGID: 108026] >>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol >>>>>>>>>>>> GLUSTER1-client-1 >>>>>>>>>>>> [2016-08-27 11:37:01.732366] I [MSGID: 108026] >>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol >>>>>>>>>>>> GLUSTER1-client-1 >>>>>>>>>>>> [2016-08-27 12:58:34.597228] I [MSGID: 108026] >>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol >>>>>>>>>>>> GLUSTER1-client-1 >>>>>>>>>>>> [2016-08-27 12:59:28.041173] I [MSGID: 108026] >>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol >>>>>>>>>>>> GLUSTER1-client-1 >>>>>>>>>>>> [2016-08-27 20:03:42.560188] I [MSGID: 108026] >>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol >>>>>>>>>>>> GLUSTER1-client-1 >>>>>>>>>>>> [2016-08-27 20:03:44.278274] I [MSGID: 108026] >>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol >>>>>>>>>>>> GLUSTER1-client-1 >>>>>>>>>>>> [2016-08-27 21:00:42.603315] I [MSGID: 108026] >>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol >>>>>>>>>>>> GLUSTER1-client-1 >>>>>>>>>>>> [2016-08-27 21:00:46.148674] I [MSGID: 108026] >>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol >>>>>>>>>>>> GLUSTER1-client-1 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> My suspicion is that this is what happened on your setup. >>>>>>>>>>>>>>>>> Could you confirm if that was the case? 
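[One way to sanity-check that suspicion is to compare what is queued in a source brick's index directory (the .glusterfs/indices/xattrop directory mentioned above) against what heal-count reports — a rough sketch, assuming the usual layout where each pending entry is a gfid-named hard link alongside a base 'xattrop-...' file; the brick path and volume name below are the ones used elsewhere in this thread:

ls /gluster1/BRICK1/1/.glusterfs/indices/xattrop | grep -v '^xattrop' | wc -l
gluster v heal GLUSTER1 statistics heal-count
# roughly matching numbers suggest the pending entries are known to the index;
# anything not indexed here would only be picked up again by another 'heal full' crawl
]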
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Brick was brought online with force start then a full heal >>>>>>>>>>>>>>>> launched. Hours later after it became evident that it was not >>>>>>>>>>>>>>>> adding new >>>>>>>>>>>>>>>> files to heal I did try restarting self-heal daemon and >>>>>>>>>>>>>>>> relaunching full >>>>>>>>>>>>>>>> heal again. But this was after the heal had basically already >>>>>>>>>>>>>>>> failed to >>>>>>>>>>>>>>>> work as intended. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> OK. How did you figure it was not adding any new files? I >>>>>>>>>>>>>>> need to know what places you were monitoring to come to this >>>>>>>>>>>>>>> conclusion. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -Krutika >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> As for those logs, I did manager to do something that >>>>>>>>>>>>>>>>> caused these warning messages you shared earlier to appear in >>>>>>>>>>>>>>>>> my client and >>>>>>>>>>>>>>>>> server logs. >>>>>>>>>>>>>>>>> Although these logs are annoying and a bit scary too, they >>>>>>>>>>>>>>>>> didn't do any harm to the data in my volume. Why they appear >>>>>>>>>>>>>>>>> just after a >>>>>>>>>>>>>>>>> brick is replaced and under no other circumstances is >>>>>>>>>>>>>>>>> something I'm still >>>>>>>>>>>>>>>>> investigating. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> But for future, it would be good to follow the steps >>>>>>>>>>>>>>>>> Anuradha gave as that would allow self-heal to at least >>>>>>>>>>>>>>>>> detect that it has >>>>>>>>>>>>>>>>> some repairing to do whenever it is restarted whether >>>>>>>>>>>>>>>>> intentionally or >>>>>>>>>>>>>>>>> otherwise. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I followed those steps as described on my test box and >>>>>>>>>>>>>>>> ended up with exact same outcome of adding shards at an >>>>>>>>>>>>>>>> agonizing slow pace >>>>>>>>>>>>>>>> and no creation of .shard directory or heals on shard >>>>>>>>>>>>>>>> directory. >>>>>>>>>>>>>>>> Directories visible from mount healed quickly. This was with >>>>>>>>>>>>>>>> one VM so it >>>>>>>>>>>>>>>> has only 800 shards as well. After hours at work it had added >>>>>>>>>>>>>>>> a total of >>>>>>>>>>>>>>>> 33 shards to be healed. I sent those logs yesterday as well >>>>>>>>>>>>>>>> though not the >>>>>>>>>>>>>>>> glustershd. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Does replace-brick command copy files in same manner? For >>>>>>>>>>>>>>>> these purposes I am contemplating just skipping the heal route. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -Krutika >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage < >>>>>>>>>>>>>>>>> dgoss...@carouselchecks.com> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> attached brick and client logs from test machine where >>>>>>>>>>>>>>>>>> same behavior occurred not sure if anything new is there. 
>>>>>>>>>>>>>>>>>> its still on >>>>>>>>>>>>>>>>>> 3.8.2 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Number of Bricks: 1 x 3 = 3 >>>>>>>>>>>>>>>>>> Transport-type: tcp >>>>>>>>>>>>>>>>>> Bricks: >>>>>>>>>>>>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1 >>>>>>>>>>>>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1 >>>>>>>>>>>>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1 >>>>>>>>>>>>>>>>>> Options Reconfigured: >>>>>>>>>>>>>>>>>> cluster.locking-scheme: granular >>>>>>>>>>>>>>>>>> performance.strict-o-direct: off >>>>>>>>>>>>>>>>>> features.shard-block-size: 64MB >>>>>>>>>>>>>>>>>> features.shard: on >>>>>>>>>>>>>>>>>> server.allow-insecure: on >>>>>>>>>>>>>>>>>> storage.owner-uid: 36 >>>>>>>>>>>>>>>>>> storage.owner-gid: 36 >>>>>>>>>>>>>>>>>> cluster.server-quorum-type: server >>>>>>>>>>>>>>>>>> cluster.quorum-type: auto >>>>>>>>>>>>>>>>>> network.remote-dio: on >>>>>>>>>>>>>>>>>> cluster.eager-lock: enable >>>>>>>>>>>>>>>>>> performance.stat-prefetch: off >>>>>>>>>>>>>>>>>> performance.io-cache: off >>>>>>>>>>>>>>>>>> performance.quick-read: off >>>>>>>>>>>>>>>>>> cluster.self-heal-window-size: 1024 >>>>>>>>>>>>>>>>>> cluster.background-self-heal-count: 16 >>>>>>>>>>>>>>>>>> nfs.enable-ino32: off >>>>>>>>>>>>>>>>>> nfs.addr-namelookup: off >>>>>>>>>>>>>>>>>> nfs.disable: on >>>>>>>>>>>>>>>>>> performance.read-ahead: off >>>>>>>>>>>>>>>>>> performance.readdir-ahead: on >>>>>>>>>>>>>>>>>> cluster.granular-entry-heal: on >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage < >>>>>>>>>>>>>>>>>> dgoss...@carouselchecks.com> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur < >>>>>>>>>>>>>>>>>>> ata...@redhat.com> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> ----- Original Message ----- >>>>>>>>>>>>>>>>>>>> > From: "David Gossage" <dgoss...@carouselchecks.com> >>>>>>>>>>>>>>>>>>>> > To: "Anuradha Talur" <ata...@redhat.com> >>>>>>>>>>>>>>>>>>>> > Cc: "gluster-users@gluster.org List" < >>>>>>>>>>>>>>>>>>>> Gluster-users@gluster.org>, "Krutika Dhananjay" < >>>>>>>>>>>>>>>>>>>> kdhan...@redhat.com> >>>>>>>>>>>>>>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM >>>>>>>>>>>>>>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing >>>>>>>>>>>>>>>>>>>> Glacier Slow >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur < >>>>>>>>>>>>>>>>>>>> ata...@redhat.com> wrote: >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > > Response inline. >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> > > ----- Original Message ----- >>>>>>>>>>>>>>>>>>>> > > > From: "Krutika Dhananjay" <kdhan...@redhat.com> >>>>>>>>>>>>>>>>>>>> > > > To: "David Gossage" <dgoss...@carouselchecks.com> >>>>>>>>>>>>>>>>>>>> > > > Cc: "gluster-users@gluster.org List" < >>>>>>>>>>>>>>>>>>>> Gluster-users@gluster.org> >>>>>>>>>>>>>>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM >>>>>>>>>>>>>>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards Healing >>>>>>>>>>>>>>>>>>>> Glacier Slow >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > Could you attach both client and brick logs? >>>>>>>>>>>>>>>>>>>> Meanwhile I will try these >>>>>>>>>>>>>>>>>>>> > > steps >>>>>>>>>>>>>>>>>>>> > > > out on my machines and see if it is easily >>>>>>>>>>>>>>>>>>>> recreatable. 
>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > -Krutika >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage < >>>>>>>>>>>>>>>>>>>> > > dgoss...@carouselchecks.com >>>>>>>>>>>>>>>>>>>> > > > > wrote: >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > Centos 7 Gluster 3.8.3 >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1 >>>>>>>>>>>>>>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1 >>>>>>>>>>>>>>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1 >>>>>>>>>>>>>>>>>>>> > > > Options Reconfigured: >>>>>>>>>>>>>>>>>>>> > > > cluster.data-self-heal-algorithm: full >>>>>>>>>>>>>>>>>>>> > > > cluster.self-heal-daemon: on >>>>>>>>>>>>>>>>>>>> > > > cluster.locking-scheme: granular >>>>>>>>>>>>>>>>>>>> > > > features.shard-block-size: 64MB >>>>>>>>>>>>>>>>>>>> > > > features.shard: on >>>>>>>>>>>>>>>>>>>> > > > performance.readdir-ahead: on >>>>>>>>>>>>>>>>>>>> > > > storage.owner-uid: 36 >>>>>>>>>>>>>>>>>>>> > > > storage.owner-gid: 36 >>>>>>>>>>>>>>>>>>>> > > > performance.quick-read: off >>>>>>>>>>>>>>>>>>>> > > > performance.read-ahead: off >>>>>>>>>>>>>>>>>>>> > > > performance.io-cache: off >>>>>>>>>>>>>>>>>>>> > > > performance.stat-prefetch: on >>>>>>>>>>>>>>>>>>>> > > > cluster.eager-lock: enable >>>>>>>>>>>>>>>>>>>> > > > network.remote-dio: enable >>>>>>>>>>>>>>>>>>>> > > > cluster.quorum-type: auto >>>>>>>>>>>>>>>>>>>> > > > cluster.server-quorum-type: server >>>>>>>>>>>>>>>>>>>> > > > server.allow-insecure: on >>>>>>>>>>>>>>>>>>>> > > > cluster.self-heal-window-size: 1024 >>>>>>>>>>>>>>>>>>>> > > > cluster.background-self-heal-count: 16 >>>>>>>>>>>>>>>>>>>> > > > performance.strict-write-ordering: off >>>>>>>>>>>>>>>>>>>> > > > nfs.disable: on >>>>>>>>>>>>>>>>>>>> > > > nfs.addr-namelookup: off >>>>>>>>>>>>>>>>>>>> > > > nfs.enable-ino32: off >>>>>>>>>>>>>>>>>>>> > > > cluster.granular-entry-heal: on >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > Friday did rolling upgrade from 3.8.3->3.8.3 no >>>>>>>>>>>>>>>>>>>> issues. >>>>>>>>>>>>>>>>>>>> > > > Following steps detailed in previous >>>>>>>>>>>>>>>>>>>> recommendations began proces of >>>>>>>>>>>>>>>>>>>> > > > replacing and healngbricks one node at a time. >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > 1) kill pid of brick >>>>>>>>>>>>>>>>>>>> > > > 2) reconfigure brick from raid6 to raid10 >>>>>>>>>>>>>>>>>>>> > > > 3) recreate directory of brick >>>>>>>>>>>>>>>>>>>> > > > 4) gluster volume start <> force >>>>>>>>>>>>>>>>>>>> > > > 5) gluster volume heal <> full >>>>>>>>>>>>>>>>>>>> > > Hi, >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> > > I'd suggest that full heal is not used. There are a >>>>>>>>>>>>>>>>>>>> few bugs in full heal. >>>>>>>>>>>>>>>>>>>> > > Better safe than sorry ;) >>>>>>>>>>>>>>>>>>>> > > Instead I'd suggest the following steps: >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> > > Currently I brought the node down by systemctl stop >>>>>>>>>>>>>>>>>>>> glusterd as I was >>>>>>>>>>>>>>>>>>>> > getting sporadic io issues and a few VM's paused so >>>>>>>>>>>>>>>>>>>> hoping that will help. >>>>>>>>>>>>>>>>>>>> > I may wait to do this till around 4PM when most work >>>>>>>>>>>>>>>>>>>> is done in case it >>>>>>>>>>>>>>>>>>>> > shoots load up. 
>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > > 1) kill pid of brick >>>>>>>>>>>>>>>>>>>> > > 2) to configuring of brick that you need >>>>>>>>>>>>>>>>>>>> > > 3) recreate brick dir >>>>>>>>>>>>>>>>>>>> > > 4) while the brick is still down, from the mount >>>>>>>>>>>>>>>>>>>> point: >>>>>>>>>>>>>>>>>>>> > > a) create a dummy non existent dir under / of >>>>>>>>>>>>>>>>>>>> mount. >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > so if noee 2 is down brick, pick node for example 3 >>>>>>>>>>>>>>>>>>>> and make a test dir >>>>>>>>>>>>>>>>>>>> > under its brick directory that doesnt exist on 2 or >>>>>>>>>>>>>>>>>>>> should I be dong this >>>>>>>>>>>>>>>>>>>> > over a gluster mount? >>>>>>>>>>>>>>>>>>>> You should be doing this over gluster mount. >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > > b) set a non existent extended attribute on / of >>>>>>>>>>>>>>>>>>>> mount. >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > Could you give me an example of an attribute to set? >>>>>>>>>>>>>>>>>>>> I've read a tad on >>>>>>>>>>>>>>>>>>>> > this, and looked up attributes but haven't set any >>>>>>>>>>>>>>>>>>>> yet myself. >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value" >>>>>>>>>>>>>>>>>>>> <path-to-mount> >>>>>>>>>>>>>>>>>>>> > Doing these steps will ensure that heal happens only >>>>>>>>>>>>>>>>>>>> from updated brick to >>>>>>>>>>>>>>>>>>>> > > down brick. >>>>>>>>>>>>>>>>>>>> > > 5) gluster v start <> force >>>>>>>>>>>>>>>>>>>> > > 6) gluster v heal <> >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> > Will it matter if somewhere in gluster the full heal >>>>>>>>>>>>>>>>>>>> command was run other >>>>>>>>>>>>>>>>>>>> > day? Not sure if it eventually stops or times out. >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> full heal will stop once the crawl is done. So if you >>>>>>>>>>>>>>>>>>>> want to trigger heal again, >>>>>>>>>>>>>>>>>>>> run gluster v heal <>. Actually even brick up or volume >>>>>>>>>>>>>>>>>>>> start force should >>>>>>>>>>>>>>>>>>>> trigger the heal. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Did this on test bed today. its one server with 3 >>>>>>>>>>>>>>>>>>> bricks on same machine so take that for what its worth. >>>>>>>>>>>>>>>>>>> also it still runs >>>>>>>>>>>>>>>>>>> 3.8.2. Maybe ill update and re-run test. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> killed brick >>>>>>>>>>>>>>>>>>> deleted brick dir >>>>>>>>>>>>>>>>>>> recreated brick dir >>>>>>>>>>>>>>>>>>> created fake dir on gluster mount >>>>>>>>>>>>>>>>>>> set suggested fake attribute on it >>>>>>>>>>>>>>>>>>> ran volume start <> force >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> looked at files it said needed healing and it was just 8 >>>>>>>>>>>>>>>>>>> shards that were modified for few minutes I ran through >>>>>>>>>>>>>>>>>>> steps >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> gave it few minutes and it stayed same >>>>>>>>>>>>>>>>>>> ran gluster volume <> heal >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> it healed all the directories and files you can see over >>>>>>>>>>>>>>>>>>> mount including fakedir. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> same issue for shards though. it adds more shards to >>>>>>>>>>>>>>>>>>> heal at glacier pace. slight jump in speed if I stat every >>>>>>>>>>>>>>>>>>> file and dir in >>>>>>>>>>>>>>>>>>> VM running but not all shards. 
>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> It started with 8 shards to heal and is now only at 33 >>>>>>>>>>>>>>>>>>> out of 800 and probably wont finish adding for few days at >>>>>>>>>>>>>>>>>>> rate it goes. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> > > > 1st node worked as expected took 12 hours to heal >>>>>>>>>>>>>>>>>>>> 1TB data. Load was >>>>>>>>>>>>>>>>>>>> > > little >>>>>>>>>>>>>>>>>>>> > > > heavy but nothing shocking. >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > About an hour after node 1 finished I began same >>>>>>>>>>>>>>>>>>>> process on node2. Heal >>>>>>>>>>>>>>>>>>>> > > > proces kicked in as before and the files in >>>>>>>>>>>>>>>>>>>> directories visible from >>>>>>>>>>>>>>>>>>>> > > mount >>>>>>>>>>>>>>>>>>>> > > > and .glusterfs healed in short time. Then it >>>>>>>>>>>>>>>>>>>> began crawl of .shard adding >>>>>>>>>>>>>>>>>>>> > > > those files to heal count at which point the >>>>>>>>>>>>>>>>>>>> entire proces ground to a >>>>>>>>>>>>>>>>>>>> > > halt >>>>>>>>>>>>>>>>>>>> > > > basically. After 48 hours out of 19k shards it >>>>>>>>>>>>>>>>>>>> has added 5900 to heal >>>>>>>>>>>>>>>>>>>> > > list. >>>>>>>>>>>>>>>>>>>> > > > Load on all 3 machnes is negligible. It was >>>>>>>>>>>>>>>>>>>> suggested to change this >>>>>>>>>>>>>>>>>>>> > > value >>>>>>>>>>>>>>>>>>>> > > > to full cluster.data-self-heal-algorithm and >>>>>>>>>>>>>>>>>>>> restart volume which I >>>>>>>>>>>>>>>>>>>> > > did. No >>>>>>>>>>>>>>>>>>>> > > > efffect. Tried relaunching heal no effect, >>>>>>>>>>>>>>>>>>>> despite any node picked. I >>>>>>>>>>>>>>>>>>>> > > > started each VM and performed a stat of all files >>>>>>>>>>>>>>>>>>>> from within it, or a >>>>>>>>>>>>>>>>>>>> > > full >>>>>>>>>>>>>>>>>>>> > > > virus scan and that seemed to cause short small >>>>>>>>>>>>>>>>>>>> spikes in shards added, >>>>>>>>>>>>>>>>>>>> > > but >>>>>>>>>>>>>>>>>>>> > > > not by much. Logs are showing no real messages >>>>>>>>>>>>>>>>>>>> indicating anything is >>>>>>>>>>>>>>>>>>>> > > going >>>>>>>>>>>>>>>>>>>> > > > on. I get hits to brick log on occasion of null >>>>>>>>>>>>>>>>>>>> lookups making me think >>>>>>>>>>>>>>>>>>>> > > its >>>>>>>>>>>>>>>>>>>> > > > not really crawling shards directory but waiting >>>>>>>>>>>>>>>>>>>> for a shard lookup to >>>>>>>>>>>>>>>>>>>> > > add >>>>>>>>>>>>>>>>>>>> > > > it. I'll get following in brick log but not >>>>>>>>>>>>>>>>>>>> constant and sometime >>>>>>>>>>>>>>>>>>>> > > multiple >>>>>>>>>>>>>>>>>>>> > > > for same shard. >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009] >>>>>>>>>>>>>>>>>>>> > > > [server-resolve.c:569:server_resolve] >>>>>>>>>>>>>>>>>>>> 0-GLUSTER1-server: no resolution >>>>>>>>>>>>>>>>>>>> > > type >>>>>>>>>>>>>>>>>>>> > > > for (null) (LOOKUP) >>>>>>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050] >>>>>>>>>>>>>>>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk] >>>>>>>>>>>>>>>>>>>> 0-GLUSTER1-server: 12591783: >>>>>>>>>>>>>>>>>>>> > > > LOOKUP (null) (00000000-0000-0000-00 >>>>>>>>>>>>>>>>>>>> > > > 00-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221) >>>>>>>>>>>>>>>>>>>> ==> (Invalid >>>>>>>>>>>>>>>>>>>> > > > argument) [Invalid argument] >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > This one repeated about 30 times in row then >>>>>>>>>>>>>>>>>>>> nothing for 10 minutes then >>>>>>>>>>>>>>>>>>>> > > one >>>>>>>>>>>>>>>>>>>> > > > hit for one different shard by itself. 
>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > How can I determine if Heal is actually running? >>>>>>>>>>>>>>>>>>>> How can I kill it or >>>>>>>>>>>>>>>>>>>> > > force >>>>>>>>>>>>>>>>>>>> > > > restart? Does node I start it from determine >>>>>>>>>>>>>>>>>>>> which directory gets >>>>>>>>>>>>>>>>>>>> > > crawled to >>>>>>>>>>>>>>>>>>>> > > > determine heals? >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > David Gossage >>>>>>>>>>>>>>>>>>>> > > > Carousel Checks Inc. | System Administrator >>>>>>>>>>>>>>>>>>>> > > > Office 708.613.2284 >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > _______________________________________________ >>>>>>>>>>>>>>>>>>>> > > > Gluster-users mailing list >>>>>>>>>>>>>>>>>>>> > > > Gluster-users@gluster.org >>>>>>>>>>>>>>>>>>>> > > > http://www.gluster.org/mailman >>>>>>>>>>>>>>>>>>>> /listinfo/gluster-users >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>> > > > _______________________________________________ >>>>>>>>>>>>>>>>>>>> > > > Gluster-users mailing list >>>>>>>>>>>>>>>>>>>> > > > Gluster-users@gluster.org >>>>>>>>>>>>>>>>>>>> > > > http://www.gluster.org/mailman >>>>>>>>>>>>>>>>>>>> /listinfo/gluster-users >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> > > -- >>>>>>>>>>>>>>>>>>>> > > Thanks, >>>>>>>>>>>>>>>>>>>> > > Anuradha. >>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>> Anuradha. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >
_______________________________________________ Gluster-users mailing list Gluster-users@gluster.org http://www.gluster.org/mailman/listinfo/gluster-users