On Tue, Sep 6, 2016 at 11:41 AM, Krutika Dhananjay <[email protected]> wrote:
> On Tue, Sep 6, 2016 at 7:27 PM, David Gossage <[email protected]> wrote:
>
>> Going to top post with the solution Krutika Dhananjay came up with. His
>> steps were much less volatile, could be done with the volume still being
>> actively used, and were also much less prone to accidental destruction.
>>
>> My use case and issue: I wanted to wipe a brick and recreate it with the
>> same directory structure so as to change the underlying RAID setup of the
>> disks making up the brick. The problem was that getting the shards to heal
>> afterwards was failing 99% of the time.
>
> Hi,
>
> Thank you for posting this before I could get around to it. Also thanks to
> Pranith for suggesting the additional precautionary 'trusted.afr.dirty'
> step (step 4 below) and reviewing the steps once.
>
> IIUC the newly-introduced reset-brick command serves as an alternative to
> all of this lengthy process listed below.
>
> @Pranith,
> Is the above statement correct? If so, do we know which releases will have
> the reset-brick command/feature?
>
>> These are the steps he provided that have been working well.
>
> Err.. she. :)

ack so sorry

> -Krutika
>
>> 1) kill the brick pid on the server whose brick you want to replace
>>    kill -15 <brickpid>
>>
>> 2) do brick maintenance, which in my case was:
>>    zpool destroy <ZFSPOOL>
>>    zpool create (options) yada yada disks
>>
>> 3) make sure the original path to the brick exists
>>    mkdir /path/to/brick
>>
>> 4) set an extended attribute on the new brick path (not over the gluster mount)
>>    setfattr -n trusted.afr.dirty -v 0x000000000000000000000001 /path/to/brick
>>
>> 5) create a mount point to the volume
>>    mkdir /mnt-brick-test
>>    glusterfs --volfile-id=<VOLNAME> --volfile-server=<valid host or ip of an active gluster server> --client-pid=-6 /mnt-brick-test
>>
>> 6) set an extended attribute on the gluster network mount. VOLNAME is the
>>    gluster volume; KILLEDBRICK# is the index of the brick needing heal.
>>    Indices start from 0, and gluster v info should display the bricks in order.
>>    setfattr -n trusted.replace-brick -v VOLNAME-client-KILLEDBRICK# /mnt-brick-test
>>
>> 7) gluster heal info should now show the / root of the gluster volume in its output
>>    gluster v heal VOLNAME info
>>
>> 8) force start the volume to bring up the killed brick
>>    gluster v start VOLNAME force
>>
>> 9) optionally watch heal progress and drink beer while you wait and hope
>>    nothing blows up
>>    watch -n 10 gluster v heal VOLNAME statistics heal-count
>>
>> 10) unmount the gluster network mount from the server
>>     umount /mnt-brick-test
>>
>> 11) Praise the developers for their efforts
>>
>> David Gossage
>> Carousel Checks Inc. | System Administrator
>> Office 708.613.2284
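Two small lookups make steps 1 and 6 easier. This is a sketch based on standard CLI output rather than part of the original steps; VOLNAME is a placeholder:

    # 'volume status' prints a PID column per brick -- that is the <brickpid> for step 1
    gluster volume status VOLNAME
    # 'volume info' lists Brick1, Brick2, ... in order; KILLEDBRICK# in step 6 is that
    # position counted from 0 (Brick1 -> 0, Brick2 -> 1, ...), matching trusted.afr.<VOLNAME>-client-N
    gluster volume info VOLNAME

On the reset-brick command Krutika mentions above: in releases that ship it, the flow should look roughly like the sketch below. The syntax is recalled from the feature's documentation and should be verified against the release notes of the version you run before relying on it; HOST and the brick path are placeholders:

    gluster volume reset-brick VOLNAME HOST:/path/to/brick start
    # ... rebuild the RAID/zpool and recreate /path/to/brick ...
    gluster volume reset-brick VOLNAME HOST:/path/to/brick HOST:/path/to/brick commit force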
>> On Thu, Sep 1, 2016 at 2:29 PM, David Gossage <[email protected]> wrote:
>>
>>> On Thu, Sep 1, 2016 at 12:09 AM, Krutika Dhananjay <[email protected]> wrote:
>>>
>>>> On Wed, Aug 31, 2016 at 8:13 PM, David Gossage <[email protected]> wrote:
>>>>
>>>>> Just as a test I did not shut down the one VM on the cluster, as
>>>>> finding a window before the weekend where I can shut down all VMs and
>>>>> fit in a full heal is unlikely, so I wanted to see what occurs.
>>>>>
>>>>> kill -15 brick pid
>>>>> rm -Rf /gluster2/brick1/1
>>>>> mkdir /gluster2/brick1/1
>>>>> mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake3
>>>>> setfattr -n "user.some-name" -v "some-value" /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard
>>>>>
>>>>> getfattr -d -m . -e hex /gluster2/brick2/1
>>>>> # file: gluster2/brick2/1
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000001
>>>>> trusted.afr.glustershard-client-0=0x000000000000000200000000
>>>>
>>>> This is unusual. The last digit ought to have been 1 on account of
>>>> "fake3" being created while the first brick is offline.
>>>>
>>>> This discussion is becoming unnecessarily lengthy. Mind if we discuss
>>>> this and sort it out on IRC today? At least the communication will be
>>>> continuous and in real-time. I'm kdhananjay on #gluster (Freenode). Ping
>>>> me when you're online.
>>>>
>>>> -Krutika
>>>
>>> Thanks for the assistance this morning. Looks like I lost connection in
>>> IRC and didn't realize it, so sorry if you came back looking for me. Let
>>> me know when the steps you worked out have been reviewed and whether
>>> they're found safe for production use, and I'll give them a try.
>>>
>>>>> trusted.afr.glustershard-client-2=0x000000000000000000000000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
>>>>>
>>>>> getfattr -d -m . -e hex /gluster2/brick3/1
>>>>> # file: gluster2/brick3/1
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000001
>>>>> trusted.afr.glustershard-client-0=0x000000000000000200000000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
>>>>>
>>>>> setfattr -n trusted.afr.glustershard-client-0 -v 0x000000010000000200000000 /gluster2/brick2/1
>>>>> setfattr -n trusted.afr.glustershard-client-0 -v 0x000000010000000200000000 /gluster2/brick3/1
>>>>>
>>>>> getfattr -d -m . -e hex /gluster2/brick3/1/
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: gluster2/brick3/1/
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.glustershard-client-0=0x000000010000000200000000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
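A note on reading these trusted.afr hex values, since the comment above and the "flip the 8th digit" step later in the thread both hinge on them. This is the usual reading of the AFR changelog layout and is worth double-checking against the AFR documentation:

    # trusted.afr.<VOLNAME>-client-N holds three big-endian 32-bit counters of
    # operations pending against brick N, in the order <data><metadata><entry>.
    # e.g. 0x 00000000 00000002 00000000 above = 0 data, 2 metadata, 0 entry heals pending,
    # while the value Krutika expected would end in ...00000001 (one pending entry heal for fake3).
    getfattr -n trusted.afr.glustershard-client-0 -e hex /gluster2/brick2/1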
>>>>> getfattr -d -m . -e hex /gluster2/brick2/1/
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: gluster2/brick2/1/
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.glustershard-client-0=0x000000010000000200000000
>>>>> trusted.afr.glustershard-client-2=0x000000000000000000000000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
>>>>>
>>>>> gluster v start glustershard force
>>>>>
>>>>> Gluster heal counts climbed up and down a little as it healed everything
>>>>> visible in the gluster mount and in .glusterfs for the visible mount
>>>>> files, then stalled with around 15 shards and the fake3 directory still
>>>>> in the list.
>>>>>
>>>>> getfattr -d -m . -e hex /gluster2/brick2/1/
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: gluster2/brick2/1/
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.glustershard-client-0=0x000000010000000000000000
>>>>> trusted.afr.glustershard-client-2=0x000000000000000000000000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
>>>>>
>>>>> getfattr -d -m . -e hex /gluster2/brick3/1/
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: gluster2/brick3/1/
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.glustershard-client-0=0x000000010000000000000000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
>>>>>
>>>>> getfattr -d -m . -e hex /gluster2/brick1/1/
>>>>> getfattr: Removing leading '/' from absolute path names
>>>>> # file: gluster2/brick1/1/
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
>>>>>
>>>>> The heal count stayed the same for a while, then I ran
>>>>>
>>>>> gluster v heal glustershard full
>>>>>
>>>>> Heals jump up to 700 as shards actually get read in as needing heals.
>>>>> glustershd shows 3 sweeps started, one per brick.
>>>>>
>>>>> It heals the shards and things look OK. heal <> info shows 0 files, but
>>>>> statistics heal-info shows 1 left for bricks 2 and 3. Perhaps because I
>>>>> didn't stop the running VM?
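If it helps to pin down what that one lingering entry is, the standard heal listings plus a peek at the on-brick heal index (the indices/xattrop directory discussed further down in this thread) should narrow it down. A small sketch using the brick paths from this test:

    gluster volume heal glustershard info
    gluster volume heal glustershard info split-brain
    # pending-heal markers queued on one of the good bricks
    ls /gluster2/brick2/1/.glusterfs/indices/xattrop/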
>>>>> # file: gluster2/brick1/1/
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
>>>>>
>>>>> # file: gluster2/brick2/1/
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.glustershard-client-0=0x000000010000000000000000
>>>>> trusted.afr.glustershard-client-2=0x000000000000000000000000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
>>>>>
>>>>> # file: gluster2/brick3/1/
>>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a756e6c6162656c65645f743a733000
>>>>> trusted.afr.dirty=0x000000000000000000000000
>>>>> trusted.afr.glustershard-client-0=0x000000010000000000000000
>>>>> trusted.gfid=0x00000000000000000000000000000001
>>>>> trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
>>>>> user.some-name=0x736f6d652d76616c7565
>>>>>
>>>>> Meta-data split-brain? heal <> info split-brain shows no files or
>>>>> entries. If I had thought ahead I would have checked the values returned
>>>>> by getfattr before, although I do know heal-count was returning 0 at the
>>>>> time.
>>>>>
>>>>> Assuming I need to shut down the VMs and put the volume in maintenance
>>>>> from oVirt to prevent any IO: does that need to last for the whole heal,
>>>>> or can I re-activate at some point to bring the VMs back up?
>>>>>
>>>>> David Gossage
>>>>> Carousel Checks Inc. | System Administrator
>>>>> Office 708.613.2284
>>>>>
>>>>> On Wed, Aug 31, 2016 at 3:50 AM, Krutika Dhananjay <[email protected]> wrote:
>>>>>
>>>>>> No, sorry, it's working fine. I may have missed some step, because of
>>>>>> which I saw that problem. /.shard is also healing fine now.
>>>>>>
>>>>>> Let me know if it works for you.
>>>>>>
>>>>>> -Krutika
>>>>>>
>>>>>> On Wed, Aug 31, 2016 at 12:49 PM, Krutika Dhananjay <[email protected]> wrote:
>>>>>>
>>>>>>> OK I just hit the other issue too, where .shard doesn't get healed. :)
>>>>>>>
>>>>>>> Investigating as to why that is the case. Give me some time.
>>>>>>>
>>>>>>> -Krutika
>>>>>>>
>>>>>>> On Wed, Aug 31, 2016 at 12:39 PM, Krutika Dhananjay <[email protected]> wrote:
>>>>>>>
>>>>>>>> Just figured the steps Anuradha has provided won't work if granular
>>>>>>>> entry heal is on. So when you bring down a brick and create fake2
>>>>>>>> under / of the volume, the granular entry heal feature causes the
>>>>>>>> self-heal daemon to remember only the fact that 'fake2' needs to be
>>>>>>>> recreated on the offline brick (because changelogs are granular).
>>>>>>>>
>>>>>>>> In this case, we would be required to indicate to the self-heal
>>>>>>>> daemon that the entire directory tree from '/' needs to be repaired
>>>>>>>> on the brick that contains no data.
>>>>>>>>
>>>>>>>> To fix this, I did the following (for users who use granular entry
>>>>>>>> self-healing):
>>>>>>>> 1. Kill the last brick process in the replica (/bricks/3)
>>>>>>>>
>>>>>>>> 2. [root@server-3 ~]# rm -rf /bricks/3
>>>>>>>>
>>>>>>>> 3. [root@server-3 ~]# mkdir /bricks/3
>>>>>>>>
>>>>>>>> 4. Create a new dir on the mount point:
>>>>>>>>    [root@client-1 ~]# mkdir /mnt/fake
>>>>>>>>
>>>>>>>> 5. Set some fake xattr on the root of the volume, and not the 'fake'
>>>>>>>>    directory itself.
>>>>>>>>    [root@client-1 ~]# setfattr -n "user.some-name" -v "some-value" /mnt
>>>>>>>>
>>>>>>>> 6. Make sure there's no io happening on your volume.
>>>>>>>>
>>>>>>>> 7. Check the pending xattrs on the brick directories of the two good
>>>>>>>>    copies (on bricks 1 and 2); you should be seeing the same value as
>>>>>>>>    the one marked in red (the trusted.afr.rep-client-2 line) on both
>>>>>>>>    bricks.
>>>>>>>>    (Note that the client-<num> xattr key will have the same last digit
>>>>>>>>    as the index of the brick that is down, when counting from 0. So if
>>>>>>>>    the first brick is the one that is down, it would read
>>>>>>>>    trusted.afr.*-client-0; if the second brick is the one that is empty
>>>>>>>>    and down, it would read trusted.afr.*-client-1, and so on.)
>>>>>>>>
>>>>>>>>    [root@server-1 ~]# getfattr -d -m . -e hex /bricks/1
>>>>>>>>    # file: 1
>>>>>>>>    security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>>>>>>>>    trusted.afr.dirty=0x000000000000000000000000
>>>>>>>>    trusted.afr.rep-client-2=0x000000000000000100000001
>>>>>>>>    trusted.gfid=0x00000000000000000000000000000001
>>>>>>>>    trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>>>>>    trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>>>>>>
>>>>>>>>    [root@server-2 ~]# getfattr -d -m . -e hex /bricks/2
>>>>>>>>    # file: 2
>>>>>>>>    security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>>>>>>>>    trusted.afr.dirty=0x000000000000000000000000
>>>>>>>>    trusted.afr.rep-client-2=0x000000000000000100000001
>>>>>>>>    trusted.gfid=0x00000000000000000000000000000001
>>>>>>>>    trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>>>>>    trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>>>>>>
>>>>>>>> 8. Flip the 8th digit in the trusted.afr.<VOLNAME>-client-2 value to a 1.
>>>>>>>>
>>>>>>>>    [root@server-1 ~]# setfattr -n trusted.afr.rep-client-2 -v 0x000000010000000100000001 /bricks/1
>>>>>>>>    [root@server-2 ~]# setfattr -n trusted.afr.rep-client-2 -v 0x000000010000000100000001 /bricks/2
>>>>>>>>
>>>>>>>> 9. Get the xattrs again and check that the xattrs are set properly now.
>>>>>>>>
>>>>>>>>    [root@server-1 ~]# getfattr -d -m . -e hex /bricks/1
>>>>>>>>    # file: 1
>>>>>>>>    security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>>>>>>>>    trusted.afr.dirty=0x000000000000000000000000
>>>>>>>>    trusted.afr.rep-client-2=0x000000010000000100000001
>>>>>>>>    trusted.gfid=0x00000000000000000000000000000001
>>>>>>>>    trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>>>>>    trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>>>>>>    [root@server-2 ~]# getfattr -d -m . -e hex /bricks/2
>>>>>>>>    # file: 2
>>>>>>>>    security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f723a6574635f72756e74696d655f743a733000
>>>>>>>>    trusted.afr.dirty=0x000000000000000000000000
>>>>>>>>    trusted.afr.rep-client-2=0x000000010000000100000001
>>>>>>>>    trusted.gfid=0x00000000000000000000000000000001
>>>>>>>>    trusted.glusterfs.dht=0x000000010000000000000000ffffffff
>>>>>>>>    trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>>>>>>>
>>>>>>>> 10. Force-start the volume.
>>>>>>>>
>>>>>>>>    [root@server-1 ~]# gluster volume start rep force
>>>>>>>>    volume start: rep: success
>>>>>>>>
>>>>>>>> 11. Monitor the heal-info command to ensure the number of entries
>>>>>>>>    keeps growing.
>>>>>>>>
>>>>>>>> 12. Keep monitoring as in step 11, and eventually the number of
>>>>>>>>    entries needing heal must come down to 0. Also the checksums of the
>>>>>>>>    files on the previously empty brick should now match the copies on
>>>>>>>>    the other two bricks.
>>>>>>>>
>>>>>>>> Could you check if the above steps work for you, in your test
>>>>>>>> environment?
>>>>>>>>
>>>>>>>> You caught a nice bug in the manual steps to follow when granular
>>>>>>>> entry-heal is enabled and an empty brick needs heal. Thanks for
>>>>>>>> reporting it. :) We will fix the documentation appropriately.
>>>>>>>>
>>>>>>>> -Krutika
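Two hedged additions to the procedure above, not part of Krutika's original steps: a quick check that the volume really is in the granular-entry-heal case these steps target, and one way to do step 12's checksum comparison by spot-checking a single file by hand (the file path is only a placeholder):

    gluster volume info rep | grep cluster.granular-entry-heal
    [root@server-1 ~]# sha256sum /bricks/1/path/to/some/file
    [root@server-2 ~]# sha256sum /bricks/2/path/to/some/file
    [root@server-3 ~]# sha256sum /bricks/3/path/to/some/file
    # once the heal completes, all three sums should match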
>>>>>>>> On Wed, Aug 31, 2016 at 11:29 AM, Krutika Dhananjay <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Tried this.
>>>>>>>>>
>>>>>>>>> With me, only 'fake2' gets healed after I bring the 'empty' brick
>>>>>>>>> back up, and it stops there unless I do a 'heal-full'.
>>>>>>>>>
>>>>>>>>> Is that what you're seeing as well?
>>>>>>>>>
>>>>>>>>> -Krutika
>>>>>>>>>
>>>>>>>>> On Wed, Aug 31, 2016 at 4:43 AM, David Gossage <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Same issue. Brought up glusterd on the problem node; heal count
>>>>>>>>>> still stuck at 6330.
>>>>>>>>>>
>>>>>>>>>> Ran gluster v heal GUSTER1 full
>>>>>>>>>>
>>>>>>>>>> glustershd on the problem node shows a sweep starting and finishing
>>>>>>>>>> in seconds. The other 2 nodes show no activity in the log. They
>>>>>>>>>> should start a sweep too, shouldn't they?
>>>>>>>>>>
>>>>>>>>>> Tried starting from scratch:
>>>>>>>>>>
>>>>>>>>>> kill -15 brickpid
>>>>>>>>>> rm -Rf /brick
>>>>>>>>>> mkdir -p /brick
>>>>>>>>>> mkdir /gsmount/fake2
>>>>>>>>>> setfattr -n "user.some-name" -v "some-value" /gsmount/fake2
>>>>>>>>>>
>>>>>>>>>> Heals visible dirs instantly, then stops.
>>>>>>>>>>
>>>>>>>>>> gluster v heal GLUSTER1 full
>>>>>>>>>>
>>>>>>>>>> I see a sweep start on the problem node and end almost instantly.
>>>>>>>>>> No files added to the heal list, no files healed, no more logging.
>>>>>>>>>>
>>>>>>>>>> [2016-08-30 23:11:31.544331] I [MSGID: 108026] [afr-self-heald.c:646:afr_shd_full_healer] 0-GLUSTER1-replicate-0: starting full sweep on subvol GLUSTER1-client-1
>>>>>>>>>> [2016-08-30 23:11:33.776235] I [MSGID: 108026] [afr-self-heald.c:656:afr_shd_full_healer] 0-GLUSTER1-replicate-0: finished full sweep on subvol GLUSTER1-client-1
>>>>>>>>>>
>>>>>>>>>> Same results no matter which node you run the command on. Still
>>>>>>>>>> stuck with 6330 files showing as needing heal out of 19k. Logs still
>>>>>>>>>> show no heals are occurring.
>>>>>>>>>>
>>>>>>>>>> Is there a way to forcibly reset any prior heal data? Could it be
>>>>>>>>>> stuck on some past failed heal start?
>>>>>>>>>>
>>>>>>>>>> David Gossage
>>>>>>>>>> Carousel Checks Inc. | System Administrator
>>>>>>>>>> Office 708.613.2284
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 30, 2016 at 10:03 AM, David Gossage <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> On Tue, Aug 30, 2016 at 10:02 AM, David Gossage <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> updated test server to 3.8.3
>>>>>>>>>>>>
>>>>>>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1
>>>>>>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1
>>>>>>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1
>>>>>>>>>>>> Options Reconfigured:
>>>>>>>>>>>> cluster.granular-entry-heal: on
>>>>>>>>>>>> performance.readdir-ahead: on
>>>>>>>>>>>> performance.read-ahead: off
>>>>>>>>>>>> nfs.disable: on
>>>>>>>>>>>> nfs.addr-namelookup: off
>>>>>>>>>>>> nfs.enable-ino32: off
>>>>>>>>>>>> cluster.background-self-heal-count: 16
>>>>>>>>>>>> cluster.self-heal-window-size: 1024
>>>>>>>>>>>> performance.quick-read: off
>>>>>>>>>>>> performance.io-cache: off
>>>>>>>>>>>> performance.stat-prefetch: off
>>>>>>>>>>>> cluster.eager-lock: enable
>>>>>>>>>>>> network.remote-dio: on
>>>>>>>>>>>> cluster.quorum-type: auto
>>>>>>>>>>>> cluster.server-quorum-type: server
>>>>>>>>>>>> storage.owner-gid: 36
>>>>>>>>>>>> storage.owner-uid: 36
>>>>>>>>>>>> server.allow-insecure: on
>>>>>>>>>>>> features.shard: on
>>>>>>>>>>>> features.shard-block-size: 64MB
>>>>>>>>>>>> performance.strict-o-direct: off
>>>>>>>>>>>> cluster.locking-scheme: granular
>>>>>>>>>>>>
>>>>>>>>>>>> kill -15 brickpid
>>>>>>>>>>>> rm -Rf /gluster2/brick3
>>>>>>>>>>>> mkdir -p /gluster2/brick3/1
>>>>>>>>>>>> mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
>>>>>>>>>>>> setfattr -n "user.some-name" -v "some-value" /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard/fake2
>>>>>>>>>>>> gluster v start glustershard force
>>>>>>>>>>>>
>>>>>>>>>>>> At this point the brick process starts and all visible files,
>>>>>>>>>>>> including the new dir, are made on the brick. A handful of shards
>>>>>>>>>>>> are still in the heal statistics, but no .shard directory is
>>>>>>>>>>>> created and there is no increase in the shard count.
>>>>>>>>>>>>
>>>>>>>>>>>> gluster v heal glustershard
>>>>>>>>>>>>
>>>>>>>>>>>> At this point still no increase in count, no directory made, and
>>>>>>>>>>>> no additional healing activity generated in the logs. Waited a few
>>>>>>>>>>>> minutes tailing logs to check if anything kicked in.
>>>>>>>>>>>>
>>>>>>>>>>>> gluster v heal glustershard full
>>>>>>>>>>>>
>>>>>>>>>>>> The shards get added to the list and the heal commences. Logs show
>>>>>>>>>>>> a full sweep starting on all 3 nodes, though this time it only
>>>>>>>>>>>> shows as finishing on one, which looks to be the one that had its
>>>>>>>>>>>> brick deleted.
>>>>>>>>>>>> >>>>>>>>>>>> [2016-08-30 14:45:33.098589] I [MSGID: 108026] >>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>>>>>> glustershard-client-0 >>>>>>>>>>>> [2016-08-30 14:45:33.099492] I [MSGID: 108026] >>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>>>>>> glustershard-client-1 >>>>>>>>>>>> [2016-08-30 14:45:33.100093] I [MSGID: 108026] >>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>>>>>> glustershard-client-2 >>>>>>>>>>>> [2016-08-30 14:52:29.760213] I [MSGID: 108026] >>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>>>>>>>> glustershard-client-2 >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Just realized its still healing so that may be why sweep on 2 >>>>>>>>>>> other bricks haven't replied as finished. >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> my hope is that later tonight a full heal will work on >>>>>>>>>>>> production. Is it possible self-heal daemon can get stale or stop >>>>>>>>>>>> listening but still show as active? Would stopping and starting >>>>>>>>>>>> self-heal >>>>>>>>>>>> daemon from gluster cli before doing these heals be helpful? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Aug 30, 2016 at 9:29 AM, David Gossage < >>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Aug 30, 2016 at 8:52 AM, David Gossage < >>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 8:01 AM, Krutika Dhananjay < >>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 6:20 PM, Krutika Dhananjay < >>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 6:07 PM, David Gossage < >>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 7:18 AM, Krutika Dhananjay < >>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Could you also share the glustershd logs? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I'll get them when I get to work sure >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I tried the same steps that you mentioned multiple times, >>>>>>>>>>>>>>>>>> but heal is running to completion without any issues. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> It must be said that 'heal full' traverses the files and >>>>>>>>>>>>>>>>>> directories in a depth-first order and does heals also in >>>>>>>>>>>>>>>>>> the same order. >>>>>>>>>>>>>>>>>> But if it gets interrupted in the middle (say because >>>>>>>>>>>>>>>>>> self-heal-daemon was >>>>>>>>>>>>>>>>>> either intentionally or unintentionally brought offline and >>>>>>>>>>>>>>>>>> then brought >>>>>>>>>>>>>>>>>> back up), self-heal will only pick up the entries that are >>>>>>>>>>>>>>>>>> so far marked as >>>>>>>>>>>>>>>>>> new-entries that need heal which it will find in >>>>>>>>>>>>>>>>>> indices/xattrop directory. 
>>>>>>>>>>>>>>>>>> What this means is that those files and directories that >>>>>>>>>>>>>>>>>> were not visited >>>>>>>>>>>>>>>>>> during the crawl, will remain untouched and unhealed in this >>>>>>>>>>>>>>>>>> second >>>>>>>>>>>>>>>>>> iteration of heal, unless you execute a 'heal-full' again. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> So should it start healing shards as it crawls or not >>>>>>>>>>>>>>>>> until after it crawls the entire .shard directory? At the >>>>>>>>>>>>>>>>> pace it was >>>>>>>>>>>>>>>>> going that could be a week with one node appearing in the >>>>>>>>>>>>>>>>> cluster but with >>>>>>>>>>>>>>>>> no shard files if anything tries to access a file on that >>>>>>>>>>>>>>>>> node. From my >>>>>>>>>>>>>>>>> experience other day telling it to heal full again did >>>>>>>>>>>>>>>>> nothing regardless >>>>>>>>>>>>>>>>> of node used. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Crawl is started from '/' of the volume. Whenever self-heal >>>>>>>>>>>>>>> detects during the crawl that a file or directory is present in >>>>>>>>>>>>>>> some >>>>>>>>>>>>>>> brick(s) and absent in others, it creates the file on the >>>>>>>>>>>>>>> bricks where it >>>>>>>>>>>>>>> is absent and marks the fact that the file or directory might >>>>>>>>>>>>>>> need >>>>>>>>>>>>>>> data/entry and metadata heal too (this also means that an index >>>>>>>>>>>>>>> is created >>>>>>>>>>>>>>> under .glusterfs/indices/xattrop of the src bricks). And the >>>>>>>>>>>>>>> data/entry and >>>>>>>>>>>>>>> metadata heal are picked up and done in >>>>>>>>>>>>>>> >>>>>>>>>>>>>> the background with the help of these indices. >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Looking at my 3rd node as example i find nearly an exact same >>>>>>>>>>>>>> number of files in xattrop dir as reported by heal count at time >>>>>>>>>>>>>> I brought >>>>>>>>>>>>>> down node2 to try and alleviate read io errors that seemed to >>>>>>>>>>>>>> occur from >>>>>>>>>>>>>> what I was guessing as attempts to use the node with no shards >>>>>>>>>>>>>> for reads. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Also attached are the glustershd logs from the 3 nodes, along >>>>>>>>>>>>>> with the test node i tried yesterday with same results. >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Looking at my own logs I notice that a full sweep was only >>>>>>>>>>>>> ever recorded in glustershd.log on 2nd node with missing >>>>>>>>>>>>> directory. I >>>>>>>>>>>>> believe I should have found a sweep begun on every node correct? 
>>>>>>>>>>>>> >>>>>>>>>>>>> On my test dev when it did work I do see that >>>>>>>>>>>>> >>>>>>>>>>>>> [2016-08-30 13:56:25.223333] I [MSGID: 108026] >>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>>>>>>> glustershard-client-0 >>>>>>>>>>>>> [2016-08-30 13:56:25.223522] I [MSGID: 108026] >>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>>>>>>> glustershard-client-1 >>>>>>>>>>>>> [2016-08-30 13:56:25.224616] I [MSGID: 108026] >>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>>>> 0-glustershard-replicate-0: starting full sweep on subvol >>>>>>>>>>>>> glustershard-client-2 >>>>>>>>>>>>> [2016-08-30 14:18:48.333740] I [MSGID: 108026] >>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>>>>>>>>> glustershard-client-2 >>>>>>>>>>>>> [2016-08-30 14:18:48.356008] I [MSGID: 108026] >>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>>>>>>>>> glustershard-client-1 >>>>>>>>>>>>> [2016-08-30 14:18:49.637811] I [MSGID: 108026] >>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>>>>>> 0-glustershard-replicate-0: finished full sweep on subvol >>>>>>>>>>>>> glustershard-client-0 >>>>>>>>>>>>> >>>>>>>>>>>>> While when looking at past few days of the 3 prod nodes i only >>>>>>>>>>>>> found that on my 2nd node >>>>>>>>>>>>> [2016-08-27 01:26:42.638772] I [MSGID: 108026] >>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol >>>>>>>>>>>>> GLUSTER1-client-1 >>>>>>>>>>>>> [2016-08-27 11:37:01.732366] I [MSGID: 108026] >>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol >>>>>>>>>>>>> GLUSTER1-client-1 >>>>>>>>>>>>> [2016-08-27 12:58:34.597228] I [MSGID: 108026] >>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol >>>>>>>>>>>>> GLUSTER1-client-1 >>>>>>>>>>>>> [2016-08-27 12:59:28.041173] I [MSGID: 108026] >>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol >>>>>>>>>>>>> GLUSTER1-client-1 >>>>>>>>>>>>> [2016-08-27 20:03:42.560188] I [MSGID: 108026] >>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol >>>>>>>>>>>>> GLUSTER1-client-1 >>>>>>>>>>>>> [2016-08-27 20:03:44.278274] I [MSGID: 108026] >>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol >>>>>>>>>>>>> GLUSTER1-client-1 >>>>>>>>>>>>> [2016-08-27 21:00:42.603315] I [MSGID: 108026] >>>>>>>>>>>>> [afr-self-heald.c:646:afr_shd_full_healer] >>>>>>>>>>>>> 0-GLUSTER1-replicate-0: starting full sweep on subvol >>>>>>>>>>>>> GLUSTER1-client-1 >>>>>>>>>>>>> [2016-08-27 21:00:46.148674] I [MSGID: 108026] >>>>>>>>>>>>> [afr-self-heald.c:656:afr_shd_full_healer] >>>>>>>>>>>>> 0-GLUSTER1-replicate-0: finished full sweep on subvol >>>>>>>>>>>>> GLUSTER1-client-1 >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> My suspicion is that this is what happened on your setup. 
>>>>>>>>>>>>>>>>>> Could you confirm if that was the case? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Brick was brought online with force start then a full heal >>>>>>>>>>>>>>>>> launched. Hours later after it became evident that it was >>>>>>>>>>>>>>>>> not adding new >>>>>>>>>>>>>>>>> files to heal I did try restarting self-heal daemon and >>>>>>>>>>>>>>>>> relaunching full >>>>>>>>>>>>>>>>> heal again. But this was after the heal had basically already >>>>>>>>>>>>>>>>> failed to >>>>>>>>>>>>>>>>> work as intended. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> OK. How did you figure it was not adding any new files? I >>>>>>>>>>>>>>>> need to know what places you were monitoring to come to this >>>>>>>>>>>>>>>> conclusion. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -Krutika >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> As for those logs, I did manager to do something that >>>>>>>>>>>>>>>>>> caused these warning messages you shared earlier to appear >>>>>>>>>>>>>>>>>> in my client and >>>>>>>>>>>>>>>>>> server logs. >>>>>>>>>>>>>>>>>> Although these logs are annoying and a bit scary too, >>>>>>>>>>>>>>>>>> they didn't do any harm to the data in my volume. Why they >>>>>>>>>>>>>>>>>> appear just >>>>>>>>>>>>>>>>>> after a brick is replaced and under no other circumstances >>>>>>>>>>>>>>>>>> is something I'm >>>>>>>>>>>>>>>>>> still investigating. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> But for future, it would be good to follow the steps >>>>>>>>>>>>>>>>>> Anuradha gave as that would allow self-heal to at least >>>>>>>>>>>>>>>>>> detect that it has >>>>>>>>>>>>>>>>>> some repairing to do whenever it is restarted whether >>>>>>>>>>>>>>>>>> intentionally or >>>>>>>>>>>>>>>>>> otherwise. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I followed those steps as described on my test box and >>>>>>>>>>>>>>>>> ended up with exact same outcome of adding shards at an >>>>>>>>>>>>>>>>> agonizing slow pace >>>>>>>>>>>>>>>>> and no creation of .shard directory or heals on shard >>>>>>>>>>>>>>>>> directory. >>>>>>>>>>>>>>>>> Directories visible from mount healed quickly. This was with >>>>>>>>>>>>>>>>> one VM so it >>>>>>>>>>>>>>>>> has only 800 shards as well. After hours at work it had >>>>>>>>>>>>>>>>> added a total of >>>>>>>>>>>>>>>>> 33 shards to be healed. I sent those logs yesterday as well >>>>>>>>>>>>>>>>> though not the >>>>>>>>>>>>>>>>> glustershd. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Does replace-brick command copy files in same manner? For >>>>>>>>>>>>>>>>> these purposes I am contemplating just skipping the heal >>>>>>>>>>>>>>>>> route. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -Krutika >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Tue, Aug 30, 2016 at 2:22 AM, David Gossage < >>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> attached brick and client logs from test machine where >>>>>>>>>>>>>>>>>>> same behavior occurred not sure if anything new is there. 
>>>>>>>>>>>>>>>>>>> its still on >>>>>>>>>>>>>>>>>>> 3.8.2 >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Number of Bricks: 1 x 3 = 3 >>>>>>>>>>>>>>>>>>> Transport-type: tcp >>>>>>>>>>>>>>>>>>> Bricks: >>>>>>>>>>>>>>>>>>> Brick1: 192.168.71.10:/gluster2/brick1/1 >>>>>>>>>>>>>>>>>>> Brick2: 192.168.71.11:/gluster2/brick2/1 >>>>>>>>>>>>>>>>>>> Brick3: 192.168.71.12:/gluster2/brick3/1 >>>>>>>>>>>>>>>>>>> Options Reconfigured: >>>>>>>>>>>>>>>>>>> cluster.locking-scheme: granular >>>>>>>>>>>>>>>>>>> performance.strict-o-direct: off >>>>>>>>>>>>>>>>>>> features.shard-block-size: 64MB >>>>>>>>>>>>>>>>>>> features.shard: on >>>>>>>>>>>>>>>>>>> server.allow-insecure: on >>>>>>>>>>>>>>>>>>> storage.owner-uid: 36 >>>>>>>>>>>>>>>>>>> storage.owner-gid: 36 >>>>>>>>>>>>>>>>>>> cluster.server-quorum-type: server >>>>>>>>>>>>>>>>>>> cluster.quorum-type: auto >>>>>>>>>>>>>>>>>>> network.remote-dio: on >>>>>>>>>>>>>>>>>>> cluster.eager-lock: enable >>>>>>>>>>>>>>>>>>> performance.stat-prefetch: off >>>>>>>>>>>>>>>>>>> performance.io-cache: off >>>>>>>>>>>>>>>>>>> performance.quick-read: off >>>>>>>>>>>>>>>>>>> cluster.self-heal-window-size: 1024 >>>>>>>>>>>>>>>>>>> cluster.background-self-heal-count: 16 >>>>>>>>>>>>>>>>>>> nfs.enable-ino32: off >>>>>>>>>>>>>>>>>>> nfs.addr-namelookup: off >>>>>>>>>>>>>>>>>>> nfs.disable: on >>>>>>>>>>>>>>>>>>> performance.read-ahead: off >>>>>>>>>>>>>>>>>>> performance.readdir-ahead: on >>>>>>>>>>>>>>>>>>> cluster.granular-entry-heal: on >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 2:20 PM, David Gossage < >>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Mon, Aug 29, 2016 at 7:01 AM, Anuradha Talur < >>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> ----- Original Message ----- >>>>>>>>>>>>>>>>>>>>> > From: "David Gossage" <[email protected]> >>>>>>>>>>>>>>>>>>>>> > To: "Anuradha Talur" <[email protected]> >>>>>>>>>>>>>>>>>>>>> > Cc: "[email protected] List" < >>>>>>>>>>>>>>>>>>>>> [email protected]>, "Krutika Dhananjay" < >>>>>>>>>>>>>>>>>>>>> [email protected]> >>>>>>>>>>>>>>>>>>>>> > Sent: Monday, August 29, 2016 5:12:42 PM >>>>>>>>>>>>>>>>>>>>> > Subject: Re: [Gluster-users] 3.8.3 Shards Healing >>>>>>>>>>>>>>>>>>>>> Glacier Slow >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > On Mon, Aug 29, 2016 at 5:39 AM, Anuradha Talur < >>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > > Response inline. >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> > > ----- Original Message ----- >>>>>>>>>>>>>>>>>>>>> > > > From: "Krutika Dhananjay" <[email protected]> >>>>>>>>>>>>>>>>>>>>> > > > To: "David Gossage" <[email protected] >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > > > Cc: "[email protected] List" < >>>>>>>>>>>>>>>>>>>>> [email protected]> >>>>>>>>>>>>>>>>>>>>> > > > Sent: Monday, August 29, 2016 3:55:04 PM >>>>>>>>>>>>>>>>>>>>> > > > Subject: Re: [Gluster-users] 3.8.3 Shards >>>>>>>>>>>>>>>>>>>>> Healing Glacier Slow >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > Could you attach both client and brick logs? >>>>>>>>>>>>>>>>>>>>> Meanwhile I will try these >>>>>>>>>>>>>>>>>>>>> > > steps >>>>>>>>>>>>>>>>>>>>> > > > out on my machines and see if it is easily >>>>>>>>>>>>>>>>>>>>> recreatable. 
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > -Krutika >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > On Mon, Aug 29, 2016 at 2:31 PM, David Gossage < >>>>>>>>>>>>>>>>>>>>> > > [email protected] >>>>>>>>>>>>>>>>>>>>> > > > > wrote: >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > Centos 7 Gluster 3.8.3 >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > Brick1: ccgl1.gl.local:/gluster1/BRICK1/1 >>>>>>>>>>>>>>>>>>>>> > > > Brick2: ccgl2.gl.local:/gluster1/BRICK1/1 >>>>>>>>>>>>>>>>>>>>> > > > Brick3: ccgl4.gl.local:/gluster1/BRICK1/1 >>>>>>>>>>>>>>>>>>>>> > > > Options Reconfigured: >>>>>>>>>>>>>>>>>>>>> > > > cluster.data-self-heal-algorithm: full >>>>>>>>>>>>>>>>>>>>> > > > cluster.self-heal-daemon: on >>>>>>>>>>>>>>>>>>>>> > > > cluster.locking-scheme: granular >>>>>>>>>>>>>>>>>>>>> > > > features.shard-block-size: 64MB >>>>>>>>>>>>>>>>>>>>> > > > features.shard: on >>>>>>>>>>>>>>>>>>>>> > > > performance.readdir-ahead: on >>>>>>>>>>>>>>>>>>>>> > > > storage.owner-uid: 36 >>>>>>>>>>>>>>>>>>>>> > > > storage.owner-gid: 36 >>>>>>>>>>>>>>>>>>>>> > > > performance.quick-read: off >>>>>>>>>>>>>>>>>>>>> > > > performance.read-ahead: off >>>>>>>>>>>>>>>>>>>>> > > > performance.io-cache: off >>>>>>>>>>>>>>>>>>>>> > > > performance.stat-prefetch: on >>>>>>>>>>>>>>>>>>>>> > > > cluster.eager-lock: enable >>>>>>>>>>>>>>>>>>>>> > > > network.remote-dio: enable >>>>>>>>>>>>>>>>>>>>> > > > cluster.quorum-type: auto >>>>>>>>>>>>>>>>>>>>> > > > cluster.server-quorum-type: server >>>>>>>>>>>>>>>>>>>>> > > > server.allow-insecure: on >>>>>>>>>>>>>>>>>>>>> > > > cluster.self-heal-window-size: 1024 >>>>>>>>>>>>>>>>>>>>> > > > cluster.background-self-heal-count: 16 >>>>>>>>>>>>>>>>>>>>> > > > performance.strict-write-ordering: off >>>>>>>>>>>>>>>>>>>>> > > > nfs.disable: on >>>>>>>>>>>>>>>>>>>>> > > > nfs.addr-namelookup: off >>>>>>>>>>>>>>>>>>>>> > > > nfs.enable-ino32: off >>>>>>>>>>>>>>>>>>>>> > > > cluster.granular-entry-heal: on >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > Friday did rolling upgrade from 3.8.3->3.8.3 no >>>>>>>>>>>>>>>>>>>>> issues. >>>>>>>>>>>>>>>>>>>>> > > > Following steps detailed in previous >>>>>>>>>>>>>>>>>>>>> recommendations began proces of >>>>>>>>>>>>>>>>>>>>> > > > replacing and healngbricks one node at a time. >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > 1) kill pid of brick >>>>>>>>>>>>>>>>>>>>> > > > 2) reconfigure brick from raid6 to raid10 >>>>>>>>>>>>>>>>>>>>> > > > 3) recreate directory of brick >>>>>>>>>>>>>>>>>>>>> > > > 4) gluster volume start <> force >>>>>>>>>>>>>>>>>>>>> > > > 5) gluster volume heal <> full >>>>>>>>>>>>>>>>>>>>> > > Hi, >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> > > I'd suggest that full heal is not used. There are >>>>>>>>>>>>>>>>>>>>> a few bugs in full heal. >>>>>>>>>>>>>>>>>>>>> > > Better safe than sorry ;) >>>>>>>>>>>>>>>>>>>>> > > Instead I'd suggest the following steps: >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> > > Currently I brought the node down by systemctl >>>>>>>>>>>>>>>>>>>>> stop glusterd as I was >>>>>>>>>>>>>>>>>>>>> > getting sporadic io issues and a few VM's paused so >>>>>>>>>>>>>>>>>>>>> hoping that will help. >>>>>>>>>>>>>>>>>>>>> > I may wait to do this till around 4PM when most work >>>>>>>>>>>>>>>>>>>>> is done in case it >>>>>>>>>>>>>>>>>>>>> > shoots load up. 
>>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > > 1) kill pid of brick >>>>>>>>>>>>>>>>>>>>> > > 2) to configuring of brick that you need >>>>>>>>>>>>>>>>>>>>> > > 3) recreate brick dir >>>>>>>>>>>>>>>>>>>>> > > 4) while the brick is still down, from the mount >>>>>>>>>>>>>>>>>>>>> point: >>>>>>>>>>>>>>>>>>>>> > > a) create a dummy non existent dir under / of >>>>>>>>>>>>>>>>>>>>> mount. >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > so if noee 2 is down brick, pick node for example 3 >>>>>>>>>>>>>>>>>>>>> and make a test dir >>>>>>>>>>>>>>>>>>>>> > under its brick directory that doesnt exist on 2 or >>>>>>>>>>>>>>>>>>>>> should I be dong this >>>>>>>>>>>>>>>>>>>>> > over a gluster mount? >>>>>>>>>>>>>>>>>>>>> You should be doing this over gluster mount. >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > > b) set a non existent extended attribute on / >>>>>>>>>>>>>>>>>>>>> of mount. >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > Could you give me an example of an attribute to >>>>>>>>>>>>>>>>>>>>> set? I've read a tad on >>>>>>>>>>>>>>>>>>>>> > this, and looked up attributes but haven't set any >>>>>>>>>>>>>>>>>>>>> yet myself. >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> Sure. setfattr -n "user.some-name" -v "some-value" >>>>>>>>>>>>>>>>>>>>> <path-to-mount> >>>>>>>>>>>>>>>>>>>>> > Doing these steps will ensure that heal happens only >>>>>>>>>>>>>>>>>>>>> from updated brick to >>>>>>>>>>>>>>>>>>>>> > > down brick. >>>>>>>>>>>>>>>>>>>>> > > 5) gluster v start <> force >>>>>>>>>>>>>>>>>>>>> > > 6) gluster v heal <> >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> > Will it matter if somewhere in gluster the full heal >>>>>>>>>>>>>>>>>>>>> command was run other >>>>>>>>>>>>>>>>>>>>> > day? Not sure if it eventually stops or times out. >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> full heal will stop once the crawl is done. So if you >>>>>>>>>>>>>>>>>>>>> want to trigger heal again, >>>>>>>>>>>>>>>>>>>>> run gluster v heal <>. Actually even brick up or >>>>>>>>>>>>>>>>>>>>> volume start force should >>>>>>>>>>>>>>>>>>>>> trigger the heal. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Did this on test bed today. its one server with 3 >>>>>>>>>>>>>>>>>>>> bricks on same machine so take that for what its worth. >>>>>>>>>>>>>>>>>>>> also it still runs >>>>>>>>>>>>>>>>>>>> 3.8.2. Maybe ill update and re-run test. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> killed brick >>>>>>>>>>>>>>>>>>>> deleted brick dir >>>>>>>>>>>>>>>>>>>> recreated brick dir >>>>>>>>>>>>>>>>>>>> created fake dir on gluster mount >>>>>>>>>>>>>>>>>>>> set suggested fake attribute on it >>>>>>>>>>>>>>>>>>>> ran volume start <> force >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> looked at files it said needed healing and it was just >>>>>>>>>>>>>>>>>>>> 8 shards that were modified for few minutes I ran through >>>>>>>>>>>>>>>>>>>> steps >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> gave it few minutes and it stayed same >>>>>>>>>>>>>>>>>>>> ran gluster volume <> heal >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> it healed all the directories and files you can see >>>>>>>>>>>>>>>>>>>> over mount including fakedir. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> same issue for shards though. it adds more shards to >>>>>>>>>>>>>>>>>>>> heal at glacier pace. slight jump in speed if I stat >>>>>>>>>>>>>>>>>>>> every file and dir in >>>>>>>>>>>>>>>>>>>> VM running but not all shards. 
>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> It started with 8 shards to heal and is now only at 33 >>>>>>>>>>>>>>>>>>>> out of 800 and probably wont finish adding for few days at >>>>>>>>>>>>>>>>>>>> rate it goes. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> > > > 1st node worked as expected took 12 hours to >>>>>>>>>>>>>>>>>>>>> heal 1TB data. Load was >>>>>>>>>>>>>>>>>>>>> > > little >>>>>>>>>>>>>>>>>>>>> > > > heavy but nothing shocking. >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > About an hour after node 1 finished I began same >>>>>>>>>>>>>>>>>>>>> process on node2. Heal >>>>>>>>>>>>>>>>>>>>> > > > proces kicked in as before and the files in >>>>>>>>>>>>>>>>>>>>> directories visible from >>>>>>>>>>>>>>>>>>>>> > > mount >>>>>>>>>>>>>>>>>>>>> > > > and .glusterfs healed in short time. Then it >>>>>>>>>>>>>>>>>>>>> began crawl of .shard adding >>>>>>>>>>>>>>>>>>>>> > > > those files to heal count at which point the >>>>>>>>>>>>>>>>>>>>> entire proces ground to a >>>>>>>>>>>>>>>>>>>>> > > halt >>>>>>>>>>>>>>>>>>>>> > > > basically. After 48 hours out of 19k shards it >>>>>>>>>>>>>>>>>>>>> has added 5900 to heal >>>>>>>>>>>>>>>>>>>>> > > list. >>>>>>>>>>>>>>>>>>>>> > > > Load on all 3 machnes is negligible. It was >>>>>>>>>>>>>>>>>>>>> suggested to change this >>>>>>>>>>>>>>>>>>>>> > > value >>>>>>>>>>>>>>>>>>>>> > > > to full cluster.data-self-heal-algorithm and >>>>>>>>>>>>>>>>>>>>> restart volume which I >>>>>>>>>>>>>>>>>>>>> > > did. No >>>>>>>>>>>>>>>>>>>>> > > > efffect. Tried relaunching heal no effect, >>>>>>>>>>>>>>>>>>>>> despite any node picked. I >>>>>>>>>>>>>>>>>>>>> > > > started each VM and performed a stat of all >>>>>>>>>>>>>>>>>>>>> files from within it, or a >>>>>>>>>>>>>>>>>>>>> > > full >>>>>>>>>>>>>>>>>>>>> > > > virus scan and that seemed to cause short small >>>>>>>>>>>>>>>>>>>>> spikes in shards added, >>>>>>>>>>>>>>>>>>>>> > > but >>>>>>>>>>>>>>>>>>>>> > > > not by much. Logs are showing no real messages >>>>>>>>>>>>>>>>>>>>> indicating anything is >>>>>>>>>>>>>>>>>>>>> > > going >>>>>>>>>>>>>>>>>>>>> > > > on. I get hits to brick log on occasion of null >>>>>>>>>>>>>>>>>>>>> lookups making me think >>>>>>>>>>>>>>>>>>>>> > > its >>>>>>>>>>>>>>>>>>>>> > > > not really crawling shards directory but waiting >>>>>>>>>>>>>>>>>>>>> for a shard lookup to >>>>>>>>>>>>>>>>>>>>> > > add >>>>>>>>>>>>>>>>>>>>> > > > it. I'll get following in brick log but not >>>>>>>>>>>>>>>>>>>>> constant and sometime >>>>>>>>>>>>>>>>>>>>> > > multiple >>>>>>>>>>>>>>>>>>>>> > > > for same shard. 
>>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478125] W [MSGID: 115009] >>>>>>>>>>>>>>>>>>>>> > > > [server-resolve.c:569:server_resolve] >>>>>>>>>>>>>>>>>>>>> 0-GLUSTER1-server: no resolution >>>>>>>>>>>>>>>>>>>>> > > type >>>>>>>>>>>>>>>>>>>>> > > > for (null) (LOOKUP) >>>>>>>>>>>>>>>>>>>>> > > > [2016-08-29 08:31:57.478170] E [MSGID: 115050] >>>>>>>>>>>>>>>>>>>>> > > > [server-rpc-fops.c:156:server_lookup_cbk] >>>>>>>>>>>>>>>>>>>>> 0-GLUSTER1-server: 12591783: >>>>>>>>>>>>>>>>>>>>> > > > LOOKUP (null) (00000000-0000-0000-00 >>>>>>>>>>>>>>>>>>>>> > > > 00-000000000000/241a55ed-f0d5-4dbc-a6ce-ab784a0ba6ff.221) >>>>>>>>>>>>>>>>>>>>> ==> (Invalid >>>>>>>>>>>>>>>>>>>>> > > > argument) [Invalid argument] >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > This one repeated about 30 times in row then >>>>>>>>>>>>>>>>>>>>> nothing for 10 minutes then >>>>>>>>>>>>>>>>>>>>> > > one >>>>>>>>>>>>>>>>>>>>> > > > hit for one different shard by itself. >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > How can I determine if Heal is actually running? >>>>>>>>>>>>>>>>>>>>> How can I kill it or >>>>>>>>>>>>>>>>>>>>> > > force >>>>>>>>>>>>>>>>>>>>> > > > restart? Does node I start it from determine >>>>>>>>>>>>>>>>>>>>> which directory gets >>>>>>>>>>>>>>>>>>>>> > > crawled to >>>>>>>>>>>>>>>>>>>>> > > > determine heals? >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > David Gossage >>>>>>>>>>>>>>>>>>>>> > > > Carousel Checks Inc. | System Administrator >>>>>>>>>>>>>>>>>>>>> > > > Office 708.613.2284 >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > _______________________________________________ >>>>>>>>>>>>>>>>>>>>> > > > Gluster-users mailing list >>>>>>>>>>>>>>>>>>>>> > > > [email protected] >>>>>>>>>>>>>>>>>>>>> > > > http://www.gluster.org/mailman >>>>>>>>>>>>>>>>>>>>> /listinfo/gluster-users >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > >>>>>>>>>>>>>>>>>>>>> > > > _______________________________________________ >>>>>>>>>>>>>>>>>>>>> > > > Gluster-users mailing list >>>>>>>>>>>>>>>>>>>>> > > > [email protected] >>>>>>>>>>>>>>>>>>>>> > > > http://www.gluster.org/mailman >>>>>>>>>>>>>>>>>>>>> /listinfo/gluster-users >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> > > -- >>>>>>>>>>>>>>>>>>>>> > > Thanks, >>>>>>>>>>>>>>>>>>>>> > > Anuradha. >>>>>>>>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>>>>>>>> > >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>> Anuradha. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >
