Dear Ravi,

I spent a bit of time inspecting the xattrs on some files and directories on a few bricks for this volume and it looks a bit messy. Even if I could make sense of it for a few and potentially heal them manually, there are millions of files and directories in total, so that's definitely not a scalable solution. After a few missteps with `replace-brick ... commit force` in the last week—one of which was on a brick that was dead/offline—as well as some premature `remove-brick` commands, I'm unsure how to proceed and I'm getting demotivated. It's scary how quickly things get out of hand in distributed systems...
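In case it helps to see what I mean by "inspecting the xattrs", this is roughly the kind of spot check I've been doing on a brick host. It's only a sketch: the brick path is an example from this volume, it skips the .glusterfs internals, and it merely flags files whose trusted.afr.* xattrs are not all zeroes—it doesn't change anything:

# print any file on the brick with non-zero trusted.afr.* xattrs
find /mnt/gluster/apps -path '*/.glusterfs' -prune -o -type f -print0 |
while IFS= read -r -d '' f; do
    if getfattr -d -m trusted.afr -e hex "$f" 2>/dev/null \
        | grep '^trusted\.afr\.' \
        | grep -qv '=0x000000000000000000000000$'; then
        echo "non-zero afr xattrs: $f"
    fi
done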
I had hoped that bringing the old brick back up would help, but by the time I added it again a few days had passed and all the brick IDs had changed due to the replace/remove-brick commands, not to mention that the trusted.afr.$volume-client-xx values were now probably pointing to the wrong bricks (?).

Anyways, a few hours ago I started a full heal on the volume and I see a sustained 100MiB/sec of network traffic going from the old brick's host to the new one. The completed heals reported in the logs look promising too:

Old brick host:

# grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E 'Completed (data|metadata|entry) selfheal' | sort | uniq -c
 281614 Completed data selfheal
     84 Completed entry selfheal
 299648 Completed metadata selfheal

New brick host:

# grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E 'Completed (data|metadata|entry) selfheal' | sort | uniq -c
 198256 Completed data selfheal
  16829 Completed entry selfheal
 229664 Completed metadata selfheal

So that's good, I guess, though I have no idea how long it will take or whether it will fix the "missing files" issue on the FUSE mount. I've increased cluster.shd-max-threads to 8 to hopefully speed up the heal process.
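For reference, this is roughly what I ran (the volume name is shown as "apps" here for illustration, and `heal info summary` is just what I've been using to keep an eye on progress):

# gluster volume heal apps full
# gluster volume set apps cluster.shd-max-threads 8
# gluster volume heal apps info summary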
I'd be happy for any advice or pointers.

On Wed, May 29, 2019 at 5:20 PM Alan Orth <alan.o...@gmail.com> wrote:

> Dear Ravi,
>
> Thank you for the link to the blog post series—it is very informative and current! If I understand your blog post correctly, then I think the answer to your previous question about pending AFRs is: no, there are no pending AFRs. I have identified one file that is a good test case to try to understand what happened after I issued the `gluster volume replace-brick ... commit force` a few days ago and then added the same original brick back to the volume later. This is the current state of the replica 2 distribute/replicate volume:
>
> [root@wingu0 ~]# gluster volume info apps
>
> Volume Name: apps
> Type: Distributed-Replicate
> Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 3 x 2 = 6
> Transport-type: tcp
> Bricks:
> Brick1: wingu3:/mnt/gluster/apps
> Brick2: wingu4:/mnt/gluster/apps
> Brick3: wingu05:/data/glusterfs/sdb/apps
> Brick4: wingu06:/data/glusterfs/sdb/apps
> Brick5: wingu0:/mnt/gluster/apps
> Brick6: wingu05:/data/glusterfs/sdc/apps
> Options Reconfigured:
> diagnostics.client-log-level: DEBUG
> storage.health-check-interval: 10
> nfs.disable: on
>
> I checked the xattrs of one file that is missing from the volume's FUSE mount (though I can read it if I access its full path explicitly), but is present on several of the volume's bricks (some with full size, others empty):
>
> [root@wingu0 ~]# getfattr -d -m. -e hex /mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg
> getfattr: Removing leading '/' from absolute path names
> # file: mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.apps-client-3=0x000000000000000000000000
> trusted.afr.apps-client-5=0x000000000000000000000000
> trusted.afr.dirty=0x000000000000000000000000
> trusted.bit-rot.version=0x0200000000000000585a396f00046e15
> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
>
> [root@wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
> getfattr: Removing leading '/' from absolute path names
> # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667
> trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200
>
> [root@wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg
> getfattr: Removing leading '/' from absolute path names
> # file: data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667
>
> [root@wingu06 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
> getfattr: Removing leading '/' from absolute path names
> # file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
> security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
> trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
> trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667
> trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200
>
> According to the trusted.afr.apps-client-xx xattrs this particular file should be on the bricks with IDs "apps-client-3" and "apps-client-5". It took me a few hours to realize that the brick-id values are recorded in the volume's volfiles in /var/lib/glusterd/vols/apps/bricks. After comparing those brick-id values with a volfile backup from before the replace-brick, I realized that the files are simply on the wrong bricks now as far as Gluster is concerned. This particular file is now on the brick for "apps-client-4". As an experiment I copied this one file to the two bricks listed in its xattrs and I was then able to see the file from the FUSE mount (yay!).
>
> Other than replacing the brick, removing it, and then adding the old brick on the original server back, there has been no change in the data this entire time. Can I change the brick IDs in the volfiles so they reflect where the data actually is? Or perhaps script something to reset all the xattrs on the files/directories to point to the correct bricks?
>
> Thank you for any help or pointers,
>
> On Wed, May 29, 2019 at 7:24 AM Ravishankar N <ravishan...@redhat.com> wrote:
>
>> On 29/05/19 9:50 AM, Ravishankar N wrote:
>>
>> On 29/05/19 3:59 AM, Alan Orth wrote:
>>
>> Dear Ravishankar,
>>
>> I'm not sure if Brick4 had pending AFRs because I don't know what that means, and it's been a few days so I am not sure I would be able to find that information.
>>
>> When you find some time, have a look at a blog <http://wp.me/peiBB-6b> series I wrote about AFR - I've tried to explain what one needs to know to debug replication-related issues in it.
>>
>> I made a typo there.
>> The URL for the blog is https://wp.me/peiBB-6b
>>
>> -Ravi
>>
>> Anyways, after wasting a few days rsyncing the old brick to a new host I decided to just try to add the old brick back into the volume instead of bringing it up on the new host. I created a new brick directory on the old host, moved the old brick's contents into that new directory (minus the .glusterfs directory), added the new brick to the volume, and then did Vlad's find/stat trick¹ from the brick to the FUSE mount point.
>>
>> The interesting problem I have now is that some files don't appear in the FUSE mount's directory listings, but I can actually list them directly and even read them. What could cause that?
>>
>> Not sure, too many variables in the hacks that you did to take a guess. You can check whether the contents of the .glusterfs folder are in order on the new brick (for example, that hardlinks for files and symlinks for directories are present, etc.).
>>
>> Regards,
>> Ravi
>>
>> Thanks,
>>
>> ¹ https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html
>>
>> On Fri, May 24, 2019 at 4:59 PM Ravishankar N <ravishan...@redhat.com> wrote:
>>
>>> On 23/05/19 2:40 AM, Alan Orth wrote:
>>>
>>> Dear list,
>>>
>>> I seem to have gotten into a tricky situation. Today I brought up a shiny new server with new disk arrays and attempted to replace one brick of a replica 2 distribute/replicate volume on an older server using the `replace-brick` command:
>>>
>>> # gluster volume replace-brick homes wingu0:/mnt/gluster/homes wingu06:/data/glusterfs/sdb/homes commit force
>>>
>>> The command was successful and I see the new brick in the output of `gluster volume info`. The problem is that Gluster doesn't seem to be migrating the data,
>>>
>>> `replace-brick` definitely must heal (not migrate) the data. In your case, data must have been healed from Brick-4 to the replaced Brick-3. Are there any errors in the self-heal daemon logs of Brick-4's node? Does Brick-4 have pending AFR xattrs blaming Brick-3? The doc is a bit out of date; the replace-brick command internally does all the setfattr steps that are mentioned in the doc.
>>>
>>> -Ravi
>>>
>>> and now the original brick that I replaced is no longer part of the volume (and a few terabytes of data are just sitting on the old brick):
>>>
>>> # gluster volume info homes | grep -E "Brick[0-9]:"
>>> Brick1: wingu4:/mnt/gluster/homes
>>> Brick2: wingu3:/mnt/gluster/homes
>>> Brick3: wingu06:/data/glusterfs/sdb/homes
>>> Brick4: wingu05:/data/glusterfs/sdb/homes
>>> Brick5: wingu05:/data/glusterfs/sdc/homes
>>> Brick6: wingu06:/data/glusterfs/sdc/homes
>>>
>>> I see the Gluster docs have a more complicated procedure for replacing bricks that involves getfattr/setfattr¹. How can I tell Gluster about the old brick? I see that I have a backup of the old volfile thanks to yum's rpmsave function, if that helps.
>>>
>>> We are using Gluster 5.6 on CentOS 7. Thank you for any advice you can give.
>>>
>>> ¹ https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick
>>>
>>> --
>>> Alan Orth
>>> alan.o...@gmail.com
>>> https://picturingjordan.com
>>> https://englishbulgaria.net
>>> https://mjanja.ch
>>> "In heaven all the interesting people are missing." ―Friedrich Nietzsche
>>
>> --
>> Alan Orth
>> alan.o...@gmail.com
>> https://picturingjordan.com
>> https://englishbulgaria.net
>> https://mjanja.ch
>> "In heaven all the interesting people are missing." ―Friedrich Nietzsche
>
> --
> Alan Orth
> alan.o...@gmail.com
> https://picturingjordan.com
> https://englishbulgaria.net
> https://mjanja.ch
> "In heaven all the interesting people are missing." ―Friedrich Nietzsche

--
Alan Orth
alan.o...@gmail.com
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." ―Friedrich Nietzsche
_______________________________________________
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users