Thanks for your support,
¹ https://joejulian.name/post/dht-misses-are-expensive/
On Fri, May 31, 2019 at 7:57 AM Ravishankar N <[email protected]> wrote:
On 31/05/19 3:20 AM, Alan Orth wrote:
Dear Ravi,
I spent a bit of time inspecting the xattrs on some files and
directories on a few bricks for this volume and it looks a bit
messy. Even if I could make sense of it for a few and potentially
heal them manually, there are millions of files and directories
in total so that's definitely not a scalable solution. After a
few missteps with `replace-brick ... commit force` in the last
week—one of which on a brick that was dead/offline—as well as
some premature `remove-brick` commands, I'm unsure how to
proceed and I'm getting demotivated. It's scary how quickly
things get out of hand in distributed systems...
Hi Alan,
The one good thing about Gluster is that the data is always
available directly on the backend bricks even if your volume has
inconsistencies at the gluster level. So theoretically, if your
cluster is FUBAR, you could just create a new volume and copy all
data onto it via its mount from the old volume's bricks.
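To give a rough idea (the new volume name and mount point below are just placeholders, and /mnt/gluster/apps is one of your existing brick paths), you would mount the new volume and copy one brick per replica pair into it, skipping the internal .glusterfs directory:
# mount -t glusterfs wingu0:/newapps /mnt/newapps
# rsync -a --exclude=.glusterfs /mnt/gluster/apps/ /mnt/newapps/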
I had hoped that bringing the old brick back up would help, but
by the time I added it again a few days had passed and all the
brick-id's had changed due to the replace/remove brick commands,
not to mention that the trusted.afr.$volume-client-xx values were
now probably pointing to the wrong bricks (?).
Anyways, a few hours ago I started a full heal on the volume and
I see that there is a sustained 100MiB/sec of network traffic
going from the old brick's host to the new one. The completed
heals reported in the logs look promising too:
Old brick host:
# grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E 'Completed (data|metadata|entry) selfheal' | sort | uniq -c
281614 Completed data selfheal
84 Completed entry selfheal
299648 Completed metadata selfheal
New brick host:
# grep '2019-05-30' /var/log/glusterfs/glustershd.log | grep -o -E 'Completed (data|metadata|entry) selfheal' | sort | uniq -c
198256 Completed data selfheal
16829 Completed entry selfheal
229664 Completed metadata selfheal
So that's good I guess, though I have no idea how long it will
take or if it will fix the "missing files" issue on the FUSE
mount. I've increased cluster.shd-max-threads to 8 to hopefully
speed up the heal process.
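For reference, the full heal and the thread bump were roughly this (volume name shown as "apps" for illustration):
# gluster volume heal apps full
# gluster volume set apps cluster.shd-max-threads 8
and `gluster volume heal apps info` should show the pending entries draining as it progresses.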
The afr xattrs should not cause files to disappear from the mount.
If the xattr names do not match what each AFR subvolume expects for
its children (e.g. in a replica 2 volume, trusted.afr.*-client-{0,1}
for the 1st subvolume, client-{2,3} for the 2nd, and so on), then it
won't heal the data, that is all. But in your case I see some
inconsistencies, like one brick having the actual file
(licenseserver.cfg) and the other having a linkto file (the one
with the dht.linkto xattr) /in the same replica pair/.
I'd be grateful for any advice or pointers,
Did you check if the .glusterfs hardlinks/symlinks exist and are
in order for all bricks?
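For a regular file the brick copy and its .glusterfs entry should be hard links with the same inode; for example, for the licenseserver.cfg file (gfid 878003a2-fb52-43b6-a0d1-4d2f8b4306bd) something like this on each brick that holds it, with matching inode numbers and a link count of at least 2:
# stat -c '%h %i %n' /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
# stat -c '%h %i %n' /data/glusterfs/sdb/apps/.glusterfs/87/80/878003a2-fb52-43b6-a0d1-4d2f8b4306bd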
-Ravi
On Wed, May 29, 2019 at 5:20 PM Alan Orth <[email protected]> wrote:
Dear Ravi,
Thank you for the link to the blog post series—it is very
informative and current! If I understand your blog post
correctly then I think the answer to your previous question
about pending AFRs is: no, there are no pending AFRs. I have
identified one file that is a good test case to try to
understand what happened after I issued the `gluster volume
replace-brick ... commit force` a few days ago and then added
the same original brick back to the volume later. This is the
current state of the replica 2 distribute/replicate volume:
[root@wingu0 ~]# gluster volume info apps
Volume Name: apps
Type: Distributed-Replicate
Volume ID: f118d2da-79df-4ee1-919d-53884cd34eda
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: wingu3:/mnt/gluster/apps
Brick2: wingu4:/mnt/gluster/apps
Brick3: wingu05:/data/glusterfs/sdb/apps
Brick4: wingu06:/data/glusterfs/sdb/apps
Brick5: wingu0:/mnt/gluster/apps
Brick6: wingu05:/data/glusterfs/sdc/apps
Options Reconfigured:
diagnostics.client-log-level: DEBUG
storage.health-check-interval: 10
nfs.disable: on
I checked the xattrs of one file that is missing from the
volume's FUSE mount (though I can read it if I access its
full path explicitly), but is present in several of the
volume's bricks (some with full size, others empty):
[root@wingu0 ~]# getfattr -d -m. -e hex /mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg
getfattr: Removing leading '/' from absolute path names
# file: mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.afr.apps-client-3=0x000000000000000000000000
trusted.afr.apps-client-5=0x000000000000000000000000
trusted.afr.dirty=0x000000000000000000000000
trusted.bit-rot.version=0x0200000000000000585a396f00046e15
trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd

[root@wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667
trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200

[root@wingu05 ~]# getfattr -d -m. -e hex /data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/sdc/apps/clcgenomics/clclicsrv/licenseserver.cfg
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667

[root@wingu06 ~]# getfattr -d -m. -e hex /data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
getfattr: Removing leading '/' from absolute path names
# file: data/glusterfs/sdb/apps/clcgenomics/clclicsrv/licenseserver.cfg
security.selinux=0x73797374656d5f753a6f626a6563745f723a756e6c6162656c65645f743a733000
trusted.gfid=0x878003a2fb5243b6a0d14d2f8b4306bd
trusted.gfid2path.82586deefbc539c3=0x34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667
trusted.glusterfs.dht.linkto=0x617070732d7265706c69636174652d3200
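As an aside, the hex values are easy to decode with xxd, and if I've read them right the linkto and gfid2path xattrs above come out as:
# echo 617070732d7265706c69636174652d3200 | xxd -r -p
apps-replicate-2
# echo 34666437323861612d356462392d343836382d616232662d6564393031636566333561392f6c6963656e73657365727665722e636667 | xxd -r -p
4fd728aa-5db9-4868-ab2f-ed901cef35a9/licenseserver.cfg
i.e. the linkto names one of the replica subvolumes and the gfid2path records the parent directory's gfid plus the file name.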
According to the trusted.afr.apps-client-xx xattrs, this
particular file should be on bricks with id "apps-client-3"
and "apps-client-5". It took me a few hours to realize that
the brick-id values are recorded in the volume's volfiles in
/var/lib/glusterd/vols/apps/bricks. After comparing those
brick-id values with a volfile backup from before the
replace-brick, I realized that the files are simply on the
wrong brick now as far as Gluster is concerned. This
particular file is now on the brick for "apps-client-4". As
an experiment I copied this one file to the two bricks listed
in the xattrs and I was then able to see the file from the
FUSE mount (yay!).
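In case it helps anyone else, this is how I pulled the brick-id mapping out of the volfiles on my system (and then compared it against the pre-replace-brick backup of the same directory):
# grep -H brick-id /var/lib/glusterd/vols/apps/bricks/*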
Other than replacing the brick, removing it, and then adding
the old brick back on the original server, there has been no
change in the data this entire time. Can I change the brick
IDs in the volfiles so they reflect where the data actually
is? Or perhaps script something to reset all the xattrs on
the files/directories to point to the correct bricks?
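If the xattr route is sane at all, I imagine it would be something like the following per file, run directly on the brick while nothing is writing, where the all-zero value means "nothing pending" and the client name/path are only examples:
# setfattr -n trusted.afr.apps-client-4 -v 0x000000000000000000000000 /mnt/gluster/apps/clcgenomics/clclicsrv/licenseserver.cfg
...but I wouldn't script that across millions of files without confirmation.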
Thank you for any help or pointers,
On Wed, May 29, 2019 at 7:24 AM Ravishankar N <[email protected]> wrote:
On 29/05/19 9:50 AM, Ravishankar N wrote:
On 29/05/19 3:59 AM, Alan Orth wrote:
Dear Ravishankar,
I'm not sure if Brick4 had pending AFRs because I don't
know what that means and it's been a few days so I am
not sure I would be able to find that information.
When you find some time, have a look at a blog series
<http://wp.me/peiBB-6b> I wrote about AFR - I've tried to
explain what one needs to know to debug replication-related
issues in it.
Made a typo error. The URL for the blog is
https://wp.me/peiBB-6b
-Ravi
Anyways, after wasting a few days rsyncing the old
brick to a new host I decided to just try to add the
old brick back into the volume instead of bringing it
up on the new host. I created a new brick directory on
the old host, moved the old brick's contents into that
new directory (minus the .glusterfs directory), added
the new brick to the volume, and then did Vlad's
find/stat trick¹ from the brick to the FUSE mount point.
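Roughly, the trick is to walk the brick and stat every corresponding path on the FUSE mount to trigger lookups/heals; with placeholder brick and mount paths it looks something like:
# cd /path/to/brick
# find . -not -path './.glusterfs/*' | while read -r f; do stat "/mnt/fuse/$f" > /dev/null; done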
The interesting problem I have now is that some files
don't appear in the FUSE mount's directory listings,
but I can actually list them directly and even read
them. What could cause that?
Not sure - there are too many variables in the hacks you did
to take a guess. You can check whether the contents of the
.glusterfs folder are in order on the new brick (for example,
that hardlinks for files and symlinks for directories are
present, etc.).
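For a directory, for instance, the .glusterfs entry is a symlink that should resolve back through its parent (the gfid below is only an example):
# getfattr -n trusted.gfid -e hex /path/on/brick/somedir
# readlink /path/on/brick/.glusterfs/ab/cd/abcdef01-2345-6789-abcd-ef0123456789
For regular files it should instead be a hard link sharing the file's inode.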
Regards,
Ravi
Thanks,
¹
https://lists.gluster.org/pipermail/gluster-users/2018-February/033584.html
On Fri, May 24, 2019 at 4:59 PM Ravishankar N <[email protected]> wrote:
On 23/05/19 2:40 AM, Alan Orth wrote:
Dear list,
I seem to have gotten into a tricky situation.
Today I brought up a shiny new server with new
disk arrays and attempted to replace one brick of
a replica 2 distribute/replicate volume on an
older server using the `replace-brick` command:
# gluster volume replace-brick homes wingu0:/mnt/gluster/homes wingu06:/data/glusterfs/sdb/homes commit force
The command was successful and I see the new brick
in the output of `gluster volume info`. The
problem is that Gluster doesn't seem to be
migrating the data,
`replace-brick` definitely must heal (not migrate)
the data. In your case, data must have been healed
from Brick-4 to the replaced Brick-3. Are there any
errors in the self-heal daemon logs of Brick-4's
node? Does Brick-4 have pending AFR xattrs blaming
Brick-3? The doc is a bit out of date; the replace-brick
command internally does all the setfattr steps that are
mentioned there.
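For example, on Brick-4's node (the file path is just a placeholder):
# getfattr -d -m. -e hex /data/glusterfs/sdb/homes/path/to/some/file
A non-zero value in the trusted.afr.homes-client-* xattr that corresponds to Brick-3 means Brick-4 blames it and self-heal should process it.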
-Ravi
and now the original brick that I replaced is no
longer part of the volume (and a few terabytes of
data are just sitting on the old brick):
# gluster volume info homes | grep -E "Brick[0-9]:"
Brick1: wingu4:/mnt/gluster/homes
Brick2: wingu3:/mnt/gluster/homes
Brick3: wingu06:/data/glusterfs/sdb/homes
Brick4: wingu05:/data/glusterfs/sdb/homes
Brick5: wingu05:/data/glusterfs/sdc/homes
Brick6: wingu06:/data/glusterfs/sdc/homes
I see the Gluster docs have a more complicated
procedure for replacing bricks that involves
getfattr/setfattr¹. How can I tell Gluster about
the old brick? I see that I have a backup of the
old volfile thanks to yum's rpmsave function if
that helps.
We are using Gluster 5.6 on CentOS 7. Thank you
for any advice you can give.
¹
https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Volumes/#replace-faulty-brick
--
Alan Orth
[email protected]
https://picturingjordan.com
https://englishbulgaria.net
https://mjanja.ch
"In heaven all the interesting people are missing." ―Friedrich Nietzsche