GlusterFS Alert

Problem: GFID Mismatch

Severity: 7 (out of 10) - Loss of service but ultimately no loss of data

PREVENTION: To *prevent* the issue, please install GlusterFS 3.2.2. If you're 
using 3.1.x, upgrade to 3.1.5.
Download 3.2.2 here: http://download.gluster.com/pub/gluster/glusterfs/LATEST/
Download 3.1.5 here: http://download.gluster.com/pub/gluster/glusterfs/3.1/LATEST/

FIX:
To check for mismatched GFIDs, grep your client logs for either of these 
phrases:
“gfid different”  or “gfid differs”

If either phrase appears, simply upgrading will not fix the problem. You will 
need to use our tools here: https://github.com/vikasgorur/gfid
See details below for instructions.


DETAILS:
Over the last 3 weeks we have seen a growing number of GlusterFS 
implementations experiencing an issue where mismatched GFIDs are appearing 
within the filesystem.

Each file/directory on a Gluster volume has a unique 128-bit number associated 
with it called the GFID. This is true regardless of Gluster configuration 
(distribute or distribute/replicate). One inode, one GFID. The GFID is stored 
on the backend as the value of the extended attribute "trusted.gfid". Under 
normal circumstances, the value of this attribute is the same on all the 
backend bricks. However, certain conditions can cause the value on one or more 
of the bricks to differ from that on the other bricks. This causes the 
GlusterFS client to become confused and throw errors. The issue affects 
versions 3.1.4 and 3.2.1 and all earlier releases in those series, and it can 
occur whether the volume is accessed through the native GlusterFS client, NFS, 
or CIFS.
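
To look at the attribute described above on a single backend path, one option 
is the standard getfattr tool from the attr package. The helper below is only 
a sketch and is not part of the gfid tools: the name brick_gfid and the example 
path are hypothetical, and it has to run as root on the brick server because 
attributes in the trusted namespace are only visible to privileged users.

```shell
# Print the GFID stored on a backend path as a hex string (0x...).
# brick_gfid is a hypothetical helper name; run it as root on the brick server.
brick_gfid() {
    # -n selects one attribute, -e hex prints its value in hexadecimal,
    # --absolute-names stops getfattr from stripping the leading '/'.
    getfattr --absolute-names -n trusted.gfid -e hex -- "$1" 2>/dev/null |
        sed -n 's/^trusted\.gfid=//p'
}

# Example (hypothetical path); run it against the same file on each brick
# and compare the printed values:
#   brick_gfid /export/brick1/usr/bin/vim
```

If the values printed for the same file differ between bricks, that file has 
a mismatched GFID.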

PREVENTION:
To prevent this issue from occurring, please upgrade immediately to 3.1.5 or 
3.2.2. Upgrading will not correct the issue if it is already present in your 
cluster.

FIX:
***IMPORTANT***
To check for mismatched GFIDs, grep your client logs for either of these 
phrases:
“gfid different”  or “gfid differs”
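
A minimal sketch of that log check follows. It assumes the client logs live 
under /var/log/glusterfs, which is the usual default but may differ on your 
installation. The function name find_gfid_mismatch_logs is made up for 
illustration.

```shell
# List client log files that contain either mismatch message.
# The default directory is an assumption; pass your own as the first argument.
find_gfid_mismatch_logs() {
    # -r: recurse into the directory, -l: print matching file names only,
    # -E: extended regex so both spellings are matched in one pass.
    grep -rlE 'gfid (different|differs)' "${1:-/var/log/glusterfs}"
}

# Example:
#   find_gfid_mismatch_logs /var/log/glusterfs
```

No output (and a non-zero exit status from grep) means neither phrase was 
found in that directory.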

If either phrase appears, simply upgrading will not fix the problem. You will 
need to download the tools here: 
https://github.com/vikasgorur/gfid

Follow the instructions in the README:
https://github.com/vikasgorur/gfid/blob/master/README

Here's the quick-start version:


1. The first step is to construct the master list of all files:

# cd /export/brick1
# find . > brick1.txt
... (do for all bricks)

# cat brick1.txt brick2.txt... | sort -u > master_list.txt
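
If all of the bricks are reachable from one host, step 1 could be scripted 
roughly as below. This is a sketch under assumptions, not something shipped 
with the gfid tools: build_master_list is a hypothetical name, and the 
per-brick lists are kept in a temporary directory outside the bricks so that 
the list files themselves do not show up in the find output.

```shell
# Usage: build_master_list OUTPUT_FILE BRICK_DIR...
# Writes the sorted, de-duplicated union of relative paths from every brick.
build_master_list() {
    out="$1"; shift
    [ "$#" -ge 1 ] || return 1          # need at least one brick directory
    tmp=$(mktemp -d) || return 1
    n=0
    for brick in "$@"; do
        n=$((n + 1))
        # Run find from inside the brick so paths are relative ("./a/b").
        (cd "$brick" && find .) > "$tmp/brick$n.txt" || { rm -rf "$tmp"; return 1; }
    done
    cat "$tmp"/brick*.txt | sort -u > "$out"
    rm -rf "$tmp"
}

# Example (brick paths are placeholders):
#   build_master_list /root/master_list.txt /export/brick1 /export/brick2
```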

2. Then we need to get the GFIDs of all the inodes from these bricks:

# cd /export/brick1
# gfid-list /path/to/master_list.txt > brick1.gfid
... (do for all bricks)


3. Identify the mismatched inodes:

# gfid-mismatch brick1.gfid brick2.gfid brick3.gfid brick4.gfid

4. Stop the volume, then delete the mismatched GFIDs:

# gluster volume stop <affected volume>
# gfid-mismatch brick1.gfid brick2.gfid brick3.gfid brick4.gfid | cut -f1 -d: > mismatched.txt
# cd /export/brick1
# gfid-delete /path/to/mismatched.txt

Repeat for the other bricks.

5. Check logs

'gfid-delete' will produce a log with one entry for each file, which is either:

usr/bin/factor: removed OK
        OR
usr/bin/vim: No such file or directory
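
To pick out the entries that are not plain successes, a filter along these 
lines may help. It assumes you have captured the gfid-delete log in a file of 
your choosing (where the tool writes it is not covered here, so check the 
README). The function name gfid_delete_failures is hypothetical.

```shell
# Print every log entry that is NOT a successful removal.
# Exit status is 0 if such entries exist, 1 if every entry says "removed OK".
gfid_delete_failures() {
    grep -v ': removed OK$' "$1"
}

# Example (log path is a placeholder):
#   gfid_delete_failures /root/brick1-delete.log
```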

IMPORTANT NOTE: GFIDs must be deleted ONLY ON A STOPPED VOLUME.
Deleting GFIDs on a running volume with mounted clients will cause more
problems instead of solving them.
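
One way to guard against running gfid-delete too early is to check the volume 
state first. The sketch below assumes that `gluster volume info <volume>` 
prints a line reading "Status: Stopped" for a stopped volume; verify that 
against the output of your version before relying on it. The function name 
volume_is_stopped is hypothetical.

```shell
# Succeeds only if `gluster volume info VOLNAME` reports the volume as stopped.
# The "Status: Stopped" line format is an assumption about the CLI output.
volume_is_stopped() {
    gluster volume info "$1" 2>/dev/null | grep -q '^Status: Stopped'
}

# Example (volume name and list path are placeholders):
#   volume_is_stopped myvol && gfid-delete /path/to/mismatched.txt
```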

Please feel free to contact me directly with any questions.