I am looking for feedback on, and corrections to, my logic for preemptively 
scanning GlusterFS storage for inconsistent file conditions that may prevent 
access from GlusterFS clients.

This is targeted only at the 3.1.x software releases, though it may be 
applicable to earlier and later versions.

Steps to troubleshoot client access problems (to be reordered once a reasonable 
process has been nailed down):

Check client for:
- Is the gluster-client service running?
- Are the GlusterFS mount points present and accessible?
- Can other files in the same directory be accessed?
- Do the file permissions, as presented to the client, prohibit the client from 
  performing whatever access is being attempted?
- Does lsof show the file, or the directory path to the file, in use?

Check backend storage servers for:
- The file is present on one pair of mirrors, and if it also exists on the 
  other pair of mirrors, it is a GlusterFS symlink (perm 0000) there.
- File permissions are consistent across all bricks.
- File attributes are consistent for all occurrences of the file (unless the 
  file is a GlusterFS symlink).
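
To make the "consistent across all bricks" check concrete, here is a minimal 
Python sketch. It assumes every brick directory is reachable as a local path 
from wherever the script runs, which will not be true on most setups without 
extra work. The function names and the link_mode parameter are my own 
illustration, not anything shipped with GlusterFS. Copies whose permission bits 
equal link_mode (0000, per the description above) are ignored, since those are 
expected to differ.

```python
import os
import stat


def stat_summary(path):
    """Return (permission bits, uid, gid) for path, or None if it is absent."""
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        return None
    return (stat.S_IMODE(st.st_mode), st.st_uid, st.st_gid)


def inconsistent_copies(rel_path, brick_roots, link_mode=0o000):
    """Compare one file across bricks.

    Returns an empty dict when all real copies agree, otherwise a dict of
    brick root -> (mode, uid, gid) for every real copy so they can be logged.
    Copies whose mode equals link_mode are treated as link files and skipped.
    """
    real = {}
    for brick in brick_roots:
        summary = stat_summary(os.path.join(brick, rel_path))
        if summary is None or summary[0] == link_mode:
            continue
        real[brick] = summary
    if len(set(real.values())) <= 1:
        return {}
    return real
```

A non-empty result would go into the same error file as the other checks.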

Logic for storage server check:
1) Find all files with permissions of 0000 (GlusterFS symlinks).
2) Check the extended attributes of each file.
3) If the attribute is trusted.glusterfs.dht.linkto, look up the bricks and 
servers contained in the indicated replica set (e.g. pfs-ro1-replicate-11\000).
4) Check that the actual (normal) file exists on both of those bricks (e.g. 
pfs-ro1-client-22 and pfs-ro1-client-23).
5) If the file does NOT exist, log it to an error file.
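
The five steps above could be sketched in Python roughly as follows. This is 
only a starting point under several assumptions: the replica_bricks mapping 
(replica set name to a list of brick directories) is hypothetical and would 
have to be built by hand from the volume configuration; the brick directories 
are assumed to be reachable as local paths; and link files are identified by 
permission bits 0000 as described above, which is why link_mode is a parameter. 
Reading trusted.* attributes with os.getxattr needs Linux and root. The 
trailing NUL seen in the attribute value is stripped before the lookup.

```python
import os
import stat

LINKTO_XATTR = "trusted.glusterfs.dht.linkto"


def read_linkto(path):
    """Return the replica set named by the linkto xattr, or None if unset."""
    try:
        raw = os.getxattr(path, LINKTO_XATTR, follow_symlinks=False)
    except (OSError, AttributeError):
        # Attribute missing, not permitted, or no os.getxattr on this platform.
        return None
    return raw.rstrip(b"\x00").decode("ascii", "replace")


def is_normal_file(path, link_mode, get_linkto):
    """True if path is a regular file that is not itself a link file."""
    try:
        st = os.lstat(path)
    except OSError:
        return False
    if not stat.S_ISREG(st.st_mode):
        return False
    is_link = stat.S_IMODE(st.st_mode) == link_mode and get_linkto(path)
    return not is_link


def find_broken_links(brick_root, replica_bricks, link_mode=0o000,
                      get_linkto=read_linkto):
    """Return (relative path, replica set, brick lacking the file) tuples.

    The brick entry is None when the replica set is not in replica_bricks.
    """
    problems = []
    for dirpath, _dirs, files in os.walk(brick_root):
        for name in files:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # Step 1: regular files whose permission bits match link_mode.
            if not stat.S_ISREG(st.st_mode):
                continue
            if stat.S_IMODE(st.st_mode) != link_mode:
                continue
            # Steps 2 and 3: read the linkto attribute, find its bricks.
            target = get_linkto(path)
            if not target:
                continue
            rel = os.path.relpath(path, brick_root)
            bricks = replica_bricks.get(target)
            if bricks is None:
                problems.append((rel, target, None))
                continue
            # Steps 4 and 5: the real file must exist on every listed brick.
            for brick in bricks:
                candidate = os.path.join(brick, rel)
                if not is_normal_file(candidate, link_mode, get_linkto):
                    problems.append((rel, target, brick))
    return problems
```

The get_linkto argument is there so the walk can be tried on a scratch 
directory with a fake attribute reader before pointing it at a real brick.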

Possible auto correction steps:
1) If the error file exists, process it by removing the Gluster LINK files, then
2) Copy the error file to a Gluster client node.
3) On that client node, copy the missing files from the source to the native 
Gluster mount.
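
Since these steps delete and overwrite data, a cautious first cut would only 
produce a plan and leave the execution to a human. The sketch below assumes an 
error file format of one brick-relative path per line, which is my own 
invention, and the brick_root, source_root and mount_root arguments are 
hypothetical placeholders for the local brick directory, wherever the original 
data lives, and the native Gluster mount on the client. Nothing is removed or 
copied here.

```python
import os


def plan_repairs(error_lines, brick_root, source_root, mount_root):
    """Turn error-file lines into (removals, copies) without touching disk.

    removals: link-file paths to delete on the storage server (step 1).
    copies:   (source, destination) pairs to copy on the client (step 3).
    """
    removals = []
    copies = []
    for line in error_lines:
        rel = line.strip()
        if not rel or rel.startswith("#"):
            continue
        rel = os.path.normpath(rel)
        # Refuse anything that would escape the brick or mount directory.
        if os.path.isabs(rel) or rel == os.pardir:
            continue
        if rel.startswith(os.pardir + os.sep):
            continue
        removals.append(os.path.join(brick_root, rel))
        copies.append((os.path.join(source_root, rel),
                       os.path.join(mount_root, rel)))
    return removals, copies
```

Reviewing the two lists before acting on them keeps a bad entry in the error 
file from turning into a deleted file.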

This is the type of tool that would not only be helpful to administrators, but 
would increase confidence in the state of the GlusterFS storage system.

I believe that the development team is working to incorporate some of this type 
of functionality in newer releases of the GlusterFS software - but some of us 
are stuck running what we are currently running until we can make compelling 
arguments for the stability of newer releases.

James Burnash
Unix Engineer
Knight Capital Group






_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
