On 01/23/2012 10:47 PM, Dan Bretherton wrote:
On 01/18/2012 02:24 AM, Pranith Kumar K wrote:
On 01/17/2012 05:54 PM, Dan Bretherton wrote:
Dear All-
I have been having problems with rebalance ... self-heal again with
Glusterfs version 3.2.5, this time related to "no gfid found"
errors. A fix-layout operation has stalled because errors like the
following are being reported for a large number of files.
[2012-01-17 10:48:02.138837] W [fuse-resolve.c:273:fuse_resolve_deep_cbk] 0-fuse: /users/mvc/WORK/ORCA1/ORCA1-MV01-DIMGPROC/RUNTMP_Exp61/ORCA1-MV01_2D_y2007m01d05.dimgproc.020: no gfid found
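To see how widespread these warnings are, the affected paths can be pulled out of the client log and grouped by directory. The following is only a minimal sketch: it assumes each warning sits on a single log line in the format quoted above, and the names NO_GFID_RE, no_gfid_paths and count_by_directory are made up for illustration, not part of GlusterFS.

```python
import re
from collections import Counter
from posixpath import dirname

# Assumed format (taken from the warning quoted above, on one line):
# [timestamp] W [source-file:line:function] 0-fuse: /path: no gfid found
NO_GFID_RE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\]\s+W\s+"
    r"\[(?P<source>[^\]]+)\]\s+\S+:\s+"
    r"(?P<path>/.*):\s+no gfid found\s*$"
)


def no_gfid_paths(lines):
    """Yield the file path from every "no gfid found" warning in lines."""
    for line in lines:
        match = NO_GFID_RE.match(line)
        if match:
            yield match.group("path")


def count_by_directory(lines):
    """Count affected files per parent directory."""
    return Counter(dirname(path) for path in no_gfid_paths(lines))
```

Passing an open log file to count_by_directory() and printing most_common(10) on the result would list the directories with the most affected files, which may help decide where to look first.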
I thought GFID errors were being fixed in version 3.2.5. How can I
fix these errors to allow rebalance...fix-layout to run normally? I
am also very worried that the lack of GFID entries for files and
directories could stop file replication and other GlusterFS
operations from working properly. All comments and suggestions
would be much appreciated.
Regards
Dan.
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
Dan,
We would like to reproduce this problem in-house. Could you
give more details on how to get into this situation?
Pranith
Hello Pranith,
The errors were probably the result of a server that became
unresponsive for a few hours and had to be restarted. When the server
was not responding properly it was still showing as Connected in the
output of "gluster peer status", but the load was growing quite large
and it was impossible to log on. I restarted the server and triggered
a self-heal operation on all the volumes in case any files had not
been copied correctly to the unresponsive server. Later on I noticed
some layout related error messages mentioning "anomalies", so I
started a fix-layout operation to correct them. The fix-layout didn't
complete the first time because of "no gfid found" errors, as I
reported to the mailing list. A couple of days later I stopped
fix-layout and started it again on another server, and that time it
ran to completion. I then re-ran the self-heal operation and didn't
find any new layout errors. I don't know if the second fix-layout
attempt worked because it was performed on a different server, or if
the "no gfid found" errors had been corrected automatically by
GlusterFS in the days between the two fix-layout attempts. Either way
I am very relieved, and I apologise for the false alarm. GlusterFS
version 3.2.5 does appear to be able to correct GFID errors
automatically, although the process can apparently take a long time.
Regards
-Dan.
Dan,
Thanks for the information. Our QE team was able to reproduce the
no-gfid errors bug in the lab, and we are looking into the issue. If a
gfid is not present, GlusterFS will assign one automatically. The issue
is that files should not get into the no-gfid state at all.
Pranith.
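For anyone who wants to check whether a particular file already has a gfid, one option is to read the trusted.gfid extended attribute directly on a brick. This is only a rough sketch, assuming Linux, Python 3, root access on the storage server, and a path inside the brick directory (not the client mount). The read_gfid name and its getxattr parameter are hypothetical helpers for illustration.

```python
import errno
import os
import uuid

# GlusterFS keeps the 16-byte GFID in this extended attribute on the brick.
GFID_XATTR = "trusted.gfid"


def read_gfid(brick_path, getxattr=None):
    """Return the GFID of a file on a brick as a UUID, or None if unset.

    getxattr is a hypothetical hook (defaulting to os.getxattr) so the
    function can be exercised without a real brick.
    """
    if getxattr is None:
        getxattr = os.getxattr  # Linux only
    try:
        raw = getxattr(brick_path, GFID_XATTR)
    except OSError as exc:
        if exc.errno == errno.ENODATA:
            return None  # attribute not present (or not visible)
        raise
    return uuid.UUID(bytes=raw)
```

Note that attributes in the trusted namespace are hidden from unprivileged processes, so running this without root would report a missing gfid even when one is set.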