Re: [Gluster-users] Input/output error on gfid mismatch (with test case) [REPOST]

Pranith Kumar K Sat, 19 Nov 2011 20:54:40 -0800

Let me see if I can reproduce the bug in DEBUG mode; then I may be able to suggest something.


Pranith


On 11/20/2011 10:14 AM, Jason Stubbs wrote:

Hi Pranith,

For the test code, I've made it a tight loop so that the problem shows up
straight away, but I'm getting it on my live environment as well. In those
scripts, it's only a touch every 2 seconds and a read every 5 seconds. The
logs are exactly the same though...

Anyway, it's all at http://bugs.gluster.com/show_bug.cgi?id=3819 now.

I have another error that happens on the live servers with the same code,
but I haven't been able to reproduce it yet. The mv operation on the
updating server fails with:

mv: `testfile.tmp' and `testfile' are the same file

I can't seem to find any pertinent logs and the problem clears itself up
after 60 seconds. I tried turning stat-prefetch off to see if that'd fix it,
but no such luck. Can you suggest what I might try to get some useful
information?
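
GNU mv prints "are the same file" when both names resolve to the same file,
which it decides by comparing device and inode numbers. One way to gather
more information is to record what the client reports for both names right
before each rename. The following is only a sketch: inode_of and logged_mv
are hypothetical helper names, and it assumes GNU stat (for the -c option).

```shell
# Hypothetical helper: print "device:inode" for a path as seen by this client,
# or "missing" if stat fails (for example because the file does not exist).
inode_of() {
    stat -c '%d:%i' -- "$1" 2>/dev/null || echo missing
}

# Hypothetical wrapper: log the identity of source and destination to stderr,
# then perform the rename and return mv's exit status.
logged_mv() {
    src=$1
    dst=$2
    echo "$(date '+%F %T') src=$(inode_of "$src") dst=$(inode_of "$dst")" >&2
    mv -- "$src" "$dst"
}

# Example use in the update loop (not run here):
#   touch testfile.tmp; logged_mv testfile.tmp testfile
```

If the two values match at the moment mv fails, the client is reporting the
same inode for both names, and the timestamps can be compared with the
client log.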

--
Jason Stubbs


On 20/11/11 3:18 PM, Pranith Kumar K wrote:

hi Jason,
       Could you raise the bug on bugs.gluster.com?
When a file has a mismatching gfid, the client tries to find out which copy
is correct based on the extended attributes of the parent directory. To
inspect those attributes it needs to take an entry lock on that parent
directory. The lock is failing because of the touch in a loop, so the client
gives up and returns the error EIO.
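
The gfid of each copy can be read directly from the bricks as the
trusted.gfid extended attribute. A minimal sketch, assuming getfattr (from
the attr package) is installed, the command is run as root on each server,
and the brick path is /gluster/testvol as in the volume definition quoted
below; brick_gfid is a hypothetical helper name.

```shell
# Hypothetical helper: print the hex-encoded trusted.gfid of a file on a brick.
# Must be run as root on the server, against the brick path (not the mount).
brick_gfid() {
    getfattr -n trusted.gfid -e hex "$1" 2>/dev/null |
        sed -n 's/^trusted\.gfid=//p'
}

# Example use on each server (not run here):
#   brick_gfid /gluster/testvol/testfile
# To dump every extended attribute of the parent directory instead:
#   getfattr -d -m . -e hex /gluster/testvol
```

Different values from the two servers for the same path correspond to the
"gfid differs" warning in the client log.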

Thanks a lot for taking the time to provide the steps to re-create the problem.

Pranith.

On 11/20/2011 02:17 AM, Jason Stubbs wrote:

(Sorry if this comes through twice, but I sent the original almost 12 hours
ago and it hasn't appeared in the archives even though another mail sent
after mine has.)

Hi,

I've only been using glusterfs for a couple of weeks, but I've been having a
few issues with it. For one of the issues, I've managed to put together
steps to reproduce, so I guess this is a bug report. The log files on the
client that experiences the error:

[2011-11-19 18:05:23.619352] W [afr-common.c:1121:afr_conflicting_iattrs] 0-testvol-replicate-0: /testfile: gfid differs on subvolume 1 (3089007a-da1c-41ad-a111-d1a988de2420, 50eb7bf4-0516-4508-808c-909ac0f968f6)
[2011-11-19 18:05:23.619391] W [afr-common.c:1121:afr_conflicting_iattrs] 0-testvol-replicate-0: /testfile: gfid differs on subvolume 1 (3089007a-da1c-41ad-a111-d1a988de2420, 50eb7bf4-0516-4508-808c-909ac0f968f6)
[2011-11-19 18:05:23.619413] W [afr-common.c:882:afr_detect_self_heal_by_iatt] 0-testvol-replicate-0: /testfile: gfid different on subvolume
[2011-11-19 18:05:23.619452] I [afr-common.c:1038:afr_launch_self_heal] 0-testvol-replicate-0: background  missing-entry self-heal triggered. path: /testfile
[2011-11-19 18:05:23.624027] I [afr-self-heal-common.c:1858:afr_sh_post_nb_entrylk_conflicting_sh_cbk] 0-testvol-replicate-0: Non blocking entrylks failed.
[2011-11-19 18:05:23.624062] I [afr-self-heal-common.c:963:afr_sh_missing_entries_done] 0-testvol-replicate-0: split brain found, aborting selfheal of /testfile
[2011-11-19 18:05:23.624084] E [afr-self-heal-common.c:2074:afr_self_heal_completion_cbk] 0-testvol-replicate-0: background  missing-entry self-heal failed on /testfile
[2011-11-19 18:05:23.624108] W [afr-common.c:1121:afr_conflicting_iattrs] 0-testvol-replicate-0: /testfile: gfid differs on subvolume 1 (3089007a-da1c-41ad-a111-d1a988de2420, 50eb7bf4-0516-4508-808c-909ac0f968f6)
[2011-11-19 18:05:23.624133] W [fuse-bridge.c:184:fuse_entry_cbk] 0-glusterfs-fuse: 9142: LOOKUP() /testfile =>   -1 (Input/output error)
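
To check whether the conflicting gfid pair changes between failures, the
pairs can be pulled out of the "gfid differs" warnings. This is a rough
sketch that relies only on the message format shown above; gfid_pairs is a
hypothetical helper name, and the log file path in the usage comment is an
assumption that depends on the mount point.

```shell
# Hypothetical helper: read client log text on stdin and print each distinct
# "(gfid, gfid)" pair once. A gfid is matched as 36 hex-or-hyphen characters.
gfid_pairs() {
    grep -o '([0-9a-f-]\{36\}, [0-9a-f-]\{36\})' | sort -u
}

# Example use (log path is an assumption, not run here):
#   gfid_pairs < /var/log/glusterfs/mnt-testvol.log
```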

And to reproduce, using two glusterfs (v3.2.5) servers with the following 
volume definition:

Volume Name: testvol
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: 10.104.123.145:/gluster/testvol
Brick2: 10.82.37.136:/gluster/testvol

Run this on one client:

# while true; do touch testfile.tmp; mv testfile.tmp testfile; done

And this script on another client:

# while true; do x=$(<testfile); done
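
For correlating failures with the client log, a bounded variant of the
reader that timestamps each failed read may be easier to work with than the
tight loop. This is a sketch only: read_loop is a hypothetical helper, and
it uses cat rather than the $(<testfile) form above.

```shell
# Hypothetical helper: try to read FILE COUNT times, sleeping DELAY seconds
# between attempts. Each failed read is logged to stderr with a timestamp,
# and the total number of failures is printed to stdout at the end.
read_loop() {
    file=$1
    count=$2
    delay=$3
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        if ! cat -- "$file" >/dev/null 2>&1; then
            failures=$((failures + 1))
            echo "$(date '+%F %T') read of $file failed" >&2
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "$failures"
}

# Example use (not run here): 1000 reads, one every 5 seconds
#   read_loop testfile 1000 5
```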

I couldn't get the error to occur either when both scripts were run on a
single client, or when using the glusterfs servers instead of separate
clients. Also, it didn't matter whether both clients were mounted from the
same glusterfs server or one from each of the servers.

My assumption is that the second client's read is being interleaved with the
first client's move operation, giving a differing gfid. If any further
information is needed, please don't hesitate to let me know.

--
Jason Stubbs
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
