Hi Pranith,

For the test code, I've made it a tight loop so that the problem shows up straight away, but I'm getting it in my live environment as well. In those scripts, it's only a touch every 2 seconds and a read every 5 seconds. The logs are exactly the same, though...
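The live scripts aren't shown here, so the following is only a hypothetical sketch of what the reproduction loops from the original report look like when throttled to the live-environment rates (touch and mv every 2 seconds, read every 5 seconds). The `writer`/`reader` function names and the iteration-count argument are illustrative assumptions added so each loop terminates. `$(cat testfile)` stands in for the bash-only `$(<testfile)`.

```shell
#!/bin/sh
# Hypothetical throttled versions of the two reproduction loops.
# Usage: writer ITERATIONS [INTERVAL_SECONDS]   (default interval: 2)
#        reader ITERATIONS [INTERVAL_SECONDS]   (default interval: 5)
# The iteration count is an assumption so each loop ends; the live scripts may loop forever.

# Updating side: recreate the temp file, then rename it over testfile.
writer() {
    n=$1; interval=${2:-2}
    i=0
    while [ "$i" -lt "$n" ]; do
        touch testfile.tmp
        mv testfile.tmp testfile
        i=$((i + 1))
        sleep "$interval"
    done
}

# Reading side: read the whole file into a variable (like x=$(<testfile) in bash).
reader() {
    n=$1; interval=${2:-5}
    i=0
    while [ "$i" -lt "$n" ]; do
        x=$(cat testfile)
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Run `writer` on the updating client and `reader` on the other, each from the same directory on the mount.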
Anyway, it's all at http://bugs.gluster.com/show_bug.cgi?id=3819 now.

I have another error that happens on the live servers with the same code, but I haven't been able to reproduce it yet. The mv operation on the updating server fails with:

mv: `testfile.tmp' and `testfile' are the same file

I can't seem to find any pertinent logs, and the problem clears itself up after 60 seconds. I tried turning stat-prefetch off to see if that would fix it, but no such luck. Can you suggest what I might try to get some useful information?

--
Jason Stubbs

On 20/11/11 3:18 PM, Pranith Kumar K wrote:
> hi Jason,
> Could you raise the bug on bugs.gluster.com?
> When a file has a mismatching gfid, the client tries to find out which file
> is correct based on the extended attributes of the parent directory. To
> inspect those attributes, it needs to take an entry lock on that parent
> directory. The lock is failing because of the touch in a loop, so it gives
> up and returns the error EIO.
>
> Thanks a lot for taking the time to provide the steps to re-create the
> problem.
>
> Pranith.
>
> On 11/20/2011 02:17 AM, Jason Stubbs wrote:
>> (Sorry if this comes through twice, but I sent the original almost 12 hours
>> ago and it hasn't appeared in the archives even though another mail sent
>> after mine has.)
>>
>> Hi,
>>
>> I've only been using glusterfs for a couple of weeks, but I've been having a
>> few issues with it. For one of the issues, I've managed to put together
>> steps to reproduce, so I guess this is a bug report.
>>
>> The log files on the client that experiences the error:
>>
>> [2011-11-19 18:05:23.619352] W [afr-common.c:1121:afr_conflicting_iattrs] 0-testvol-replicate-0: /testfile: gfid differs on subvolume 1 (3089007a-da1c-41ad-a111-d1a988de2420, 50eb7bf4-0516-4508-808c-909ac0f968f6)
>> [2011-11-19 18:05:23.619391] W [afr-common.c:1121:afr_conflicting_iattrs] 0-testvol-replicate-0: /testfile: gfid differs on subvolume 1 (3089007a-da1c-41ad-a111-d1a988de2420, 50eb7bf4-0516-4508-808c-909ac0f968f6)
>> [2011-11-19 18:05:23.619413] W [afr-common.c:882:afr_detect_self_heal_by_iatt] 0-testvol-replicate-0: /testfile: gfid different on subvolume
>> [2011-11-19 18:05:23.619452] I [afr-common.c:1038:afr_launch_self_heal] 0-testvol-replicate-0: background missing-entry self-heal triggered. path: /testfile
>> [2011-11-19 18:05:23.624027] I [afr-self-heal-common.c:1858:afr_sh_post_nb_entrylk_conflicting_sh_cbk] 0-testvol-replicate-0: Non blocking entrylks failed.
>> [2011-11-19 18:05:23.624062] I [afr-self-heal-common.c:963:afr_sh_missing_entries_done] 0-testvol-replicate-0: split brain found, aborting selfheal of /testfile
>> [2011-11-19 18:05:23.624084] E [afr-self-heal-common.c:2074:afr_self_heal_completion_cbk] 0-testvol-replicate-0: background missing-entry self-heal failed on /testfile
>> [2011-11-19 18:05:23.624108] W [afr-common.c:1121:afr_conflicting_iattrs] 0-testvol-replicate-0: /testfile: gfid differs on subvolume 1 (3089007a-da1c-41ad-a111-d1a988de2420, 50eb7bf4-0516-4508-808c-909ac0f968f6)
>> [2011-11-19 18:05:23.624133] W [fuse-bridge.c:184:fuse_entry_cbk] 0-glusterfs-fuse: 9142: LOOKUP() /testfile => -1 (Input/output error)
>>
>> And to reproduce, using two glusterfs (v3.2.5) servers with the following
>> volume definition:
>>
>> Volume Name: testvol
>> Type: Replicate
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.104.123.145:/gluster/testvol
>> Brick2: 10.82.37.136:/gluster/testvol
>>
>> Run this on one client:
>>
>> # while true; do touch testfile.tmp; mv testfile.tmp testfile; done
>>
>> And this script on another client:
>>
>> # while true; do x=$(<testfile); done
>>
>> I couldn't get the error to occur either when both scripts were run on a
>> single client, or when running them on the glusterfs servers instead of on
>> separate clients. Also, it didn't matter whether both clients were mounted
>> from the same glusterfs server or one from each of the servers.
>>
>> My assumption is that the second client's read is being interleaved with the
>> first client's move operation, giving a differing gfid. If any further
>> information is needed, please don't hesitate to let me know.
>>
>> --
>> Jason Stubbs
>> _______________________________________________
>> Gluster-users mailing list
>> [email protected]
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
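The hypothesis in the original report (the second client's read interleaving with the first client's move) depends on the writer loop replacing the file behind the name `testfile` on every iteration: `touch` creates a brand-new file and `mv` renames it over the old one. In GlusterFS each newly created file gets its own gfid, so the gfid behind `/testfile` changes every iteration, which is consistent with the "gfid differs on subvolume 1" warnings in the log. The sketch below is a local illustration only, assuming Linux with GNU coreutils `stat` and an ordinary temporary directory rather than a GlusterFS mount; it uses the inode number as a stand-in for the gfid.

```shell
#!/bin/sh
# Local illustration only: runs in a throwaway directory, not on a GlusterFS mount.
# Assumes GNU coreutils `stat` (-c %i prints the inode number).
set -e

dir=$(mktemp -d)
cd "$dir"

touch testfile
before=$(stat -c %i testfile)     # inode currently behind the name "testfile"

# One iteration of the writer loop from the reproduction steps
touch testfile.tmp
tmp=$(stat -c %i testfile.tmp)    # the new file gets its own inode
mv testfile.tmp testfile          # same-filesystem mv uses rename(2), swapping the name atomically

after=$(stat -c %i testfile)
echo "before=$before tmp=$tmp after=$after"

if [ "$after" = "$tmp" ] && [ "$after" != "$before" ]; then
    echo "testfile now names a different file than before the mv"
fi

# Clean up the throwaway directory
cd /
rm -rf "$dir"
```

The inode numbers differ from run to run; what matters is that `after` matches `tmp` and differs from `before`.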
