Hi Pranith,

For the test code, I've made it a tight loop so that the problem shows up
straight away, but I'm getting it on my live environment as well. In those
scripts, it's only a touch every 2 seconds and a read every 5 seconds. The
logs are exactly the same though...
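
To make the two setups easy to compare, here is a minimal sketch of the same touch/mv and read loops with a configurable interval. The function names (update_once, read_once, run_loop) are made up for illustration, and the read uses cat rather than the bash-only $(<testfile) form. It is meant to be run from inside the glusterfs mount point.

```shell
#!/bin/sh
# Sketch only: update_once, read_once and run_loop are illustrative names.
# Run from inside the glusterfs mount point on each client.

# Updating client: create a temp file, then rename it over the target.
update_once() {
    touch testfile.tmp && mv testfile.tmp testfile
}

# Reading client: read the file; returns non-zero if the read fails.
read_once() {
    cat testfile > /dev/null
}

# run_loop CMD INTERVAL [COUNT]
# Run CMD every INTERVAL seconds, COUNT times (forever if COUNT is omitted),
# printing a timestamped line to stderr whenever CMD fails.
run_loop() {
    cmd=$1; interval=$2; count=${3:-0}; i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        "$cmd" || echo "$(date '+%F %T') $cmd failed" >&2
        i=$((i + 1))
        sleep "$interval"
    done
}
```

With these loaded, "run_loop update_once 2" on one client and "run_loop read_once 5" on the other would approximate the live timing, while an interval of 0 gives the tight loop from the test case.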

Anyway, it's all at http://bugs.gluster.com/show_bug.cgi?id=3819 now.

I have another error that happens on the live servers with the same code, but
I haven't been able to reproduce it yet. The mv operation on the updating
server fails with:

mv: `testfile.tmp' and `testfile' are the same file

I can't seem to find any pertinent logs and the problem clears itself up
after 60 seconds. I tried turning stat-prefetch off to see if that'd fix it,
but no such luck. Can you suggest what I might try to get some useful
information?
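
One thing that could be captured at the moment of failure is what the client reports for both names. As far as I understand it, mv prints "are the same file" when source and destination appear to have the same device and inode number, so logging those might show whether the client briefly sees one identity for both. A minimal sketch, assuming GNU coreutils stat; file_id and checked_mv are made-up helper names:

```shell
#!/bin/sh
# Sketch only: file_id and checked_mv are illustrative names.
# Assumes GNU coreutils stat (for the -c format option).

# Print "device:inode" for a path, or "missing" if it cannot be stat'ed.
file_id() {
    stat -c '%d:%i' "$1" 2>/dev/null || echo missing
}

# Rename SRC over DST. The identities are sampled just before the rename,
# and logged to stderr with a timestamp only if mv fails.
checked_mv() {
    src=$1; dst=$2
    src_id=$(file_id "$src")
    dst_id=$(file_id "$dst")
    if ! mv "$src" "$dst"; then
        echo "$(date '+%F %T') mv failed: $src=$src_id $dst=$dst_id" >&2
        return 1
    fi
}
```

In the update loop, "checked_mv testfile.tmp testfile" would replace the plain mv. Matching device:inode pairs in the logged line would point at the client's view of the two names rather than at mv itself.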

--
Jason Stubbs


On 20/11/11 3:18 PM, Pranith Kumar K wrote:
> hi Jason,
>       Could you raise the bug on bugs.gluster.com?
> When a file has a mismatching gfid, the client tries to find out which file
> is correct based on the extended attributes of the parent directory. To
> inspect those attributes, it needs to take an entry lock on that parent
> directory. The lock is failing because of the touch in a loop, so it gives
> up and returns the error EIO.
> 
> Thanks a lot for taking the time to provide the steps to re-create the 
> problem.
> 
> Pranith.
> 
> On 11/20/2011 02:17 AM, Jason Stubbs wrote:
>> (Sorry if this comes through twice, but I sent the original almost 12
>> hours ago and it hasn't appeared in the archives even though another mail
>> sent after mine has)
>>
>> Hi,
>>
>> I've only been using glusterfs for a couple of weeks, but I've been having
>> a few issues with it. For one of the issues, I've managed to put together
>> steps to reproduce, so I guess this is a bug report. The log entries on
>> the client that experiences the error:
>>
>> [2011-11-19 18:05:23.619352] W [afr-common.c:1121:afr_conflicting_iattrs] 0-testvol-replicate-0: /testfile: gfid differs on subvolume 1 (3089007a-da1c-41ad-a111-d1a988de2420, 50eb7bf4-0516-4508-808c-909ac0f968f6)
>> [2011-11-19 18:05:23.619391] W [afr-common.c:1121:afr_conflicting_iattrs] 0-testvol-replicate-0: /testfile: gfid differs on subvolume 1 (3089007a-da1c-41ad-a111-d1a988de2420, 50eb7bf4-0516-4508-808c-909ac0f968f6)
>> [2011-11-19 18:05:23.619413] W [afr-common.c:882:afr_detect_self_heal_by_iatt] 0-testvol-replicate-0: /testfile: gfid different on subvolume
>> [2011-11-19 18:05:23.619452] I [afr-common.c:1038:afr_launch_self_heal] 0-testvol-replicate-0: background  missing-entry self-heal triggered. path: /testfile
>> [2011-11-19 18:05:23.624027] I [afr-self-heal-common.c:1858:afr_sh_post_nb_entrylk_conflicting_sh_cbk] 0-testvol-replicate-0: Non blocking entrylks failed.
>> [2011-11-19 18:05:23.624062] I [afr-self-heal-common.c:963:afr_sh_missing_entries_done] 0-testvol-replicate-0: split brain found, aborting selfheal of /testfile
>> [2011-11-19 18:05:23.624084] E [afr-self-heal-common.c:2074:afr_self_heal_completion_cbk] 0-testvol-replicate-0: background  missing-entry self-heal failed on /testfile
>> [2011-11-19 18:05:23.624108] W [afr-common.c:1121:afr_conflicting_iattrs] 0-testvol-replicate-0: /testfile: gfid differs on subvolume 1 (3089007a-da1c-41ad-a111-d1a988de2420, 50eb7bf4-0516-4508-808c-909ac0f968f6)
>> [2011-11-19 18:05:23.624133] W [fuse-bridge.c:184:fuse_entry_cbk] 0-glusterfs-fuse: 9142: LOOKUP() /testfile =>  -1 (Input/output error)
>>
>> And to reproduce, using two glusterfs (v3.2.5) servers with the following 
>> volume definition:
>>
>> Volume Name: testvol
>> Type: Replicate
>> Status: Started
>> Number of Bricks: 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: 10.104.123.145:/gluster/testvol
>> Brick2: 10.82.37.136:/gluster/testvol
>>
>> Run this on one client:
>>
>> # while true; do touch testfile.tmp; mv testfile.tmp testfile; done
>>
>> And this script on another client:
>>
>> # while true; do x=$(<testfile); done
>>
>> I couldn't get the error to occur either when both scripts were run on a
>> single client, or when using the glusterfs servers instead of separate
>> clients. Also, it didn't matter whether both clients were mounted from the
>> same glusterfs server or one from each of the servers.
>>
>> My assumption is that the second client's read is being interleaved with
>> the first client's move operation, giving a differing gfid. If any further
>> information is needed, please don't hesitate to let me know.
>>
>> -- 
>> Jason Stubbs
>> _______________________________________________
>> Gluster-users mailing list
>> [email protected]
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
> 

