Hi Jon,

Thanks for reporting this. I have logged a bug for it; you can follow it here:

http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1188

-
Anush

On Wed, Jul 21, 2010 at 2:14 PM, Jon Swanson
<[email protected]> wrote:
> Seeing a glusterfs client die oddly.
>
> --Setup--
> Client:
> Fedora 12 2.6.32.16-141.fc12.x86_64
> # rpm -qa |egrep 'fuse|glust'
> fuse-2.8.4-1.fc12.x86_64
> glusterfs-client-3.0.5-1.fc11.x86_64
> fuse-libs-2.8.4-1.fc12.x86_64
> glusterfs-common-3.0.5-1.fc11.x86_64
>
>
> Servers - 6 nodes with a 3 x distribute:
> Fedora 12 2.6.32.9-70.fc12.x86_64
> [[email protected] ~]# rpm -qa | grep glust
> glusterfs-common-3.0.5-1.fc11.x86_64
> glusterfs-server-3.0.5-1.fc11.x86_64
>
>
> Process:
> 1. Client copies a large number of files to the gluster mount
> 2. Client tries to do a recursive list of all files copied (ls -R)
> 3. Recursive list comes across a file where the checksum does not match for 
> some reason (see the log snippet below)
> 4. Client dies horribly, and the mount point becomes invalid with the 
> following error:
> gluster-mount/file: Transport endpoint is not connected
>
> I've tried to keep the snippets below as brief as possible.  If you think the 
> volume definition files would help, let me know and I'll be happy to post 
> those here as well.
>
> Any help or suggestions are most welcome.
>
> Thanks!
>
> ---
>
> This is the corresponding snippet from 'tail -f gluster-mount.log':
>
>> [2010-07-21 16:34:48] N [client-protocol.c:6288:client_setvolume_cbk] 
>> pdbindex2-1: Connected to 192.168.201.88:6996, attached to remote volume 
>> 'brick'.
>
>> [2010-07-21 16:35:33] E [afr.c:107:afr_set_split_brain] mirror-0: invalid 
>> argument: inode
>> [2010-07-21 16:35:33] E [afr-self-heal-algorithm.c:768:sh_diff_checksum_cbk] 
>> mirror-0: checksum on /index.201007211105.deploy/file failed on subvolume 
>> indexcopy-0 (File descriptor in bad state)
>> [2010-07-21 16:35:33] E [afr-self-heal-algorithm.c:768:sh_diff_checksum_cbk] 
>> mirror-0: checksum on /index.201007211105.deploy/file failed on subvolume 
>> indexcopy-1 (File descriptor in bad state)
>> pending frames:
>> frame : type(1) op(LOOKUP)
>> frame : type(1) op(LOOKUP)
>> frame : type(1) op(LOOKUP)
>>
>> patchset: v3.0.5
>> signal received: 11
>> time of crash: 2010-07-21 16:35:33
>> configuration details:
>> argp 1
>> backtrace 1
>> dlfcn 1
>> fdatasync 1
>> libpthread 1
>> llistxattr 1
>> setfsid 1
>> spinlock 1
>> epoll.h 1
>> xattr.h 1
>> st_atim.tv_nsec 1
>> package-string: glusterfs 3.0.5
>> /lib64/libc.so.6(+0x32740)[0x7fa9c949b740]
>> /usr/lib64/glusterfs/3.0.5/xlator/cluster/replicate.so(+0x4b2ea)[0x7fa9c85ff2ea]
>> /usr/lib64/glusterfs/3.0.5/xlator/cluster/replicate.so(+0x4b557)[0x7fa9c85ff557]
>> /usr/lib64/glusterfs/3.0.5/xlator/cluster/replicate.so(+0x4be10)[0x7fa9c85ffe10]
>> /usr/lib64/glusterfs/3.0.5/xlator/cluster/replicate.so(afr_sh_algo_diff+0x196)[0x7fa9c85fffc2]
>> /usr/lib64/glusterfs/3.0.5/xlator/cluster/replicate.so(afr_sh_data_sync_prepare+0x256)[0x7fa9c85e9a91]
>> /usr/lib64/glusterfs/3.0.5/xlator/cluster/replicate.so(afr_sh_data_fix+0x5db)[0x7fa9c85ea078]
>> /usr/lib64/glusterfs/3.0.5/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x167)[0x7fa9c85ea34e]
>> /usr/lib64/glusterfs/3.0.5/xlator/cluster/distribute.so(dht_attr_cbk+0x238)[0x7fa9c8820e08]
>> /usr/lib64/glusterfs/3.0.5/xlator/protocol/client.so(client_fstat_cbk+0x178)[0x7fa9c8a59868]
>> /usr/lib64/glusterfs/3.0.5/xlator/protocol/client.so(protocol_client_interpret+0x1df)[0x7fa9c8a60274]
>> /usr/lib64/glusterfs/3.0.5/xlator/protocol/client.so(protocol_client_pollin+0xc6)[0x7fa9c8a60ff5]
>> /usr/lib64/glusterfs/3.0.5/xlator/protocol/client.so(notify+0x158)[0x7fa9c8a6154d]
>> /usr/lib64/libglusterfs.so.0(xlator_notify+0xd8)[0x7fa9c9c1b639]
>> /usr/lib64/glusterfs/3.0.5/transport/socket.so(socket_event_poll_in+0x46)[0x7fa9c6f59249]
>> /usr/lib64/glusterfs/3.0.5/transport/socket.so(socket_event_handler+0xc4)[0x7fa9c6f5957c]
>> /usr/lib64/libglusterfs.so.0(+0x3eefc)[0x7fa9c9c40efc]
>> /usr/lib64/libglusterfs.so.0(+0x3f0ee)[0x7fa9c9c410ee]
>> /usr/lib64/libglusterfs.so.0(event_dispatch+0x74)[0x7fa9c9c4140d]
>> /usr/sbin/glusterfs(main+0xf53)[0x406187]
>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fa9c9487b1d]
>> /usr/sbin/glusterfs[0x402679]
>> ---------
>
> If we look at the respective files, their checksums are fine:
>> [16:40] ~> for i in `seq 10 15`; do echo -n "search$i: "; ssh search$i 
>> md5sum /data/export/index.201007211105.deploy/file; done
>> search10: md5sum: /data/export/index.201007211105.deploy/file: No such file 
>> or directory
>> search11: 8605b1467bece54ed7ccd13e086ee299  
>> /data/export/index.201007211105.deploy/file
>> search12: md5sum: /data/export/index.201007211105.deploy/file: No such file 
>> or directory
>> search13: md5sum: /data/export/index.201007211105.deploy/file: No such file 
>> or directory
>> search14: 8605b1467bece54ed7ccd13e086ee299  
>> /data/export/index.201007211105.deploy/file
>> search15: md5sum: /data/export/index.201007211105.deploy/file: No such file 
>> or directory
>
> If we look at extended attributes however, we notice that 'trusted.posix.gen' 
> is different:
>> for i in `seq 10 15`; do echo -n "search$i: "; ssh pdbsearch$i getfattr -d 
>> -m - /data/export/index.201007211105.deploy/file; done
>> search10: getfattr: /data/export/index.201007211105.deploy/file: No such 
>> file or directory
>> search11: getfattr: Removing leading '/' from absolute path names
>> # file: data/export/index.201007211105.deploy/file
>> security.selinux="unconfined_u:object_r:default_t:s0
>> trusted.afr.indexcopy-0=0sAAAAAQAAAAAAAAAA
>> trusted.afr.indexcopy-1=0sAAAAAQAAAAAAAAAA
>> trusted.posix.gen=0sTEFukQAAAEY=
>>
>> search12: getfattr: /data/export/index.201007211105.deploy/file: No such 
>> file or directory
>> search13: getfattr: /data/export/index.201007211105.deploy/file: No such 
>> file or directory
>> search14: getfattr: Removing leading '/' from absolute path names
>> # file: data/export/index.201007211105.deploy/file
>> security.selinux="unconfined_u:object_r:default_t:s0
>> trusted.afr.indexcopy-0=0sAAAAAQAAAAAAAAAA
>> trusted.afr.indexcopy-1=0sAAAAAQAAAAAAAAAA
>> trusted.posix.gen=0sTEaPaAAAAAI=
>>
>> search15: getfattr: /data/export/index.201007211105.deploy/file: No such 
>> file or directory
>
>
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>
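The extended attribute values in the getfattr output above are printed with a `0s` prefix, which getfattr uses for base64-encoded binary values. Comparing them is easier once they are decoded. The following is a minimal sketch in Python, not part of the original report: the helper names `decode_xattr` and `as_be_u32` are made up for illustration, and reading the bytes as big-endian 32-bit integers is an assumption about the on-disk layout rather than something the thread confirms.

```python
import base64
import struct


def decode_xattr(value: str) -> bytes:
    """Decode a value as printed by getfattr ('0s' = base64, '0x' = hex)."""
    if value.startswith("0s"):
        return base64.b64decode(value[2:])
    if value.startswith("0x"):
        return bytes.fromhex(value[2:])
    # Anything else is treated as a plain text value.
    return value.encode()


def as_be_u32(raw: bytes) -> tuple:
    """Read the bytes as consecutive big-endian unsigned 32-bit integers.

    ASSUMPTION: this layout is a guess made for illustration only.
    """
    count = len(raw) // 4
    return struct.unpack(">%dI" % count, raw[: count * 4])


# Values copied from the getfattr output quoted in this thread.
values = {
    "search11 trusted.afr.indexcopy-0": "0sAAAAAQAAAAAAAAAA",
    "search11 trusted.posix.gen": "0sTEFukQAAAEY=",
    "search14 trusted.posix.gen": "0sTEaPaAAAAAI=",
}

for name, encoded in values.items():
    raw = decode_xattr(encoded)
    print(name, raw.hex(), as_be_u32(raw))
```

Under that assumption, the `trusted.afr.*` values on both bricks decode to the same three integers, while the two `trusted.posix.gen` values decode to different byte strings. This matches the observation in the report that only `trusted.posix.gen` differs between search11 and search14.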