Hi Jon,

Thanks for reporting. I have logged a bug for this; you can follow it here:
http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=1188

- Anush

On Wed, Jul 21, 2010 at 2:14 PM, Jon Swanson <[email protected]> wrote:
> Seeing a glusterfs client die oddly.
>
> --Setup--
> Client:
> Fedora 12 2.6.32.16-141.fc12.x86_64
> # rpm -qa |egrep 'fuse|glust'
> fuse-2.8.4-1.fc12.x86_64
> glusterfs-client-3.0.5-1.fc11.x86_64
> fuse-libs-2.8.4-1.fc12.x86_64
> glusterfs-common-3.0.5-1.fc11.x86_64
>
>
> Servers - 6 nodes with a 3 x distribute:
> Fedora 12 2.6.32.9-70.fc12.x86_64
> [[email protected] ~]# rpm -qa | grep glust
> glusterfs-common-3.0.5-1.fc11.x86_64
> glusterfs-server-3.0.5-1.fc11.x86_64
>
>
> Process:
> 1. Client copies a large number of files to the gluster mount
> 2. Client tries to do a recursive list of all files copied (ls -R)
> 3. Recursive list comes across a file where the checksum does not match for
> some reason (see the following log snippet)
> 4. Client dies horribly and the mount point becomes invalid with the
> following error:
> gluster-mount/file: Transport endpoint is not connected
>
> I've tried to keep the snippets below as brief as possible. If you think the
> volume definition files would help, let me know and I'll be happy to post
> those here as well.
>
> Any help or suggestions are most welcome.
>
> Thanks!
>
> ---
>
> This is the corresponding snippet from 'tail -f gluster-mount.log':
>
>> [2010-07-21 16:34:48] N [client-protocol.c:6288:client_setvolume_cbk]
>> pdbindex2-1: Connected to 192.168.201.88:6996, attached to remote volume
>> 'brick'.
>
>> [2010-07-21 16:35:33] E [afr.c:107:afr_set_split_brain] mirror-0: invalid
>> argument: inode
>> [2010-07-21 16:35:33] E [afr-self-heal-algorithm.c:768:sh_diff_checksum_cbk]
>> mirror-0: checksum on /index.201007211105.deploy/file failed on subvolume
>> indexcopy-0 (File descriptor in bad state)
>> [2010-07-21 16:35:33] E [afr-self-heal-algorithm.c:768:sh_diff_checksum_cbk]
>> mirror-0: checksum on /index.201007211105.deploy/file failed on subvolume
>> indexcopy-1 (File descriptor in bad state)
>> pending frames:
>> frame : type(1) op(LOOKUP)
>> frame : type(1) op(LOOKUP)
>> frame : type(1) op(LOOKUP)
>>
>> patchset: v3.0.5
>> signal received: 11
>> time of crash: 2010-07-21 16:35:33
>> configuration details:
>> argp 1
>> backtrace 1
>> dlfcn 1
>> fdatasync 1
>> libpthread 1
>> llistxattr 1
>> setfsid 1
>> spinlock 1
>> epoll.h 1
>> xattr.h 1
>> st_atim.tv_nsec 1
>> package-string: glusterfs 3.0.5
>> /lib64/libc.so.6(+0x32740)[0x7fa9c949b740]
>> /usr/lib64/glusterfs/3.0.5/xlator/cluster/replicate.so(+0x4b2ea)[0x7fa9c85ff2ea]
>> /usr/lib64/glusterfs/3.0.5/xlator/cluster/replicate.so(+0x4b557)[0x7fa9c85ff557]
>> /usr/lib64/glusterfs/3.0.5/xlator/cluster/replicate.so(+0x4be10)[0x7fa9c85ffe10]
>> /usr/lib64/glusterfs/3.0.5/xlator/cluster/replicate.so(afr_sh_algo_diff+0x196)[0x7fa9c85fffc2]
>> /usr/lib64/glusterfs/3.0.5/xlator/cluster/replicate.so(afr_sh_data_sync_prepare+0x256)[0x7fa9c85e9a91]
>> /usr/lib64/glusterfs/3.0.5/xlator/cluster/replicate.so(afr_sh_data_fix+0x5db)[0x7fa9c85ea078]
>> /usr/lib64/glusterfs/3.0.5/xlator/cluster/replicate.so(afr_sh_data_fstat_cbk+0x167)[0x7fa9c85ea34e]
>> /usr/lib64/glusterfs/3.0.5/xlator/cluster/distribute.so(dht_attr_cbk+0x238)[0x7fa9c8820e08]
>> /usr/lib64/glusterfs/3.0.5/xlator/protocol/client.so(client_fstat_cbk+0x178)[0x7fa9c8a59868]
>> /usr/lib64/glusterfs/3.0.5/xlator/protocol/client.so(protocol_client_interpret+0x1df)[0x7fa9c8a60274]
>> /usr/lib64/glusterfs/3.0.5/xlator/protocol/client.so(protocol_client_pollin+0xc6)[0x7fa9c8a60ff5]
>> /usr/lib64/glusterfs/3.0.5/xlator/protocol/client.so(notify+0x158)[0x7fa9c8a6154d]
>> /usr/lib64/libglusterfs.so.0(xlator_notify+0xd8)[0x7fa9c9c1b639]
>> /usr/lib64/glusterfs/3.0.5/transport/socket.so(socket_event_poll_in+0x46)[0x7fa9c6f59249]
>> /usr/lib64/glusterfs/3.0.5/transport/socket.so(socket_event_handler+0xc4)[0x7fa9c6f5957c]
>> /usr/lib64/libglusterfs.so.0(+0x3eefc)[0x7fa9c9c40efc]
>> /usr/lib64/libglusterfs.so.0(+0x3f0ee)[0x7fa9c9c410ee]
>> /usr/lib64/libglusterfs.so.0(event_dispatch+0x74)[0x7fa9c9c4140d]
>> /usr/sbin/glusterfs(main+0xf53)[0x406187]
>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fa9c9487b1d]
>> /usr/sbin/glusterfs[0x402679]
>> ---------
>
> If we look at the respective files, their checksums match:
>> [16:40] ~> for i in `seq 10 15`; do echo -n "search$i: "; ssh search$i
>> md5sum /data/export/index.201007211105.deploy/file; done
>> search10: md5sum: /data/export/index.201007211105.deploy/file: No such file
>> or directory
>> search11: 8605b1467bece54ed7ccd13e086ee299
>> /data/export/index.201007211105.deploy/file
>> search12: md5sum: /data/export/index.201007211105.deploy/file: No such file
>> or directory
>> search13: md5sum: /data/export/index.201007211105.deploy/file: No such file
>> or directory
>> search14: 8605b1467bece54ed7ccd13e086ee299
>> /data/export/index.201007211105.deploy/file
>> search15: md5sum: /data/export/index.201007211105.deploy/file: No such file
>> or directory
>
> If we look at the extended attributes, however, we notice that
> 'trusted.posix.gen' differs:
>> for i in `seq 10 15`; do echo -n "search$i: "; ssh pdbsearch$i getfattr -d
>> -m - /data/export/index.201007211105.deploy/file; done
>> search10: getfattr: /data/export/index.201007211105.deploy/file: No such
>> file or directory
>> search11: getfattr: Removing leading '/' from absolute path names
>> # file: data/export/index.201007211105.deploy/file
>> security.selinux="unconfined_u:object_r:default_t:s0
>> trusted.afr.indexcopy-0=0sAAAAAQAAAAAAAAAA
>> trusted.afr.indexcopy-1=0sAAAAAQAAAAAAAAAA
>> trusted.posix.gen=0sTEFukQAAAEY=
>>
>> search12: getfattr: /data/export/index.201007211105.deploy/file: No such
>> file or directory
>> search13: getfattr: /data/export/index.201007211105.deploy/file: No such
>> file or directory
>> search14: getfattr: Removing leading '/' from absolute path names
>> # file: data/export/index.201007211105.deploy/file
>> security.selinux="unconfined_u:object_r:default_t:s0
>> trusted.afr.indexcopy-0=0sAAAAAQAAAAAAAAAA
>> trusted.afr.indexcopy-1=0sAAAAAQAAAAAAAAAA
>> trusted.posix.gen=0sTEaPaAAAAAI=
>>
>> search15: getfattr: /data/export/index.201007211105.deploy/file: No such
>> file or directory
>
>
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
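The `0s` values in the getfattr output above are base64 (getfattr prefixes base64-encoded values with `0s` and hex-encoded ones with `0x`), so they can be decoded and compared byte for byte. The following is a minimal Python sketch, not a GlusterFS tool: the helper names are hypothetical, the values are copied from the report, and the split of `trusted.afr.*` into three big-endian 32-bit counters (data, metadata, entry) is an assumption about the replicate translator's changelog layout. `trusted.posix.gen` is only shown as raw hex, without interpreting it.

```python
import base64
import struct


def decode_getfattr_value(value: str) -> bytes:
    """Decode one value as printed by `getfattr -d`.

    getfattr marks base64 values with a "0s" prefix and hex values with
    "0x"; anything else is treated here as a (possibly quoted) text string.
    """
    if value.startswith("0s"):
        return base64.b64decode(value[2:])
    if value.startswith("0x"):
        return bytes.fromhex(value[2:])
    return value.strip('"').encode()


def afr_pending_counters(value: str) -> dict:
    """Unpack a trusted.afr.* value.

    ASSUMPTION: the value holds three big-endian uint32 counters in the
    order data, metadata, entry (12 bytes total).
    """
    raw = decode_getfattr_value(value)
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


# Values copied from the getfattr output in the report above.
XATTRS = {
    "search11": {
        "trusted.afr.indexcopy-0": "0sAAAAAQAAAAAAAAAA",
        "trusted.afr.indexcopy-1": "0sAAAAAQAAAAAAAAAA",
        "trusted.posix.gen": "0sTEFukQAAAEY=",
    },
    "search14": {
        "trusted.afr.indexcopy-0": "0sAAAAAQAAAAAAAAAA",
        "trusted.afr.indexcopy-1": "0sAAAAAQAAAAAAAAAA",
        "trusted.posix.gen": "0sTEaPaAAAAAI=",
    },
}

if __name__ == "__main__":
    for host, attrs in XATTRS.items():
        for name, value in attrs.items():
            if name.startswith("trusted.afr."):
                # Interpreted under the assumed three-counter layout.
                print(host, name, afr_pending_counters(value))
            else:
                # Printed as raw bytes only; no meaning is assumed.
                print(host, name, decode_getfattr_value(value).hex())
```

Under the assumed layout, both bricks show `data: 1, metadata: 0, entry: 0` for both indexcopy-0 and indexcopy-1, i.e. each copy would be recording a pending data operation against the other, which would at least be consistent with the `afr_set_split_brain` message in the log. The `trusted.posix.gen` bytes decode to `4c416e9100000046` on search11 and `4c468f6800000002` on search14, confirming that they differ.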
