Andrew, I just want to throw something out there, since I think you're being
rather harsh toward the gluster community. I've found that the developers are
very dedicated and very motivated to solve problems and debug the product.
However, if you're not paying them for support, your right to complain about
response times, or to expect YOUR specific issues to be prioritized, diminishes
severely. If you felt you weren't getting proper responses from gluster-users,
it's likely that none of us had an easy answer for you, or that no one was
really interested in solving that problem because they didn't see a parallel
to their own problems (such is the nature of open-source and community support
groups).
If you need to use something in a mission-critical production environment,
for heaven's sake PAY FOR SUPPORT! You can't operate mission-critical
applications on an "ask anyone around and hope they can help me in time" basis.
The developers are very responsive on the list, but they also have other work
to do, they get to take time off, and the one(s) who can solve your specific
issue may not be checking the list every minute of every day. (Personally, I'm
glad, because they should be working on the product, not sitting around waiting
for people to submit problems to them.)
Again, pay for support and you'll get dedicated resources to solve your
problem. Don't pay, and we as a community will be happy to help when and if
we can.
my .02,
Keith
At 06:27 AM 3/18/2009, Andrew McGill wrote:
Replying to myself, since nobody else did: I'm
outta here for now. Whether by network errors
or software errors, or plain old stupidity, I
can no longer maintain a stable glusterfs
mount. Due to the batch nature of the
application I'm running (rdiff-backup), this is
a fatal failure. I've now decommissioned my
failing glusterfs installation. It's a pity
though - although it was horribly slow because
of inappropriate hardware, it did work (until it
flaked out fatally). For decommissioning, it
turns out that having your data stored without
metadata is a good design choice. Currently
glusterfs needs some work in terms of recovering
from network and server errors. (I think that
the developers should be made to run it over
10Mbps ethernet hubs for a month.) Currently
glusterfs is a high capacity (given appropriate
hardware), but not a high availability solution.
On Saturday 14 March 2009 19:12:03 Andrew McGill wrote:
> Hello,
>
> I upgraded from glusterfs-1.3.12.tar.gz to glusterfs-2.0.0rc4.tar.gz
> because I could not complete a rdiff-backup without inexplicable errors.
> My efforts have been rewarded with a crash, for which some logs are
> displayed below.
>
> The backend is unify, with multiple afr subvolumes of two ext3 volumes each.
>
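(For context: a unify-over-AFR layout of this kind is declared in a volume
spec file in the 1.3/2.0-era configuration format. The sketch below is only a
rough illustration of such a layout; the volume names, hosts, and scheduler
are invented placeholders, not the actual configuration from this report.)

  # Hypothetical client volfile: two protocol/client bricks mirrored by
  # AFR, with the AFR pairs aggregated by unify. Names are placeholders.
  volume brick1
    type protocol/client
    option transport-type tcp          # tcp/client in the 1.3.x syntax
    option remote-host server1
    option remote-subvolume posix1
  end-volume

  volume brick2
    type protocol/client
    option transport-type tcp
    option remote-host server2
    option remote-subvolume posix2
  end-volume

  volume afr1
    type cluster/afr
    subvolumes brick1 brick2
  end-volume

  # ...further afr pairs (afr2, afr3, ...) defined the same way...

  volume ns
    type protocol/client
    option transport-type tcp
    option remote-host server1
    option remote-subvolume posix-ns   # namespace brick required by unify
  end-volume

  volume unify0
    type cluster/unify
    option scheduler rr
    option namespace ns
    subvolumes afr1 afr2
  end-volume

Each AFR pair mirrors two ext3 backends and unify presents the pairs as one
namespace, so a full brick or a dropped connection in any one pair can surface
as errors on the unified mount, which matches the pattern in the logs below.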
> Here is how the client side of glusterfs died:
>
> 2009-03-14 13:32:15 W [afr-self-heal-data.c:798:afr_sh_data_fix] afr4: Picking favorite child u100-rs1 as authentic source to resolve conflicting data of /backup5/robbie.foo.co.za/rdiff-backup-data/mirror_metadata.2009-03-14T08:00:17+02:00.snapshot.gz
> 2009-03-14 13:32:15 W [afr-self-heal-data.c:646:afr_sh_data_open_cbk] afr4: sourcing file /backup5/robbie.foo.co.za/rdiff-backup-data/mirror_metadata.2009-03-14T08:00:17+02:00.snapshot.gz from u100-rs1 to other sinks
> 2009-03-14 13:32:22 E [socket.c:102:__socket_rwv] u50-dcc1: readv failed (Bad address)
> 2009-03-14 13:32:22 E [socket.c:634:__socket_proto_state_machine] u50-dcc1: read (Bad address) in state 3 (192.168.227.65:6996)
> 2009-03-14 13:32:22 E [saved-frames.c:169:saved_frames_unwind] u50-dcc1: forced unwinding frame type(1) op(READ)
> 2009-03-14 13:32:22 E [socket.c:102:__socket_rwv] u50-dr1: readv failed (Bad address)
> 2009-03-14 13:32:22 E [socket.c:634:__socket_proto_state_machine] u50-dr1: read (Bad address) in state 3 (192.168.227.31:6996)
> 2009-03-14 13:32:22 E [saved-frames.c:169:saved_frames_unwind] u50-dr1: forced unwinding frame type(1) op(READ)
> 2009-03-14 13:32:22 E [fuse-bridge.c:1548:fuse_readv_cbk] glusterfs-fuse: 5998294: READ => -1 (Transport endpoint is not connected)
> 2009-03-14 13:33:03 E [socket.c:102:__socket_rwv] u50-dr2: readv failed (Bad address)
> 2009-03-14 13:33:03 E [socket.c:634:__socket_proto_state_machine] u50-dr2: read (Bad address) in state 3 (192.168.227.32:6996)
> 2009-03-14 13:33:03 E [saved-frames.c:169:saved_frames_unwind] u50-dr2: forced unwinding frame type(1) op(READ)
> 2009-03-14 13:33:03 E [socket.c:102:__socket_rwv] u50-rs3: readv failed (Bad address)
> 2009-03-14 13:33:03 E [socket.c:634:__socket_proto_state_machine] u50-rs3: read (Bad address) in state 3 (192.168.227.59:6996)
> 2009-03-14 13:33:03 E [saved-frames.c:169:saved_frames_unwind] u50-rs3: forced unwinding frame type(1) op(READ)
> 2009-03-14 13:33:03 E [fuse-bridge.c:1548:fuse_readv_cbk] glusterfs-fuse: 6006118: READ => -1 (Transport endpoint is not connected)
> 2009-03-14 13:33:03 E [fuse-bridge.c:1548:fuse_readv_cbk] glusterfs-fuse: 6006119: READ => -1 (Transport endpoint is not connected)
> 2009-03-14 13:33:03 E [fuse-bridge.c:1548:fuse_readv_cbk] glusterfs-fuse: 6006120: READ => -1 (Transport endpoint is not connected)
> 2009-03-14 13:33:03 E [fuse-bridge.c:1548:fuse_readv_cbk] glusterfs-fuse: 6006121: READ => -1 (Transport endpoint is not connected)
> pending frames:
> patchset: cb602a1d7d41587c24379cb2636961ab91446f86 +
> signal received: 6
> configuration details:
> argp 1
> backtrace 1
> dlfcn 1
> fdatasync 1
> libpthread 1
> llistxattr 1
> setfsid 1
> spinlock 1
> epoll.h 1
> xattr.h 1
> st_atim.tv_nsec 1
> package-string: glusterfs 2.0.0rc4
> [0x381420]
> /lib/libc.so.6(abort+0x101)[0xb86451]
> /usr/lib/glusterfs/2.0.0rc4/xlator/mount/fuse.so[0x54b9a8]
> /lib/libpthread.so.0[0xd302db]
> /lib/libc.so.6(clone+0x5e)[0xc2912e]
> ---------
>
> On the server side, the following messages don't enlighten me, but do
> remind me that there was another client running version 1.13 still
> connecting. It looks like the server just noticed that the client died.
>
> 2009-03-14 13:30:03 E [socket.c:583:__socket_proto_state_machine] server: socket header validate failed (192.168.227.167:1023). possible mismatch of transport-type between server and client volumes, or version mismatch
> 2009-03-14 13:30:03 N [server-protocol.c:8048:notify] server: 192.168.227.167:1023 disconnected
> 2009-03-14 13:31:45 E [socket.c:463:__socket_proto_validate_header] server: socket header signature does not match :O (42.6c.6f)
> 2009-03-14 13:31:45 E [socket.c:583:__socket_proto_state_machine] server: socket header validate failed (192.168.227.167:1023). possible mismatch of transport-type between server and client volumes, or version mismatch
> 2009-03-14 13:31:45 N [server-protocol.c:8048:notify] server: 192.168.227.167:1023 disconnected
> 2009-03-14 13:32:22 E [socket.c:102:__socket_rwv] server: readv failed (Connection reset by peer)
> 2009-03-14 13:32:22 E [socket.c:561:__socket_proto_state_machine] server: read (Connection reset by peer) in state 1 (192.168.227.5:1020)
> 2009-03-14 13:32:22 N [server-protocol.c:8048:notify] server: 192.168.227.5:1020 disconnected
> 2009-03-14 13:32:22 N [server-protocol.c:7295:mop_setvolume] server: accepted client from 192.168.227.5:1020
> 2009-03-14 13:35:48 N [server-protocol.c:8048:notify] server: 192.168.227.5:1017 disconnected
> 2009-03-14 13:35:48 N [server-protocol.c:8048:notify] server: 192.168.227.5:1020 disconnected
> 2009-03-14 13:35:48 N [server-helpers.c:515:server_connection_destroy] server: destroyed connection of backup5.foo.com-23205-2009/03/14-07:10:52:777008-u50-dcc1
>
> On another server brick, the 25Gb volume u50-dr1-raw was full (it should
> have been 50Gb like its peer). As I recall, the free space on the second
> volume of an AFR pair does not get checked (a bug, IMHO).
>
> It said this, which could have led to the client-side failure a few minutes
> later (the clocks are in sync):
>
> 2009-03-14 13:30:23 W [posix.c:773:posix_mkdir] u50-dr1-raw: mkdir of /backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1: No space left on device
> 2009-03-14 13:30:23 E [server-protocol.c:3478:server_stub_resume] server: 1109657: INODELK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1) on u50-dr1 returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:3478:server_stub_resume] server: 1109658: INODELK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1) on u50-dr1 returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:3448:server_stub_resume] server: 3184942: ENTRYLK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1) on u50-dr1 for key <nul> returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:3448:server_stub_resume] server: 3184943: ENTRYLK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1) on u50-dr1 for key <nul> returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:3448:server_stub_resume] server: 3184947: ENTRYLK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1) on u50-dr1 for key hl returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:3448:server_stub_resume] server: 3184949: ENTRYLK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1) on u50-dr1 for key hl returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:3478:server_stub_resume] server: 1109660: INODELK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1/hl) on u50-dr1 returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:3478:server_stub_resume] server: 1109661: INODELK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1/hl) on u50-dr1 returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:3448:server_stub_resume] server: 3184952: ENTRYLK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1/hl) on u50-dr1 for key <nul> returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:3448:server_stub_resume] server: 3184953: ENTRYLK (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1/hl) on u50-dr1 for key <nul> returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:2774:server_stub_resume] server: 1109663: XATTROP (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1) on u50-dr1 returning error: -1 (2)
> 2009-03-14 13:30:23 E [server-protocol.c:2868:server_stub_resume] server: 1109665: RMDIR (/backup5/bumblebee.foo.co.za/rdiff-backup-data/rdiff-backup.tmp.1) on u50-dr1 returning error: -1 (2)
> 2009-03-14 13:31:45 E [socket.c:463:__socket_proto_validate_header] server: socket header signature does not match :O (42.6c.6f)
> 2009-03-14 13:31:45 E [socket.c:583:__socket_proto_state_machine] server: socket header validate failed (192.168.227.167:1022). possible mismatch of transport-type between server and client volumes, or version mismatch
> 2009-03-14 13:31:45 N [server-protocol.c:8048:notify] server: 192.168.227.167:1022 disconnected
> 2009-03-14 13:32:22 E [socket.c:102:__socket_rwv] server: readv failed (Connection reset by peer)
> 2009-03-14 13:32:22 E [socket.c:561:__socket_proto_state_machine] server: read (Connection reset by peer) in state 1 (192.168.227.5:1016)
> 2009-03-14 13:32:22 N [server-protocol.c:8048:notify] server: 192.168.227.5:1016 disconnected
> 2009-03-14 13:32:23 N [server-protocol.c:7295:mop_setvolume] server: accepted client from 192.168.227.5:1016
>
> I may have to move the backup in question off glusterfs (if I can just find
> the space somewhere), since it has taken 4 days to realise that the backing
> up is not just slow, but faulty. (Of course, if I can't fix it, I'll win a
> trip to the data center to install a new machine to replace the system.)
_______________________________________________
Gluster-users mailing list
[email protected]
http://zresearch.com/cgi-bin/mailman/listinfo/gluster-users