Thanks for sharing! Did recovery from a downed node get any faster after you changed network.ping-timeout to 10 seconds?
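
For anyone who wants to put a number on that delay, here is a minimal sketch (not something from James's setup) that reports how many whole seconds a directory listing on the mount point takes. The function name time_first_access is made up for illustration; /test-pfs2 is the mount point from the quoted message. My assumption is that, run right after a storage node goes down, the first call should take roughly the ping-timeout and later calls should come back quickly, which would match what James describes below.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): print how many whole seconds
# a directory listing of the given path takes.
time_first_access() {
    mount_point=$1
    start=$(date +%s)                           # wall-clock seconds before the listing
    ls "$mount_point" > /dev/null 2>&1          # may block while the client waits on a dead node
    end=$(date +%s)                             # wall-clock seconds after the listing
    echo $(( end - start ))
}

# Example use on a client, with one storage node powered off:
#   time_first_access /test-pfs2    # first access after the failure
#   time_first_access /test-pfs2    # second access, for comparison
```

Comparing the first number before and after the ping-timeout change would show whether the setting made a difference; one-second resolution should be enough for delays of this size.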
On Wed, Mar 16, 2011 at 11:07 AM, Burnash, James <[email protected]> wrote:
> So - answering myself with the (apparent) solution. The configuration IS
> correct as shown - the problems were elsewhere.
>
> Primary cause for this seems to be performing the gluster native client mount
> on a virtual machine WITHOUT using the " -O --disable-direct-io-mode"
> parameter.
>
> So I was mounting like this:
>
> mount -t glusterfs jc1letgfs5:/test-pfs-ro1 /test-pfs2
>
> When I should have been doing this:
>
> mount -t glusterfs -O --disable-direct-io-mode jc1letgfs5:/test-pfs-ro1 /test-pfs2
>
> Secondly, I changed the volume parameter "network.ping-timeout" from its
> default of 43 to 10 seconds, in order to get faster recovery from a downed
> storage node:
>
> gluster volume set pfs-rw1 network.ping-timeout 10
>
> This configuration now survives the loss of either node of the two storage
> server mirrors. There is a noticeable delay before commands on the mount
> point complete the first time a command is issued after one of the nodes has
> gone down - but then they return at the same speed as when all nodes were
> present.
>
> Thanks especially to all who helped, and Anush who helped me troubleshoot it
> from a different angle.
>
> James Burnash, Unix Engineering
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Burnash, James
> Sent: Friday, March 11, 2011 11:31 AM
> To: [email protected]
> Subject: Re: [Gluster-users] Why does this setup not survive a node crash?
>
> Could anyone else please take a peek at this and sanity check my
> configuration. I'm quite frankly at a loss and tremendously under the gun ...
>
> Thanks in advance to any kind souls.
>
> James Burnash, Unix Engineering
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Burnash, James
> Sent: Thursday, March 10, 2011 3:55 PM
> To: [email protected]
> Subject: [Gluster-users] Why does this setup not survive a node crash?
>
> Perhaps someone will see immediately, given the data below, why this
> configuration will not survive a crash of one node - it appears that any node
> crashed out of this set will cause gluster native clients to hang until the
> node comes back.
>
> Given (2) initial storage servers (CentOS 5.5, Gluster 3.1.1):
>
> Starting out by creating a Replicated-Distributed pair with this command:
> gluster volume create test-pfs-ro1 replica 2 jc1letgfs5:/export/read-only/g01
> jc1letgfs6:/export/read-only/g01 jc1letgfs5:/export/read-only/g02
> jc1letgfs6:/export/read-only/g02
>
> Which ran fine (though I did not attempt to crash 1 of the pair)
>
> And then adding (2) more servers, identically configured, with this command:
> gluster volume add-brick test-pfs-ro1 jc1letgfs7:/export/read-only/g01
> jc1letgfs8:/export/read-only/g01 jc1letgfs7:/export/read-only/g02
> jc1letgfs8:/export/read-only/g02
> Add Brick successful
>
> root@jc1letgfs5:~# gluster volume info
>
> Volume Name: test-pfs-ro1
> Type: Distributed-Replicate
> Status: Started
> Number of Bricks: 4 x 2 = 8
> Transport-type: tcp
> Bricks:
> Brick1: jc1letgfs5:/export/read-only/g01
> Brick2: jc1letgfs6:/export/read-only/g01
> Brick3: jc1letgfs5:/export/read-only/g02
> Brick4: jc1letgfs6:/export/read-only/g02
> Brick5: jc1letgfs7:/export/read-only/g01
> Brick6: jc1letgfs8:/export/read-only/g01
> Brick7: jc1letgfs7:/export/read-only/g02
> Brick8: jc1letgfs8:/export/read-only/g02
>
> And this volfile info out of the log file
> /var/log/glusterfs/etc-glusterd-mount-test-pfs-ro1.log:
>
> [2011-03-10 14:38:26.310807] W [dict.c:1204:data_to_str] dict: @data=(nil)
> Given volfile:
> +------------------------------------------------------------------------------+
> volume test-pfs-ro1-client-0
>     type protocol/client
>     option remote-host jc1letgfs5
>     option remote-subvolume /export/read-only/g01
>     option transport-type tcp
> end-volume
>
> volume test-pfs-ro1-client-1
>     type protocol/client
>     option remote-host jc1letgfs6
>     option remote-subvolume /export/read-only/g01
>     option transport-type tcp
> end-volume
>
> volume test-pfs-ro1-client-2
>     type protocol/client
>     option remote-host jc1letgfs5
>     option remote-subvolume /export/read-only/g02
>     option transport-type tcp
> end-volume
>
> volume test-pfs-ro1-client-3
>     type protocol/client
>     option remote-host jc1letgfs6
>     option remote-subvolume /export/read-only/g02
>     option transport-type tcp
> end-volume
>
> volume test-pfs-ro1-client-4
>     type protocol/client
>     option remote-host jc1letgfs7
>     option remote-subvolume /export/read-only/g01
>     option transport-type tcp
> end-volume
>
> volume test-pfs-ro1-client-5
>     type protocol/client
>     option remote-host jc1letgfs8
>     option remote-subvolume /export/read-only/g01
>     option transport-type tcp
> end-volume
>
> volume test-pfs-ro1-client-6
>     type protocol/client
>     option remote-host jc1letgfs7
>     option remote-subvolume /export/read-only/g02
>     option transport-type tcp
> end-volume
>
> volume test-pfs-ro1-client-7
>     type protocol/client
>     option remote-host jc1letgfs8
>     option remote-subvolume /export/read-only/g02
>     option transport-type tcp
> end-volume
>
> volume test-pfs-ro1-replicate-0
>     type cluster/replicate
>     subvolumes test-pfs-ro1-client-0 test-pfs-ro1-client-1
> end-volume
>
> volume test-pfs-ro1-replicate-1
>     type cluster/replicate
>     subvolumes test-pfs-ro1-client-2 test-pfs-ro1-client-3
> end-volume
>
> volume test-pfs-ro1-replicate-2
>     type cluster/replicate
>     subvolumes test-pfs-ro1-client-4 test-pfs-ro1-client-5
> end-volume
>
> volume test-pfs-ro1-replicate-3
>     type cluster/replicate
>     subvolumes test-pfs-ro1-client-6 test-pfs-ro1-client-7
> end-volume
>
> volume test-pfs-ro1-dht
>     type cluster/distribute
>     subvolumes test-pfs-ro1-replicate-0 test-pfs-ro1-replicate-1 test-pfs-ro1-replicate-2 test-pfs-ro1-replicate-3
> end-volume
>
> volume test-pfs-ro1-write-behind
>     type performance/write-behind
>     subvolumes test-pfs-ro1-dht
> end-volume
>
> volume test-pfs-ro1-read-ahead
>     type performance/read-ahead
>     subvolumes test-pfs-ro1-write-behind
> end-volume
>
> volume test-pfs-ro1-io-cache
>     type performance/io-cache
>     subvolumes test-pfs-ro1-read-ahead
> end-volume
>
> volume test-pfs-ro1-quick-read
>     type performance/quick-read
>     subvolumes test-pfs-ro1-io-cache
> end-volume
>
> volume test-pfs-ro1-stat-prefetch
>     type performance/stat-prefetch
>     subvolumes test-pfs-ro1-quick-read
> end-volume
>
> volume test-pfs-ro1
>     type debug/io-stats
>     subvolumes test-pfs-ro1-stat-prefetch
> end-volume
>
> Any input would be greatly appreciated. I'm working beyond my deadline
> already, and I'm guessing that I'm not seeing the forest for the trees here.
>
> James Burnash, Unix Engineering
>
>
> DISCLAIMER:
> This e-mail, and any attachments thereto, is intended only for use by the
> addressee(s) named herein and may contain legally privileged and/or
> confidential information. If you are not the intended recipient of this
> e-mail, you are hereby notified that any dissemination, distribution or
> copying of this e-mail, and any attachments thereto, is strictly prohibited.
> If you have received this in error, please immediately notify me and
> permanently delete the original and any copy of any e-mail and any printout
> thereof. E-mail transmission cannot be guaranteed to be secure or error-free.
> The sender therefore does not accept liability for any errors or omissions in
> the contents of this message which arise as a result of e-mail transmission.
> NOTICE REGARDING PRIVACY AND CONFIDENTIALITY Knight Capital Group may, at its
> discretion, monitor and review the content of all e-mail communications.
> http://www.knight.com
> _______________________________________________
> Gluster-users mailing list
> [email protected]
> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
