Re: [Gluster-users] One node goes offline, the other node can't see the replicated volume anymore

Joe Julian Sat, 13 Jul 2013 17:39:00 -0700

These logs show different results. The results you reported and pasted earlier included, "[2013-07-09 00:59:04.706390] I [afr-common.c:3856:afr_local_init] 0-firewall-scripts-replicate-0: no subvolumes up", which would produce the "Transport endpoint not connected" error you reported at first. These results look normal and should have produced the behavior I described.


42 is The Answer to Life, The Universe, and Everything.

Re-establishing FDs and locks is an expensive operation. The ping-timeout is long because it should not happen, but if there is temporary network congestion you'd (normally) rather have your volume remain up and pause than have to re-establish everything. Typically, unless you expect your servers to crash often, leaving ping-timeout at the default is best. YMMV and it's configurable in case you know what you're doing and why.
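For reference, the timeout in question is the network.ping-timeout volume option. A minimal sketch of changing it, assuming the volume name firewall-scripts from this thread; the 10-second value and the run helper are illustrative assumptions only, not a recommendation:

```shell
# Volume name taken from this thread; the new value is illustrative only.
VOLUME=firewall-scripts
NEW_TIMEOUT=10

# Hypothetical helper: print each command instead of running it unless
# RUN=1 is set, so the sketch can be reviewed before it touches a volume.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

# Show the volume, including any options already reconfigured.
run gluster volume info "$VOLUME"
# Lower the ping timeout from the 42-second default.
run gluster volume set "$VOLUME" network.ping-timeout "$NEW_TIMEOUT"
# Return the option to its default later.
run gluster volume reset "$VOLUME" network.ping-timeout
```

Run it once as-is to see the commands, then again with RUN=1 on a server in the trusted pool to apply them.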



On 07/13/2013 04:58 PM, Greg Scott wrote:

Log files sent privately to Joe. If others from the community want to look at them, I’m OK with posting them here. I don’t think they have anything confidential. Now that I know about that 42 second timeout, the behavior makes more sense. Why 42? What’s special about 42? Is there a way I can adjust that down for my application to, say, 1 or 2 seconds?
-Greg

*From:* Joe Julian [mailto:[email protected]]
*Sent:* Saturday, July 13, 2013 4:28 PM
*To:* Greg Scott; '[email protected]'
*Subject:* Re: [Gluster-users] One node goes offline, the other node can't see the replicated volume anymore

Huh.. this was in my sent folder... let's try again.
There's something missing from this picture. The logs show that the client is connecting to both servers, but it only shows the disconnection from one and claims that it's not connected to any bricks after that.
Here's the data I'd like to have you generate:

unmount the clients
gluster volume set firewall-scripts diagnostics.client-log-level DEBUG
gluster volume set firewall-scripts diagnostics.brick-log-level DEBUG
systemctl stop glusterd.service
truncate the client, glusterd, and server logs
systemctl start glusterd
mount /firewall-scripts
Do your iptables disconnect
telnet $this_host_ip 24007 # report whether or not it establishes a connection
ls /firewall-scripts
wait 42 seconds
ls /firewall-scripts
Remove the iptables rule
ls /firewall-scripts
tar up the logs and email them to me.

You can reset the log-level:

gluster volume reset firewall-scripts diagnostics.client-log-level
gluster volume reset firewall-scripts diagnostics.brick-log-level
Lastly, do you have a loopback interface (lo) on 127.0.0.1, and is localhost defined in /etc/hosts?
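One way to check both, as a rough sketch: has_localhost_entry is a hypothetical helper written for this example, and the ip command is assumed to come from iproute2.

```shell
# Hypothetical helper: succeeds if the given hosts file maps
# 127.0.0.1 to the name "localhost" (comments are ignored).
has_localhost_entry() {
    awk '{ sub(/#.*/, "") }
         $1 == "127.0.0.1" { for (i = 2; i <= NF; i++) if ($i == "localhost") found = 1 }
         END { exit !found }' "$1"
}

# Show the IPv4 addresses on lo, if the iproute2 "ip" tool is installed.
if command -v ip >/dev/null 2>&1; then
    ip -4 addr show dev lo || echo "could not query the lo interface"
fi

# Report whether /etc/hosts defines localhost on the loopback address.
if has_localhost_entry /etc/hosts; then
    echo "localhost is defined in /etc/hosts"
else
    echo "localhost is NOT mapped to 127.0.0.1 in /etc/hosts"
fi
```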

_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
