I am running some tests using two KVM hosts, each with a CentOS 6.5 instance 
running Gluster 3.4.2.  The Gluster instances act as both server and client, 
mounting the Gluster volume they also serve.  During my tests there is no 
file access occurring on the Gluster volume.

I am seeing an issue when I forcibly disconnect node1 from the network.  Node2 
can take several minutes to detect that node1 is disconnected.  During this 
time, running "gluster peer status" on node2 shows node1 as connected.  The 
first run of "gluster volume status" takes two minutes to time out and then 
returns with no output.  Subsequent runs of "gluster volume status" return 
quickly with "Another transaction is in progress. Please try again after 
sometime."  Eventually "gluster peer status" shows node1 as disconnected.  
At that point "gluster volume status" starts to return quickly.
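
To put a number on the delay, something like the sketch below could be run on 
node2 right after the network is stopped on node1.  It polls "gluster peer 
status" and reports how long it takes for the peer to show as disconnected.  
The function names, the one-second poll interval and the ten-minute cap are 
my own choices for illustration, and the parser assumes the usual 
"Hostname:" / "State: Peer in Cluster (Connected)" layout of the command's 
output; treat it as a rough measurement aid, not a definitive tool.

```python
import re
import subprocess
import time


def peer_states(status_text):
    """Map each peer hostname to the word in parentheses on its State line.

    Assumes the usual `gluster peer status` layout, for example:
        Hostname: node1
        State: Peer in Cluster (Connected)
    """
    states = {}
    host = None
    for line in status_text.splitlines():
        line = line.strip()
        if line.startswith("Hostname:"):
            host = line.split(":", 1)[1].strip()
        elif line.startswith("State:") and host is not None:
            match = re.search(r"\((\w+)\)\s*$", line)
            states[host] = match.group(1) if match else "Unknown"
            host = None
    return states


def seconds_until_disconnected(peer, poll_interval=1.0, timeout=600.0):
    """Poll `gluster peer status` until `peer` shows as Disconnected.

    Returns the elapsed seconds, or None if `timeout` seconds pass first.
    The interval and timeout defaults are arbitrary illustrative values.
    """
    start = time.monotonic()
    while time.monotonic() - start < timeout:
        result = subprocess.run(
            ["gluster", "peer", "status"],
            capture_output=True,
            text=True,
        )
        if peer_states(result.stdout).get(peer) == "Disconnected":
            return time.monotonic() - start
        time.sleep(poll_interval)
    return None
```

Calling seconds_until_disconnected("node1") on node2 immediately after 
"service network stop" on node1 would give a rough figure for the detection 
delay, accurate to about one poll interval.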

This behavior is only seen when I do a "service network stop" on node1 to 
simulate a node failure.  If I do a "service glusterd stop" on node1 to cleanly 
shut down Gluster, node2 sees node1 as disconnected immediately, and the 
volume status commands return immediately.

What is the mechanism by which a node detects that a peer has failed?  The 
delay I am seeing is worrisome for a production environment.

Thanks,
-Joe


System Administration
ARINC Direct
410-266-4028

_______________________________________________
Gluster-users mailing list
[email protected]
http://supercolony.gluster.org/mailman/listinfo/gluster-users
