Hi,

Generally, when gsyncd encounters an exception, it logs the exception and 
restarts. But in some cases it deadlocks instead. This happens in my 
environment about once a week. Replication stops, but the geo-replication 
status command still shows OK.

I checked the processes on the master. The gsyncd process hangs with the 
backtrace below, and the ssh subprocess never terminates. If I kill the ssh 
subprocess manually with signal 9 (SIGKILL), geo-replication exits and 
restarts.

#3 file '/usr/lib64/python2.6/subprocess.py', in '_eintr_retry_call'
#7 file '/usr/lib64/python2.6/subprocess.py', in 'wait'
#11 file '/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py', in 'log_raise_exception'
#14 file '/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py', in 'twrap'
#19 file '/usr/lib64/python2.6/threading.py', in 'run'
#22 file '/usr/lib64/python2.6/threading.py', in '__bootstrap_inner'
#25 file '/usr/lib64/python2.6/threading.py', in '__bootstrap'


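I think the problem is that log_raise_exception uses Popen.wait here, which 
can deadlock if the child writes more output to a pipe than the OS pipe 
buffer can hold: the child blocks on the write, while the parent blocks in 
wait and never reads. See the documentation at 
http://docs.python.org/2/library/subprocess.html, which recommends using 
Popen.communicate instead.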
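
To illustrate the suggestion, here is a minimal sketch, not taken from the 
gsyncd source. It assumes the child was started with stdout and stderr 
attached to pipes. The names run_and_drain and CHILD are hypothetical, and 
the child is a stand-in that writes about 1 MiB to stderr, which is more than 
a typical pipe buffer holds.

```python
import subprocess
import sys


def run_and_drain(argv):
    """Run argv and return (returncode, stdout_data, stderr_data).

    Hypothetical helper for illustration only, not part of gsyncd.
    """
    po = subprocess.Popen(argv,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE)
    # communicate() reads both pipes until EOF and then reaps the child,
    # so the child can never block on a full pipe buffer. Calling
    # po.wait() here instead could hang once the child fills the pipe,
    # because nothing would be reading from it.
    out, err = po.communicate()
    return po.returncode, out, err


# Stand-in child (an assumption): writes 1 MiB to stderr and exits.
CHILD = [sys.executable, "-c",
         "import sys; sys.stderr.write('x' * (1 << 20))"]

rc, out, err = run_and_drain(CHILD)
print("child exited with %d, stderr bytes: %d" % (rc, len(err)))
```

The trade-off is that communicate() buffers all output in memory, which 
should be acceptable for an error message from ssh but may not suit a child 
that produces unbounded output.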



Thanks.


_______________________________________________
Gluster-devel mailing list
Gluster-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/gluster-devel
