On 07/07/2011 15:25, Kaushik BV wrote:
Hi Chaica,

This primarily means that the RPC communication between the master
gsyncd module and the slave gsyncd module is broken, which can happen
for various reasons. Check that all the prerequisites are satisfied:

- FUSE is installed on the machine, since the geo-replication module
mounts the GlusterFS volume using FUSE to sync data.
- If the slave is a volume, the volume is started.
- If the slave is a plain directory, the directory has already been
created with the desired permissions (not applicable in your case).
- If GlusterFS 3.2 is not installed in the default location (on the master)
but under a custom prefix, configure the *gluster-command*  option to
point to the exact location.
- If GlusterFS 3.2 is not installed in the default location (on the slave)
but under a custom prefix, configure the *remote-gsyncd-command*  option
to point to the exact place where gsyncd is located.
- Locate the slave log and see if it has any anomalies.
- Passwordless SSH is set up properly between the host and the remote
machine (not applicable in your case).
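For the two custom-prefix cases above, the options can be set through the geo-replication config interface. A sketch, reusing the volume names from this thread; the /opt install paths are examples only and must be adjusted to your actual prefix:

```shell
# Example paths only: adjust /opt/glusterfs/... to wherever you installed.
# On the master, point geo-replication at the custom glusterfs binary:
gluster volume geo-replication test-volume 192.168.1.32::test-volume \
    config gluster-command /opt/glusterfs/sbin/glusterfs

# Tell the master where gsyncd is located on the slave:
gluster volume geo-replication test-volume 192.168.1.32::test-volume \
    config remote-gsyncd-command /opt/glusterfs/libexec/gsyncd
```

Running `config` with no option name prints the current settings, which is a quick way to verify what the session will actually use.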

OK, the situation has evolved slightly. I now have a slave log and a clearer error message on the master:


[2011-07-07 19:53:16.258866] I [monitor(monitor):42:monitor] Monitor: ------------------------------------------------------------
[2011-07-07 19:53:16.259073] I [monitor(monitor):43:monitor] Monitor: starting gsyncd worker
[2011-07-07 19:53:16.332720] I [gsyncd:286:main_i] <top>: syncing: gluster://localhost:test-volume -> ssh://192.168.1.32::test-volume
[2011-07-07 19:53:16.343554] D [repce:131:push] RepceClient: call 6302:140305661662976:1310061196.34 __repce_version__() ...
[2011-07-07 19:53:20.931523] D [repce:141:__call__] RepceClient: call 6302:140305661662976:1310061196.34 __repce_version__ -> 1.0
[2011-07-07 19:53:20.932172] D [repce:131:push] RepceClient: call 6302:140305661662976:1310061200.93 version() ...
[2011-07-07 19:53:20.933662] D [repce:141:__call__] RepceClient: call 6302:140305661662976:1310061200.93 version -> 1.0
[2011-07-07 19:53:20.933861] D [repce:131:push] RepceClient: call 6302:140305661662976:1310061200.93 pid() ...
[2011-07-07 19:53:20.934525] D [repce:141:__call__] RepceClient: call 6302:140305661662976:1310061200.93 pid -> 10075
[2011-07-07 19:53:20.957355] E [syncdutils:131:log_raise_exception] <top>: FAIL:
Traceback (most recent call last):
File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/gsyncd.py", line 102, in main
    main_i()
File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/gsyncd.py", line 293, in main_i
    local.connect()
File "/usr/lib/glusterfs/glusterfs/python/syncdaemon/resource.py", line 379, in connect
    raise RuntimeError("command failed: " + " ".join(argv))
RuntimeError: command failed: /usr/sbin/glusterfs --xlator-option *-dht.assert-no-child-down=true -L DEBUG -l /var/log/glusterfs/geo-replication/test-volume/ssh%3A%2F%2Froot%40192.168.1.32%3Agluster%3A%2F%2F127.0.0.1%3Atest-volume.gluster.log -s localhost --volfile-id test-volume --client-pid=-1 /tmp/gsyncd-aux-mount-hy6T_w
[2011-07-07 19:53:20.960621] D [monitor(monitor):58:monitor] Monitor: worker seems to be connected (?? racy check)
[2011-07-07 19:53:21.962501] D [monitor(monitor):62:monitor] Monitor: worker died in startup phase

The command launched by glusterfs returns shell exit code 255, which I believe means the command was terminated by a signal. In the slave log I have:
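A side note on interpreting that status: POSIX shells report a process killed by signal N as exit status 128+N, so SIGTERM (15) shows up as 143, whereas a plain 255 usually comes from the program itself exiting with that code (ssh, for instance, exits 255 on connection errors). A quick sketch of the signal encoding:

```shell
#!/bin/sh
# A background process killed by SIGTERM: the shell reports 128 + 15 = 143.
sleep 30 &
pid=$!
kill -TERM "$pid"
wait "$pid"
echo "exit status: $?"   # prints "exit status: 143"
```

So a 255 from the mount command is more consistent with glusterfs failing on its own than with it being signalled, which fits the "worker died in startup phase" line above.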

[2011-07-07 19:54:49.571549] I [fuse-bridge.c:3218:fuse_thread_proc] 0-fuse: unmounting /tmp/gsyncd-aux-mount-z2Q2Hg
[2011-07-07 19:54:49.572459] W [glusterfsd.c:712:cleanup_and_exit] (-->/lib/libc.so.6(clone+0x6d) [0x7f2c8998b02d] (-->/lib/libpthread.so.0(+0x68ba) [0x7f2c89c238ba] (-->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xc5) [0x7f2c8a8f51b5]))) 0-: received signum (15), shutting down
[2011-07-07 19:54:51.280207] W [write-behind.c:3029:init] 0-test-volume-write-behind: disabling write-behind for first 0 bytes
[2011-07-07 19:54:51.291669] I [client.c:1935:notify] 0-test-volume-client-0: parent translators are ready, attempting connect on transport
[2011-07-07 19:54:51.292329] I [client.c:1935:notify] 0-test-volume-client-1: parent translators are ready, attempting connect on transport
[2011-07-07 19:55:38.582926] I [rpc-clnt.c:1531:rpc_clnt_reconfig] 0-test-volume-client-0: changing port to 24009 (from 0)
[2011-07-07 19:55:38.583456] I [rpc-clnt.c:1531:rpc_clnt_reconfig] 0-test-volume-client-1: changing port to 24009 (from 0)

Bye,
Carl Chenet
_______________________________________________
Gluster-users mailing list
[email protected]
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
