Hi,

Given that this is an 'OSError', I should probably have said that we run Red Hat 6.6 (64-bit) with XFS as the brick filesystem.
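Since the slave-side traceback ends in lsetxattr(), I have been sanity-checking that extended attributes work at all on the brick filesystem. A minimal probe along these lines (the helper name and attribute name are just placeholders; it uses the user.* namespace via Python 3's os.setxattr, whereas Gluster itself sets trusted.* xattrs as root — on the Python 2.6 that ships with RHEL 6 you would use setfattr/getfattr instead):

```python
import os
import tempfile

def probe_xattr(directory):
    """Round-trip a user.* extended attribute on a scratch file in
    `directory`.  Returns ("ok", value) or ("failed", errno); errno 5
    (EIO) here would mirror what geo-replication hits in lsetxattr."""
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        try:
            # follow_symlinks=False makes this the lsetxattr variant,
            # the same call the slave worker dies in.
            os.setxattr(f.name, b"user.georep.probe", b"1",
                        follow_symlinks=False)
            value = os.getxattr(f.name, b"user.georep.probe",
                                follow_symlinks=False)
            return ("ok", value)
        except OSError as e:
            return ("failed", e.errno)

# Point this at the actual brick directory,
# e.g. /gluster/homegs-brick-1/brick on the slave:
print(probe_xattr("/tmp"))
```

(Note that /tmp may be tmpfs, which does not always support user.* xattrs, so run it against the brick path itself to get a meaningful answer.)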
Has anyone any input on how to troubleshoot this?

Thanks,
Thibault.

On 24 Aug 2015 4:03 pm, "Thibault Godouet" <[email protected]> wrote:
> I have had multiple issues with geo-replication. It seems to work OK
> initially, the replica gets up to date, and not long after (e.g. a couple
> of days) the replication goes into a faulty state and won't get out of it.
>
> I have tried a few times now, and on the last attempt I re-created the
> slave volume and set up the replication again. Same symptoms again.
>
> I use Gluster 3.7.3, and you will find my setup and log messages at the
> bottom of the email.
>
> Any idea what could cause this and how to fix it?
>
> Thanks,
> Thibault.
>
> PS: my setup and log messages:
>
> Master:
>
> Volume Name: home
> Type: Replicate
> Volume ID: 2299a204-a1dc-449d-8556-bc65197373c7
> Status: Started
> Number of Bricks: 1 x 2 = 2
> Transport-type: tcp
> Bricks:
> Brick1: server4.uberit.net:/gluster/home-brick-1
> Brick2: server5.uberit.net:/gluster/home-brick-1
> Options Reconfigured:
> performance.readdir-ahead: on
> geo-replication.indexing: on
> geo-replication.ignore-pid-check: on
> changelog.changelog: on
>
> Slave:
>
> Volume Name: homegs
> Type: Distribute
> Volume ID: 746dfdc3-650d-4468-9fdd-d621dd215b94
> Status: Started
> Number of Bricks: 1
> Transport-type: tcp
> Bricks:
> Brick1: remoteserver1.uberit.net:/gluster/homegs-brick-1/brick
> Options Reconfigured:
> performance.readdir-ahead: on
>
> The geo-replication status and config (I think I ended up with only
> default values) are:
>
> # gluster volume geo-replication home ssh://remoteserver1::homegs status
>
> MASTER NODE  MASTER VOL  MASTER BRICK           SLAVE USER  SLAVE                        SLAVE NODE  STATUS  CRAWL STATUS  LAST_SYNCED
> --------------------------------------------------------------------------------------------------------------------------------------
> server5      home        /gluster/home-brick-1  root        ssh://remoteserver1::homegs  N/A         Faulty  N/A           N/A
> server4      home        /gluster/home-brick-1  root        ssh://remoteserver1::homegs  N/A         Faulty  N/A           N/A
>
> # gluster volume geo-replication home ssh://remoteserver1::homegs config
> special_sync_mode: partial
> state_socket_unencoded: /var/lib/glusterd/geo-replication/home_remoteserver1_homegs/ssh%3A%2F%2Froot%40IPADDRESS%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs.socket
> gluster_log_file: /var/log/glusterfs/geo-replication/home/ssh%3A%2F%2Froot%40IPADDRESS%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs.gluster.log
> ssh_command: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem
> ignore_deletes: false
> change_detector: changelog
> gluster_command_dir: /usr/sbin/
> georep_session_working_dir: /var/lib/glusterd/geo-replication/home_remoteserver1_homegs/
> state_file: /var/lib/glusterd/geo-replication/home_remoteserver1_homegs/ssh%3A%2F%2Froot%40IPADDRESS%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs.status
> remote_gsyncd: /nonexistent/gsyncd
> session_owner: 2299a204-a1dc-449d-8556-bc65197373c7
> changelog_log_file: /var/log/glusterfs/geo-replication/home/ssh%3A%2F%2Froot%40IPADDRESS%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs-changes.log
> socketdir: /var/run/gluster
> working_dir: /var/lib/misc/glusterfsd/home/ssh%3A%2F%2Froot%40IPADDRESS%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs
> state_detail_file: /var/lib/glusterd/geo-replication/home_remoteserver1_homegs/ssh%3A%2F%2Froot%40IPADDRESS%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs-detail.status
> ssh_command_tar: ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/tar_ssh.pem
> pid_file: /var/lib/glusterd/geo-replication/home_remoteserver_homegs/ssh%3A%2F%2Froot%40IPADDRESS%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs.pid
> log_file: /var/log/glusterfs/geo-replication/home/ssh%3A%2F%2Froot%40IPADDRESS%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs.log
> gluster_params: aux-gfid-mount acl
> volume_id: 2299a204-a1dc-449d-8556-bc65197373c7
>
> The logs look like this on the master, on server1:
>
> [2015-08-24 15:21:07.955600] I [monitor(monitor):221:monitor] Monitor: ------------------------------------------------------------
> [2015-08-24 15:21:07.955883] I [monitor(monitor):222:monitor] Monitor: starting gsyncd worker
> [2015-08-24 15:21:08.69528] I [gsyncd(/gluster/home-brick-1):649:main_i] <top>: syncing: gluster://localhost:home -> ssh://[email protected]:gluster://localhost:homegs
> [2015-08-24 15:21:08.70938] I [changelogagent(agent):75:__init__] ChangelogAgent: Agent listining...
> [2015-08-24 15:21:11.255237] I [master(/gluster/home-brick-1):83:gmaster_builder] <top>: setting up xsync change detection mode
> [2015-08-24 15:21:11.255532] I [master(/gluster/home-brick-1):404:__init__] _GMaster: using 'rsync' as the sync engine
> [2015-08-24 15:21:11.256570] I [master(/gluster/home-brick-1):83:gmaster_builder] <top>: setting up changelog change detection mode
> [2015-08-24 15:21:11.256726] I [master(/gluster/home-brick-1):404:__init__] _GMaster: using 'rsync' as the sync engine
> [2015-08-24 15:21:11.257345] I [master(/gluster/home-brick-1):83:gmaster_builder] <top>: setting up changeloghistory change detection mode
> [2015-08-24 15:21:11.257534] I [master(/gluster/home-brick-1):404:__init__] _GMaster: using 'rsync' as the sync engine
> [2015-08-24 15:21:13.333628] I [master(/gluster/home-brick-1):1212:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/home/ssh%3A%2F%2Froot%40172.18.0.169%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs/62d98e8cc00a34eb85b4fe6d6fd3ba33/xsync
> [2015-08-24 15:21:13.333870] I [resource(/gluster/home-brick-1):1432:service_loop] GLUSTER: Register time: 1440426073
> [2015-08-24 15:21:13.401132] I [master(/gluster/home-brick-1):523:crawlwrap] _GMaster: primary master with volume id 2299a204-a1dc-449d-8556-bc65197373c7 ...
> [2015-08-24 15:21:13.412795] I [master(/gluster/home-brick-1):532:crawlwrap] _GMaster: crawl interval: 1 seconds
> [2015-08-24 15:21:13.427340] I [master(/gluster/home-brick-1):1127:crawl] _GMaster: starting history crawl... turns: 1, stime: (1440411353, 0)
> [2015-08-24 15:21:14.432327] I [master(/gluster/home-brick-1):1156:crawl] _GMaster: slave's time: (1440411353, 0)
> [2015-08-24 15:21:14.890889] E [repce(/gluster/home-brick-1):207:__call__] RepceClient: call 20960:140215190427392:1440426074.56 (entry_ops) failed on peer with OSError
> [2015-08-24 15:21:14.891124] E [syncdutils(/gluster/home-brick-1):276:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 165, in main
>     main_i()
>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 659, in main_i
>     local.service_loop(*[r for r in [remote] if r])
>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1438, in service_loop
>     g3.crawlwrap(oneshot=True)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 584, in crawlwrap
>     self.crawl()
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1165, in crawl
>     self.changelogs_batch_process(changes)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1074, in changelogs_batch_process
>     self.process(batch)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 952, in process
>     self.process_change(change, done, retry)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 907, in process_change
>     failures = self.slave.server.entry_ops(entries)
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
>     return self.ins(self.meth, *a)
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
>     raise res
> OSError: [Errno 5] Input/output error
> [2015-08-24 15:21:14.892291] I [syncdutils(/gluster/home-brick-1):220:finalize] <top>: exiting.
> [2015-08-24 15:21:14.893665] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
> [2015-08-24 15:21:14.893879] I [syncdutils(agent):220:finalize] <top>: exiting.
> [2015-08-24 15:21:15.259360] I [monitor(monitor):282:monitor] Monitor: worker(/gluster/home-brick-1) died in startup phase
>
> and on master server2:
>
> [2015-08-24 15:21:07.650707] I [monitor(monitor):221:monitor] Monitor: ------------------------------------------------------------
> [2015-08-24 15:21:07.651144] I [monitor(monitor):222:monitor] Monitor: starting gsyncd worker
> [2015-08-24 15:21:07.764817] I [gsyncd(/gluster/home-brick-1):649:main_i] <top>: syncing: gluster://localhost:home -> ssh://[email protected]:gluster://localhost:homegs
> [2015-08-24 15:21:07.768552] I [changelogagent(agent):75:__init__] ChangelogAgent: Agent listining...
> [2015-08-24 15:21:11.9820] I [master(/gluster/home-brick-1):83:gmaster_builder] <top>: setting up xsync change detection mode
> [2015-08-24 15:21:11.10199] I [master(/gluster/home-brick-1):404:__init__] _GMaster: using 'rsync' as the sync engine
> [2015-08-24 15:21:11.10946] I [master(/gluster/home-brick-1):83:gmaster_builder] <top>: setting up changelog change detection mode
> [2015-08-24 15:21:11.11115] I [master(/gluster/home-brick-1):404:__init__] _GMaster: using 'rsync' as the sync engine
> [2015-08-24 15:21:11.11744] I [master(/gluster/home-brick-1):83:gmaster_builder] <top>: setting up changeloghistory change detection mode
> [2015-08-24 15:21:11.11933] I [master(/gluster/home-brick-1):404:__init__] _GMaster: using 'rsync' as the sync engine
> [2015-08-24 15:21:13.59192] I [master(/gluster/home-brick-1):1212:register] _GMaster: xsync temp directory: /var/lib/misc/glusterfsd/home/ssh%3A%2F%2Froot%40IPADDRESS%3Agluster%3A%2F%2F127.0.0.1%3Ahomegs/62d98e8cc00a34eb85b4fe6d6fd3ba33/xsync
> [2015-08-24 15:21:13.59454] I [resource(/gluster/home-brick-1):1432:service_loop] GLUSTER: Register time: 1440426073
> [2015-08-24 15:21:13.113203] I [master(/gluster/home-brick-1):523:crawlwrap] _GMaster: primary master with volume id 2299a204-a1dc-449d-8556-bc65197373c7 ...
> [2015-08-24 15:21:13.122018] I [master(/gluster/home-brick-1):532:crawlwrap] _GMaster: crawl interval: 1 seconds
> [2015-08-24 15:21:23.209912] E [repce(/gluster/home-brick-1):207:__call__] RepceClient: call 1561:140164806457088:1440426083.11 (keep_alive) failed on peer with OSError
> [2015-08-24 15:21:23.210119] E [syncdutils(/gluster/home-brick-1):276:log_raise_exception] <top>: FAIL:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 306, in twrap
>     tf(*aa)
>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 438, in keep_alive
>     cls.slave.server.keep_alive(vi)
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 226, in __call__
>     return self.ins(self.meth, *a)
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 208, in __call__
>     raise res
> OSError: [Errno 22] Invalid argument
> [2015-08-24 15:21:23.210975] I [syncdutils(/gluster/home-brick-1):220:finalize] <top>: exiting.
> [2015-08-24 15:21:23.212455] I [repce(agent):92:service_loop] RepceServer: terminating on reaching EOF.
> [2015-08-24 15:21:23.212707] I [syncdutils(agent):220:finalize] <top>: exiting.
> [2015-08-24 15:21:24.23336] I [monitor(monitor):282:monitor] Monitor: worker(/gluster/home-brick-1) died in startup phase
>
> and on the slave (in a different timezone, one hour behind):
>
> [2015-08-24 14:22:02.923098] I [resource(slave):844:service_loop] GLUSTER: slave listening
> [2015-08-24 14:22:07.606774] E [repce(slave):117:worker] <top>: call failed:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 113, in worker
>     res = getattr(self.obj, rmeth)(*in_data[2:])
>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 731, in entry_ops
>     [ESTALE, EINVAL])
>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 475, in errno_wrap
>     return call(*arg)
>   File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 78, in lsetxattr
>     cls.raise_oserr()
>   File "/usr/libexec/glusterfs/python/syncdaemon/libcxattr.py", line 37, in raise_oserr
>     raise OSError(errn, os.strerror(errn))
> OSError: [Errno 5] Input/output error
> [2015-08-24 14:22:07.652092] I [repce(slave):92:service_loop] RepceServer: terminating on reaching EOF.
> [2015-08-24 14:22:07.652364] I [syncdutils(slave):220:finalize] <top>: exiting.
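Looking at that slave traceback, entry_ops calls through an errno_wrap() helper that is passed [ESTALE, EINVAL], so my reading is that those two errnos are tolerated while anything else, like the EIO here, is raised back to the master as "failed on peer with OSError". A simplified sketch of that behaviour as I understand it (not the actual gsyncd code; names and the failing operation are illustrative, and the real helper also retries on some errnos):

```python
import errno
import os

def errno_wrap(call, args, ignore_errnos):
    """Simplified sketch of the wrapper seen in the slave traceback:
    errnos listed in `ignore_errnos` are treated as benign, anything
    else propagates back to the master."""
    try:
        return call(*args)
    except OSError as e:
        if e.errno in ignore_errnos:
            return e.errno  # swallowed, replication carries on
        raise               # surfaces as "failed on peer with OSError"

# Hypothetical failing operation, standing in for lsetxattr:
def fails_with(err):
    raise OSError(err, os.strerror(err))

# ESTALE is tolerated and simply returned...
print(errno_wrap(lambda: fails_with(errno.ESTALE), [],
                 [errno.ESTALE, errno.EINVAL]))

# ...but EIO (Errno 5) is re-raised, which matches what the logs show:
try:
    errno_wrap(lambda: fails_with(errno.EIO), [],
               [errno.ESTALE, errno.EINVAL])
except OSError as e:
    print("re-raised:", e.errno, e.strerror)
```

If that reading is right, the interesting question is why lsetxattr on the slave brick returns EIO in the first place, which is why I mentioned the XFS/Red Hat details above.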
_______________________________________________
Gluster-users mailing list
[email protected]
http://www.gluster.org/mailman/listinfo/gluster-users
