Hi Andy,

This is an issue with non-root geo-rep sessions and is not fixed yet. Could you please raise a bug for this issue?
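[For readers hitting the same thing: the failure pattern discussed further down in this thread can be reproduced locally. The "drwx--S---" mode seen on the slave's .glusterfs directories means the setgid bit is set but group execute is NOT, so even members of the directory's group cannot traverse it, and the non-root sync user gets EACCES. A minimal sketch using throw-away temp dirs, not a real brick:]

```shell
# Throw-away demo directory; a real brick would be e.g. /bricks/brick1/brick.
demo=$(mktemp -d)
mkdir "$demo/29"
chmod 2700 "$demo/29"    # owner rwx + setgid, nothing for group/other

# Capital 'S' = setgid set, group execute missing: group members
# (such as a non-root geoaccount in AD+group) cannot enter this dir.
stat -c '%A' "$demo/29"  # prints: drwx--S---
```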
Thanks,
Kotresh HR

On Wed, Mar 20, 2019 at 11:53 AM Andy Coates <[email protected]> wrote:

> We're seeing the same permission denied errors when running as a non-root
> geosync user.
>
> Does anyone know what the underlying issue is?
>
> On Wed, 26 Sep 2018 at 00:40, Kotte, Christian (Ext) <[email protected]> wrote:
>
>> I changed the replication to use the root user and re-created the
>> replication with "create force". Now the files and folders were
>> replicated and the "permission denied" and "New folder" errors
>> disappeared, but old files are not deleted.
>>
>> Looks like the history crawl is in some kind of loop:
>>
>> [root@master ~]# gluster volume geo-replication status
>>
>> MASTER NODE    MASTER VOL     MASTER BRICK            SLAVE USER    SLAVE                         SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
>> --------------------------------------------------------------------------------------------------------------------------------------------------
>> master         glustervol1    /bricks/brick1/brick    root          ssh://slave_1::glustervol1    slave_1       Active    Hybrid Crawl    N/A
>> master         glustervol1    /bricks/brick1/brick    root          ssh://slave_2::glustervol1    slave_2       Active    Hybrid Crawl    N/A
>> master         glustervol1    /bricks/brick1/brick    root          ssh://slave_3::glustervol1    slave_3       Active    Hybrid Crawl    N/A
>>
>> tail -f /var/log/glusterfs/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.log
>>
>>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 104, in cl_history_changelog
>>     raise ChangelogHistoryNotAvailable()
>> ChangelogHistoryNotAvailable
>> [2018-09-25 14:10:44.196011] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed call=29945:139700517484352:1537884644.19 method=history error=ChangelogHistoryNotAvailable
>> [2018-09-25 14:10:44.196405] I [resource(worker /bricks/brick1/brick):1295:service_loop] GLUSTER: Changelog history not available, using xsync
>> [2018-09-25 14:10:44.221385] I [master(worker /bricks/brick1/brick):1623:crawl] _GMaster: starting hybrid crawl stime=(0, 0)
>> [2018-09-25 14:10:44.223382] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=Hybrid Crawl
>> [2018-09-25 14:10:46.225296] I [master(worker /bricks/brick1/brick):1634:crawl] _GMaster: processing xsync changelog path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick/xsync/XSYNC-CHANGELOG.1537884644
>> [2018-09-25 14:13:36.157408] I [gsyncd(config-get):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>> [2018-09-25 14:13:36.377880] I [gsyncd(status):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>> [2018-09-25 14:31:10.145035] I [master(worker /bricks/brick1/brick):1944:syncjob] Syncer: Sync Time Taken duration=1212.5316 num_files=1474 job=2 return_code=11
>> [2018-09-25 14:31:10.152637] E [syncdutils(worker /bricks/brick1/brick):801:errlog] Popen: command returned error cmd=rsync -aR0 --inplace --files-from=- --super --stats --numeric-ids --no-implied-dirs --xattrs --acls . -e ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-gg758Z/caec4d1d03cc28bc1853f692e291164f.sock slave_3:/proc/15919/cwd error=11
>> [2018-09-25 14:31:10.237371] I [repce(agent /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
>> [2018-09-25 14:31:10.430820] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
>> [2018-09-25 14:31:20.541475] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/bricks/brick1/brick slave_node=slave_3
>> [2018-09-25 14:31:20.806518] I [gsyncd(agent /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>> [2018-09-25 14:31:20.816536] I [changelogagent(agent /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
>> [2018-09-25 14:31:20.821574] I [gsyncd(worker /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>> [2018-09-25 14:31:20.882128] I [resource(worker /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2018-09-25 14:31:24.169857] I [resource(worker /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between master and slave established. duration=3.2873
>> [2018-09-25 14:31:24.170401] I [resource(worker /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
>> [2018-09-25 14:31:25.354633] I [resource(worker /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1839
>> [2018-09-25 14:31:25.355073] I [subcmds(worker /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
>> [2018-09-25 14:31:27.439034] I [master(worker /bricks/brick1/brick):1593:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick
>> [2018-09-25 14:31:27.441847] I [resource(worker /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time time=1537885887
>> [2018-09-25 14:31:27.465053] I [gsyncdstatus(worker /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change status=Active
>> [2018-09-25 14:31:27.471021] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
>> [2018-09-25 14:31:27.471484] I [master(worker /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl turns=1 stime=(0, 0) entry_stime=None etime=1537885887
>> [2018-09-25 14:31:27.472564] E [repce(agent /bricks/brick1/brick):105:worker] <top>: call failed:
>> Traceback (most recent call last):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
>>     res = getattr(self.obj, rmeth)(*in_data[2:])
>>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 53, in history
>>     num_parallel)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 104, in cl_history_changelog
>>     raise ChangelogHistoryNotAvailable()
>> ChangelogHistoryNotAvailable
>> [2018-09-25 14:31:27.480632] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed call=31250:140272364406592:1537885887.47 method=history error=ChangelogHistoryNotAvailable
>> [2018-09-25 14:31:27.480958] I [resource(worker /bricks/brick1/brick):1295:service_loop] GLUSTER: Changelog history not available, using xsync
>> [2018-09-25 14:31:27.495117] I [master(worker /bricks/brick1/brick):1623:crawl] _GMaster: starting hybrid crawl stime=(0, 0)
>> [2018-09-25 14:31:27.502083] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=Hybrid Crawl
>> [2018-09-25 14:31:29.505284] I [master(worker /bricks/brick1/brick):1634:crawl] _GMaster: processing xsync changelog path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick/xsync/XSYNC-CHANGELOG.1537885887
>>
>> tail -f /var/log/glusterfs/geo-replication-slaves/glustervol1_slave_3_glustervol1/gsyncd.log
>>
>> [2018-09-25 13:49:24.141303] I [repce(slave master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
>> [2018-09-25 13:49:36.602051] W [gsyncd(slave master/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>> [2018-09-25 13:49:36.629415] I [resource(slave master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
>> [2018-09-25 13:49:37.701642] I [resource(slave master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.0718
>> [2018-09-25 13:49:37.704282] I [resource(slave master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>> [2018-09-25 14:10:27.70952] I [repce(slave master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
>> [2018-09-25 14:10:39.632124] W [gsyncd(slave master/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>> [2018-09-25 14:10:39.650958] I [resource(slave master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
>> [2018-09-25 14:10:40.729355] I [resource(slave master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.0781
>> [2018-09-25 14:10:40.730650] I [resource(slave master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>> [2018-09-25 14:31:10.291064] I [repce(slave master/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
>> [2018-09-25 14:31:22.802237] W [gsyncd(slave master/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>> [2018-09-25 14:31:22.828418] I [resource(slave master/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
>> [2018-09-25 14:31:23.910206] I [resource(slave master/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.0813
>> [2018-09-25 14:31:23.913369] I [resource(slave master/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>>
>> Any ideas how to resolve this without re-creating everything? Can I reset the changelog history?
>>
>> Regards,
>> Christian
>>
>> *From: *<[email protected]> on behalf of "Kotte, Christian (Ext)" <[email protected]>
>> *Date: *Monday, 24. September 2018 at 17:20
>> *To: *Kotresh Hiremath Ravishankar <[email protected]>
>> *Cc: *Gluster Users <[email protected]>
>> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied
>>
>> I don't configure the permissions of /bricks/brick1/brick/.glusterfs. I don't even see it on the local GlusterFS mount.
>>
>> Not sure why the permissions are configured with S and the AD group…
>>
>> Regards,
>> Christian
>>
>> *From: *<[email protected]> on behalf of "Kotte, Christian (Ext)" <[email protected]>
>> *Date: *Monday, 24. September 2018 at 17:10
>> *To: *Kotresh Hiremath Ravishankar <[email protected]>
>> *Cc: *Gluster Users <[email protected]>
>> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied
>>
>> Yeah right. I get permission denied.
>>
>> [geoaccount@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
>> ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e: Permission denied
>> [geoaccount@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/d1/
>> ls: cannot access /bricks/brick1/brick/.glusterfs/29/d1/: Permission denied
>> [geoaccount@slave ~]$ ll /bricks/brick1/brick/.glusterfs/29/
>> ls: cannot access /bricks/brick1/brick/.glusterfs/29/: Permission denied
>> [geoaccount@slave ~]$ ll /bricks/brick1/brick/.glusterfs/
>> ls: cannot open directory /bricks/brick1/brick/.glusterfs/: Permission denied
>>
>> [root@slave ~]# ll /bricks/brick1/brick/.glusterfs/29
>> total 0
>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 16
>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 33
>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 5e
>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 73
>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 b2
>> drwx--S---+ 2 root AD+group 50 Sep 21 09:39 d1
>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 d7
>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 e6
>> drwx--S---+ 2 root AD+group 50 Sep 10 07:29 eb
>> [root@slave ~]#
>>
>> However, the strange thing is that I could replicate new files and folders before. The replication has been broken since the "New folder" was created.
>>
>> These are the permissions on a dev/test system:
>> [root@slave-dev ~]# ll /bricks/brick1/brick/.glusterfs/
>> total 3136
>> drwx------. 44 root root 4096 Aug 22 18:19 00
>> drwx------. 50 root root 4096 Sep 12 13:14 01
>> drwx------. 54 root root 4096 Sep 13 11:33 02
>> drwx------. 59 root root 4096 Aug 22 18:21 03
>> drwx------. 60 root root 4096 Sep 12 13:14 04
>> drwx------. 68 root root 4096 Aug 24 12:36 05
>> drwx------. 56 root root 4096 Aug 22 18:21 06
>> drwx------. 46 root root 4096 Aug 22 18:21 07
>> drwx------. 51 root root 4096 Aug 22 18:21 08
>> drwx------. 42 root root 4096 Aug 22 18:21 09
>> drwx------. 44 root root 4096 Sep 13 11:16 0a
>>
>> I've configured an AD group, SGID bit, and ACLs via Ansible on the local mount point. Could this be an issue? Should I avoid configuring the permissions on .glusterfs and below?
>>
>> # ll /mnt/glustervol1/
>> total 12
>> drwxrwsr-x+  4 AD+user AD+group 4096 Jul 13 07:46 Scripts
>> drwxrwxr-x+ 10 AD+user AD+group 4096 Jun 12 12:03 Software
>> -rw-rw-r--+  1 root    AD+group    0 Aug  8 08:44 test
>> drwxr-xr-x+  6 AD+user AD+group 4096 Apr 18 10:58 tftp
>>
>> glusterfs_volumes:
>> […]
>>   permissions:
>>     mode: "02775"
>>     owner: root
>>     group: "AD+group"
>>     acl_permissions: rw
>> […]
>>
>> # root directory is owned by root.
>> # set permissions to 'g+s' to automatically set the group to "AD+group"
>> # permissions of individual files will be set by Samba during creation
>> - name: Configure volume directory permission 1/2
>>   tags: glusterfs
>>   file:
>>     path: /mnt/{{ item.volume }}
>>     state: directory
>>     mode: "{{ item.permissions.mode }}"
>>     owner: "{{ item.permissions.owner }}"
>>     group: "{{ item.permissions.group }}"
>>   with_items: "{{ glusterfs_volumes }}"
>>   loop_control:
>>     label: "{{ item.volume }}"
>>   when: item.permissions is defined
>>
>> # ACL needs to be set to override the default umask and grant "AD+group" write permissions
>> - name: Configure volume directory permission 2/2 (ACL)
>>   tags: glusterfs
>>   acl:
>>     path: /mnt/{{ item.volume }}
>>     default: yes
>>     entity: "{{ item.permissions.group }}"
>>     etype: group
>>     permissions: "{{ item.permissions.acl_permissions }}"
>>     state: present
>>   with_items: "{{ glusterfs_volumes }}"
>>   loop_control:
>>     label: "{{ item.volume }}"
>>   when: item.permissions is defined
>>
>> Regards,
>> Christian
>>
>> *From: *Kotresh Hiremath Ravishankar <[email protected]>
>> *Date: *Monday, 24. September 2018 at 16:20
>> *To: *"Kotte, Christian (Ext)" <[email protected]>
>> *Cc: *Gluster Users <[email protected]>
>> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied
>>
>> I think I get what's happening. The geo-rep session is non-root.
>> Could you do a readlink on the brick path mentioned above, /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e, as the geoaccount user and see if you are getting "Permission denied" errors?
>>
>> Thanks,
>> Kotresh HR
>>
>> On Mon, Sep 24, 2018 at 7:35 PM Kotte, Christian (Ext) <[email protected]> wrote:
>>
>> Ok. It happens on all slave nodes (and on the interimmaster as well).
>> It's like I assumed. These are the logs of one of the slaves:
>>
>> gsyncd.log
>> [2018-09-24 13:52:25.418382] I [repce(slave slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
>> [2018-09-24 13:52:37.95297] W [gsyncd(slave slave/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
>> [2018-09-24 13:52:37.109643] I [resource(slave slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
>> [2018-09-24 13:52:38.303920] I [resource(slave slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1941
>> [2018-09-24 13:52:38.304771] I [resource(slave slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>> [2018-09-24 13:52:41.981554] I [resource(slave slave/bricks/brick1/brick):598:entry_ops] <top>: Special case: rename on mkdir gfid=29d1d60d-1ad6-45fc-87e0-93d478f7331e entry='.gfid/6b97b987-8aef-46c3-af27-20d3aa883016/New folder'
>> [2018-09-24 13:52:42.45641] E [repce(slave slave/bricks/brick1/brick):105:worker] <top>: call failed:
>> Traceback (most recent call last):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 101, in worker
>>     res = getattr(self.obj, rmeth)(*in_data[2:])
>>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 599, in entry_ops
>>     src_entry = get_slv_dir_path(slv_host, slv_volume, gfid)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 682, in get_slv_dir_path
>>     [ENOENT], [ESTALE])
>>   File "/usr/libexec/glusterfs/python/syncdaemon/syncdutils.py", line 540, in errno_wrap
>>     return call(*arg)
>> OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
>> [2018-09-24 13:52:42.81794] I [repce(slave slave/bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
>> [2018-09-24 13:52:53.459676] W [gsyncd(slave slave/bricks/brick1/brick):293:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/glustervol1_slave_glustervol1/gsyncd.conf
>> [2018-09-24 13:52:53.473500] I [resource(slave slave/bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
>> [2018-09-24 13:52:54.659044] I [resource(slave slave/bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1854
>> [2018-09-24 13:52:54.659837] I [resource(slave slave/bricks/brick1/brick):1146:service_loop] GLUSTER: slave listening
>>
>> The folder "New folder" was created via Samba and renamed by my colleague right away after creation.
>>
>> [root@slave glustervol1_slave_glustervol1]# ls /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
>> [root@slave glustervol1_slave_glustervol1]# ls /bricks/brick1/brick/.glusterfs/29/d1/ -al
>> total 0
>> drwx--S---+  2 root AD+group 50 Sep 21 09:39 .
>> drwx--S---+ 11 root AD+group 96 Sep 21 09:39 ..
>> lrwxrwxrwx.  1 root AD+group 75 Sep 21 09:39 29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager
>>
>> I tried creating the folder in /bricks/brick1/brick/.glusterfs/6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/, but it didn't change anything.
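[Aside: the symlink layout shown in the listing above is easy to model outside a brick. A sketch with temp paths, reusing the gfids from this thread; geo-rep's entry_ops only needs a readlink here, but readlink requires search (x) permission on every ancestor directory, which "drwx--S---" denies to the group:]

```shell
# Model the .glusterfs layout from the listing above with temp paths
# (illustrative only; the real brick is /bricks/brick1/brick).
brick=$(mktemp -d)
gfid=29d1d60d-1ad6-45fc-87e0-93d478f7331e
parent=6b97b987-8aef-46c3-af27-20d3aa883016

# Directory gfids live under .glusterfs/<aa>/<bb>/<gfid> as a symlink
# to ../../<parent prefix>/<parent gfid>/<directory name>.
mkdir -p "$brick/.glusterfs/29/d1"
ln -s "../../6b/97/$parent/vRealize Operation Manager" \
      "$brick/.glusterfs/29/d1/$gfid"

# What the slave worker effectively does during the rename special case:
readlink "$brick/.glusterfs/29/d1/$gfid"
# prints: ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager
```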
>> mnt-slave-bricks-brick1-brick.log
>>
>> [2018-09-24 13:51:10.625723] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
>> [2018-09-24 13:51:10.626092] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
>> [2018-09-24 13:51:10.626181] I [rpc-clnt.c:2105:rpc_clnt_reconfig] 0-glustervol1-client-0: changing port to 49152 (from 0)
>> [2018-09-24 13:51:10.643111] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
>> [2018-09-24 13:51:10.643489] W [dict.c:923:str_to_data] (-->/usr/lib64/glusterfs/4.1.3/xlator/protocol/client.so(+0x4131a) [0x7fafb023831a] -->/lib64/libglusterfs.so.0(dict_set_str+0x16) [0x7fafbdb83266] -->/lib64/libglusterfs.so.0(str_to_data+0x91) [0x7fafbdb7fea1] ) 0-dict: value is NULL [Invalid argument]
>> [2018-09-24 13:51:10.643507] I [MSGID: 114006] [client-handshake.c:1308:client_setvolume] 0-glustervol1-client-0: failed to set process-name in handshake msg
>> [2018-09-24 13:51:10.643541] W [rpc-clnt.c:1753:rpc_clnt_submit] 0-glustervol1-client-0: error returned while attempting to connect to host:(null), port:0
>> [2018-09-24 13:51:10.671460] I [MSGID: 114046] [client-handshake.c:1176:client_setvolume_cbk] 0-glustervol1-client-0: Connected to glustervol1-client-0, attached to remote volume '/bricks/brick1/brick'.
>> [2018-09-24 13:51:10.672694] I [fuse-bridge.c:4294:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
>> [2018-09-24 13:51:10.672715] I [fuse-bridge.c:4927:fuse_graph_sync] 0-fuse: switched to graph 0
>> [2018-09-24 13:51:10.673329] I [MSGID: 109005] [dht-selfheal.c:2342:dht_selfheal_directory] 0-glustervol1-dht: Directory selfheal failed: Unable to form layout for directory /
>> [2018-09-24 13:51:16.116458] I [fuse-bridge.c:5199:fuse_thread_proc] 0-fuse: initating unmount of /var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E
>> [2018-09-24 13:51:16.116595] W [glusterfsd.c:1514:cleanup_and_exit] (-->/lib64/libpthread.so.0(+0x7e25) [0x7fafbc9eee25] -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x55d5dac5dd65] -->/usr/sbin/glusterfs(cleanup_and_exit+0x6b) [0x55d5dac5db8b] ) 0-: received signum (15), shutting down
>> [2018-09-24 13:51:16.116616] I [fuse-bridge.c:5981:fini] 0-fuse: Unmounting '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
>> [2018-09-24 13:51:16.116625] I [fuse-bridge.c:5986:fini] 0-fuse: Closing fuse connection to '/var/mountbroker-root/user1300/mtpt-geoaccount-ARDW1E'.
>>
>> Regards,
>> Christian
>>
>> *From: *Kotresh Hiremath Ravishankar <[email protected]>
>> *Date: *Saturday, 22. September 2018 at 06:52
>> *To: *"Kotte, Christian (Ext)" <[email protected]>
>> *Cc: *Gluster Users <[email protected]>
>> *Subject: *Re: [Gluster-users] [geo-rep] Replication faulty - gsyncd.log OSError: [Errno 13] Permission denied
>>
>> The problem occurred on the slave side, and its error is propagated to the master. Almost any traceback involving repce points to a problem on the slave. Just check a few lines above in the log to find the slave node the crashed worker is connected to, and get that node's geo-replication logs to debug further.
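[To make the suggested procedure concrete: grep the master-side gsyncd.log for the worker start line, which names the slave node, then inspect that node's log under /var/log/glusterfs/geo-replication-slaves/. A minimal sketch against sample lines taken from this thread; the /tmp file path is a stand-in for the real log:]

```shell
# Two sample lines as they appear in the master's gsyncd.log.
cat > /tmp/gsyncd-sample.log <<'EOF'
[2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/bricks/brick1/brick slave_node=slave_3
[2018-09-21 14:26:57.528625] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed method=entry_ops error=OSError
EOF

# 1) Which slave node was the crashed worker connected to?
grep -o 'slave_node=[^ ]*' /tmp/gsyncd-sample.log   # -> slave_node=slave_3

# 2) Then look for the matching failure in that node's slave-side log,
#    e.g. /var/log/glusterfs/geo-replication-slaves/<session>/gsyncd.log
grep -n 'call failed' /tmp/gsyncd-sample.log
```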
>>
>> On Fri, 21 Sep 2018, 20:10 Kotte, Christian (Ext), <[email protected]> wrote:
>>
>> Hi,
>>
>> Any idea how to troubleshoot this?
>>
>> New folders and files were created on the master and the replication went faulty. They were created via Samba.
>>
>> Version: GlusterFS 4.1.3
>>
>> [root@master]# gluster volume geo-replication status
>>
>> MASTER NODE    MASTER VOL     MASTER BRICK            SLAVE USER    SLAVE                                          SLAVE NODE    STATUS    CRAWL STATUS    LAST_SYNCED
>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_1::glustervol1          N/A           Faulty    N/A             N/A
>> master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@slave_2::glustervol1          N/A           Faulty    N/A             N/A
>> master         glustervol1    /bricks/brick1/brick    geoaccount    ssh://geoaccount@interimmaster::glustervol1    N/A           Faulty    N/A             N/A
>>
>> The following error is repeatedly logged in the gsyncd.logs:
>>
>> [2018-09-21 14:26:38.611479] I [repce(agent /bricks/brick1/brick):80:service_loop] RepceServer: terminating on reaching EOF.
>> [2018-09-21 14:26:39.211527] I [monitor(monitor):279:monitor] Monitor: worker died in startup phase brick=/bricks/brick1/brick
>> [2018-09-21 14:26:39.214322] I [gsyncdstatus(monitor):244:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
>> [2018-09-21 14:26:49.318953] I [monitor(monitor):158:monitor] Monitor: starting gsyncd worker brick=/bricks/brick1/brick slave_node=slave_3
>> [2018-09-21 14:26:49.471532] I [gsyncd(agent /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>> [2018-09-21 14:26:49.473917] I [changelogagent(agent /bricks/brick1/brick):72:__init__] ChangelogAgent: Agent listining...
>> [2018-09-21 14:26:49.491359] I [gsyncd(worker /bricks/brick1/brick):297:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/glustervol1_slave_3_glustervol1/gsyncd.conf
>> [2018-09-21 14:26:49.538049] I [resource(worker /bricks/brick1/brick):1377:connect_remote] SSH: Initializing SSH connection between master and slave...
>> [2018-09-21 14:26:53.5017] I [resource(worker /bricks/brick1/brick):1424:connect_remote] SSH: SSH connection between master and slave established. duration=3.4665
>> [2018-09-21 14:26:53.5419] I [resource(worker /bricks/brick1/brick):1096:connect] GLUSTER: Mounting gluster volume locally...
>> [2018-09-21 14:26:54.120374] I [resource(worker /bricks/brick1/brick):1119:connect] GLUSTER: Mounted gluster volume duration=1.1146
>> [2018-09-21 14:26:54.121012] I [subcmds(worker /bricks/brick1/brick):70:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
>> [2018-09-21 14:26:56.144460] I [master(worker /bricks/brick1/brick):1593:register] _GMaster: Working dir path=/var/lib/misc/gluster/gsyncd/glustervol1_slave_3_glustervol1/bricks-brick1-brick
>> [2018-09-21 14:26:56.145145] I [resource(worker /bricks/brick1/brick):1282:service_loop] GLUSTER: Register time time=1537540016
>> [2018-09-21 14:26:56.160064] I [gsyncdstatus(worker /bricks/brick1/brick):277:set_active] GeorepStatus: Worker Status Change status=Active
>> [2018-09-21 14:26:56.161175] I [gsyncdstatus(worker /bricks/brick1/brick):249:set_worker_crawl_status] GeorepStatus: Crawl Status Change status=History Crawl
>> [2018-09-21 14:26:56.161536] I [master(worker /bricks/brick1/brick):1507:crawl] _GMaster: starting history crawl turns=1 stime=(1537522637, 0) entry_stime=(1537537141, 0) etime=1537540016
>> [2018-09-21 14:26:56.164277] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time stime=(1537522637, 0)
>> [2018-09-21 14:26:56.197065] I [master(worker /bricks/brick1/brick):1360:process] _GMaster: Skipping already processed entry ops to_changelog=1537522638 num_changelogs=1 from_changelog=1537522638
>> [2018-09-21 14:26:56.197402] I [master(worker /bricks/brick1/brick):1374:process] _GMaster: Entry Time Taken MKD=0 MKN=0 LIN=0 SYM=0 REN=0 RMD=0 CRE=0 duration=0.0000 UNL=1
>> [2018-09-21 14:26:56.197623] I [master(worker /bricks/brick1/brick):1384:process] _GMaster: Data/Metadata Time Taken SETA=0 SETX=0 meta_duration=0.0000 data_duration=0.0284 DATA=0 XATT=0
>> [2018-09-21 14:26:56.198230] I [master(worker /bricks/brick1/brick):1394:process] _GMaster: Batch Completed changelog_end=1537522638 entry_stime=(1537537141, 0) changelog_start=1537522638 stime=(1537522637, 0) duration=0.0333 num_changelogs=1 mode=history_changelog
>> [2018-09-21 14:26:57.200436] I [master(worker /bricks/brick1/brick):1536:crawl] _GMaster: slave's time stime=(1537522637, 0)
>> [2018-09-21 14:26:57.528625] E [repce(worker /bricks/brick1/brick):197:__call__] RepceClient: call failed call=17209:140650361157440:1537540017.21 method=entry_ops error=OSError
>> [2018-09-21 14:26:57.529371] E [syncdutils(worker /bricks/brick1/brick):332:log_raise_exception] <top>: FAIL:
>> Traceback (most recent call last):
>>   File "/usr/libexec/glusterfs/python/syncdaemon/gsyncd.py", line 311, in main
>>     func(args)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/subcmds.py", line 72, in subcmd_worker
>>     local.service_loop(remote)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/resource.py", line 1288, in service_loop
>>     g3.crawlwrap(oneshot=True)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 615, in crawlwrap
>>     self.crawl()
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1545, in crawl
>>     self.changelogs_batch_process(changes)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1445, in changelogs_batch_process
>>     self.process(batch)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1280, in process
>>     self.process_change(change, done, retry)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/master.py", line 1179, in process_change
>>     failures = self.slave.server.entry_ops(entries)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 216, in __call__
>>     return self.ins(self.meth, *a)
>>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 198, in __call__
>>     raise res
>> OSError: [Errno 13] Permission denied: '/bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e'
>>
>> The permissions look fine. The replication is done via the geo user instead of root.
>> It should be able to read, but I'm not sure whether the syncdaemon runs as geoaccount.
>>
>> [root@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e
>> lrwxrwxrwx. 1 root root 75 Sep 21 09:39 /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e -> ../../6b/97/6b97b987-8aef-46c3-af27-20d3aa883016/vRealize Operation Manager
>>
>> [root@master vRealize Operation Manager]# ll /bricks/brick1/brick/.glusterfs/29/d1/29d1d60d-1ad6-45fc-87e0-93d478f7331e/
>> total 4
>> drwxrwxr-x. 2 AD+user AD+group  131 Sep 21 10:14 6.7
>> drwxrwxr-x. 2 AD+user AD+group 4096 Sep 21 09:43 7.0
>> drwxrwxr-x. 2 AD+user AD+group   57 Sep 21 10:28 7.5
>> [root@master vRealize Operation Manager]#
>>
>> It could be possible that the folder was renamed. I have had 3 similar issues since I migrated to GlusterFS 4.x but couldn't investigate much. I needed to completely wipe GlusterFS and the geo-replication to get rid of this error…
>>
>> Any help is appreciated.
>>
>> Regards,
>>
>> *Christian Kotte*
>>
>> _______________________________________________
>> Gluster-users mailing list
>> [email protected]
>> https://lists.gluster.org/mailman/listinfo/gluster-users
>>
>> --
>> Thanks and Regards,
>> Kotresh H R

--
Thanks and Regards,
Kotresh H R
_______________________________________________ Gluster-users mailing list [email protected] https://lists.gluster.org/mailman/listinfo/gluster-users
