Hi Kotresh,
Please find the logs of the above error.

*Master log snippet*

> [2019-06-04 11:52:09.254731] I [resource(worker /home/sas/gluster/data/code-misc):1379:connect_remote] SSH: Initializing SSH connection between master and slave...
> [2019-06-04 11:52:09.308923] D [repce(worker /home/sas/gluster/data/code-misc):196:push] RepceClient: call 89724:139652759443264:1559649129.31 __repce_version__() ...
> [2019-06-04 11:52:09.602792] E [syncdutils(worker /home/sas/gluster/data/code-misc):311:log_raise_exception] <top>: connection to peer is broken
> [2019-06-04 11:52:09.603312] E [syncdutils(worker /home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-4aL2tc/d893f66e0addc32f7d0080bb503f5185.sock sas@192.168.185.107 /usr/libexec/glusterfs/gsyncd slave code-misc sas@192.168.185.107::code-misc --master-node 192.168.185.106 --master-node-id 851b64d0-d885-4ae9-9b38-ab5b15db0fec --master-brick /home/sas/gluster/data/code-misc --local-node 192.168.185.122 --local-node-id bcaa7af6-c3a1-4411-8e99-4ebecb32eb6a --slave-timeout 120 --slave-log-level DEBUG --slave-gluster-log-level INFO --slave-gluster-command-dir /usr/sbin error=1
> [2019-06-04 11:52:09.614996] I [repce(agent /home/sas/gluster/data/code-misc):97:service_loop] RepceServer: terminating on reaching EOF.
> [2019-06-04 11:52:09.615545] D [monitor(monitor):271:monitor] Monitor: worker(/home/sas/gluster/data/code-misc) connected
> [2019-06-04 11:52:09.616528] I [monitor(monitor):278:monitor] Monitor: worker died in startup phase brick=/home/sas/gluster/data/code-misc
> [2019-06-04 11:52:09.619391] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
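A note on the two snippets: the cmd= line in the master log above is the exact remote gsyncd invocation that exited with error=1, and the slave log snippet below ends in "No such file or directory". One simple thing to rule out after the 4.1 to 6.2 upgrade is a moved gluster binary on the slave nodes, since the worker calls /usr/sbin/gluster there (--slave-gluster-command-dir /usr/sbin above). A minimal checking sketch, not verified on this setup; <MASTERVOL> and <SLAVEHOST>::<SLAVEVOL> are placeholders for the actual session, and <gluster-binary-directory> stands for whatever directory the check reports:

  # On a slave node: where does the gluster binary live after the upgrade?
  command -v gluster
  ls -l /usr/sbin/gluster
  gluster --version

  # On the master: view the directory the workers use on the slave, and
  # point it at the real location if the binary has moved
  # (same option name as in the worker command line above).
  gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> config slave-gluster-command-dir
  gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> config slave-gluster-command-dir <gluster-binary-directory>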
*Slave log snippet*

> [2019-06-04 11:50:09.782668] E [syncdutils(slave 192.168.185.106/home/sas/gluster/data/code-misc):809:logerr] Popen: /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)
> [2019-06-04 11:50:11.188167] W [gsyncd(slave 192.168.185.125/home/sas/gluster/data/code-misc):305:main] <top>: Session config file not exists, using the default config path=/var/lib/glusterd/geo-replication/code-misc_192.168.185.107_code-misc/gsyncd.conf
> [2019-06-04 11:50:11.201070] I [resource(slave 192.168.185.125/home/sas/gluster/data/code-misc):1098:connect] GLUSTER: Mounting gluster volume locally...
> [2019-06-04 11:50:11.271231] E [resource(slave 192.168.185.125/home/sas/gluster/data/code-misc):1006:handle_mounter] MountbrokerMounter: glusterd answered mnt=
> [2019-06-04 11:50:11.271998] E [syncdutils(slave 192.168.185.125/home/sas/gluster/data/code-misc):805:errlog] Popen: command returned error cmd=/usr/sbin/gluster --remote-host=localhost system:: mount sas user-map-root=sas aux-gfid-mount acl log-level=INFO log-file=/var/log/glusterfs/geo-replication-slaves/code-misc_192.168.185.107_code-misc/mnt-192.168.185.125-home-sas-gluster-data-code-misc.log volfile-server=localhost volfile-id=code-misc client-pid=-1 error=1
> [2019-06-04 11:50:11.272113] E [syncdutils(slave 192.168.185.125/home/sas/gluster/data/code-misc):809:logerr] Popen: /usr/sbin/gluster> 2 : failed with this errno (No such file or directory)

On Tue, Jun 4, 2019 at 5:10 PM deepu srinivasan <sdeep...@gmail.com> wrote:

> Hi
> As discussed, I have upgraded gluster from version 4.1 to 6.2, but geo-replication fails to start.
> It stays in the Faulty state.
>
> On Fri, May 31, 2019, 5:32 PM deepu srinivasan <sdeep...@gmail.com> wrote:
>
>> Checked the data. It remains at 2708. No progress.
>>
>> On Fri, May 31, 2019 at 4:36 PM Kotresh Hiremath Ravishankar <khire...@redhat.com> wrote:
>>
>>> That means it could be working and the defunct process might be some old zombie one. Could you check whether the data is progressing?
>>>
>>> On Fri, May 31, 2019 at 4:29 PM deepu srinivasan <sdeep...@gmail.com> wrote:
>>>
>>>> Hi
>>>> When I change the rsync option, the rsync process doesn't seem to start; only a defunct process is listed in ps aux. Only when I set the rsync option to " " and restart all the processes does the rsync process appear in ps aux.
>>>>
>>>> On Fri, May 31, 2019 at 4:23 PM Kotresh Hiremath Ravishankar <khire...@redhat.com> wrote:
>>>>
>>>>> Yes, the rsync config option should have fixed this issue.
>>>>>
>>>>> Could you share the output of the following?
>>>>>
>>>>> 1. gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> config rsync-options
>>>>> 2. ps -ef | grep rsync
>>>>>
>>>>> On Fri, May 31, 2019 at 4:11 PM deepu srinivasan <sdeep...@gmail.com> wrote:
>>>>>
>>>>>> Done.
>>>>>> We got the following result:
>>>>>>
>>>>>>> 1559298781.338234 write(2, "rsync: link_stat \"/tmp/gsyncd-aux-mount-EEJ_sY/.gfid/3fa6aed8-802e-4efe-9903-8bc171176d88\" failed: No such file or directory (2)", 128
>>>>>>
>>>>>> Seems like a file is missing?
>>>>>>
>>>>>> On Fri, May 31, 2019 at 3:25 PM Kotresh Hiremath Ravishankar <khire...@redhat.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> Could you take the strace with a larger string size? The argument strings are truncated.
>>>>>>>
>>>>>>> strace -s 500 -ttt -T -p <rsync pid>
>>>>>>>
>>>>>>> On Fri, May 31, 2019 at 3:17 PM deepu srinivasan <sdeep...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Kotresh
>>>>>>>> The above-mentioned workaround did not work properly.
>>>>>>>>
>>>>>>>> On Fri, May 31, 2019 at 3:16 PM deepu srinivasan <sdeep...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Kotresh
>>>>>>>>> We have tried the above-mentioned rsync option and we are planning to upgrade to version 6.0.
>>>>>>>>>
>>>>>>>>> On Fri, May 31, 2019 at 11:04 AM Kotresh Hiremath Ravishankar <khire...@redhat.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> This looks like a hang because the stderr buffer filled up with error messages and nothing was reading it.
>>>>>>>>>> I think this issue is fixed in the latest releases. As a workaround, you can do the following and check whether it works.
>>>>>>>>>>
>>>>>>>>>> Prerequisite:
>>>>>>>>>> rsync version should be > 3.1.0
>>>>>>>>>>
>>>>>>>>>> Workaround:
>>>>>>>>>> gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> config rsync-options "--ignore-missing-args"
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Kotresh HR
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, May 30, 2019 at 5:39 PM deepu srinivasan <sdeep...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi
>>>>>>>>>>> We were evaluating Gluster geo-replication between two DCs, one in US West and one in US East. We ran multiple trials with different file sizes.
>>>>>>>>>>> Geo-replication tends to stop replicating, but when checking the status it appears to be Active; however, the slave volume did not increase in size.
>>>>>>>>>>> So we restarted the geo-replication session and checked the status. It was Active and stayed in History Crawl for a long time. We enabled DEBUG logging and checked for errors.
>>>>>>>>>>> Around 2000 files appeared as syncing candidates. The rsync process starts, but nothing is synced to the slave volume. Every time, the rsync process appears in the "ps aux" list, yet the replication does not happen on the slave end. What could be the cause of this problem? Is there any way to debug it?
>>>>>>>>>>>
>>>>>>>>>>> We have also checked the strace of the rsync program.
>>>>>>>>>>> It displays something like this:
>>>>>>>>>>>
>>>>>>>>>>> "write(2, "rsync: link_stat \"/tmp/gsyncd-au"..., 128"
>>>>>>>>>>>
>>>>>>>>>>> We are using the below specs:
>>>>>>>>>>>
>>>>>>>>>>> Gluster version - 4.1.7
>>>>>>>>>>> Sync mode - rsync
>>>>>>>>>>> Volume - 1x3 on each end (master and slave)
>>>>>>>>>>> Intranet Bandwidth - 10 Gig
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Thanks and Regards,
>>>>>>>>>> Kotresh H R
>>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Thanks and Regards,
>>>>>>> Kotresh H R
>>>>>>
>>>>>
>>>>> --
>>>>> Thanks and Regards,
>>>>> Kotresh H R
>>>>
>>>
>>> --
>>> Thanks and Regards,
>>> Kotresh H R
>>
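For anyone following the thread, here is the workaround sequence discussed above collected in one place. This is a sketch, not verified on this setup: <MASTERVOL> and <SLAVEHOST>::<SLAVEVOL> are placeholders for the actual session, and the stop/start pair is simply the session restart mentioned above.

  # Prerequisite from the workaround above: rsync must be newer than 3.1.0
  rsync --version

  # Set the rsync option so missing source files are ignored instead of
  # flooding stderr, then verify the value took effect
  gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> config rsync-options "--ignore-missing-args"
  gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> config rsync-options

  # Restart the geo-replication session so workers pick up the new option
  gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> stop
  gluster volume geo-replication <MASTERVOL> <SLAVEHOST>::<SLAVEVOL> start

  # Check that a live (non-defunct) rsync appears once a crawl picks up files
  ps -ef | grep rsync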