Hi Rob and Felix,

Please share the *-changes.log files and the brick logs; they will help in analyzing the issue.
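A minimal sketch for gathering these logs into one archive. It assumes the geo-replication logs live under /var/log/glusterfs/geo-replication/<session>/ (the location that appears in the gsyncd output quoted below) and that the brick logs live under /var/log/glusterfs/bricks (an assumed default; adjust it if your installation differs). The function names and the archive name are illustrative only and are not part of Gluster.

```python
import glob
import os
import tarfile

# Location taken from the gsyncd output quoted in this thread.
GEOREP_LOG_DIR = "/var/log/glusterfs/geo-replication"
# Assumption: default brick log directory; change it if yours differs.
BRICK_LOG_DIR = "/var/log/glusterfs/bricks"


def find_logs(georep_dir=GEOREP_LOG_DIR, brick_dir=BRICK_LOG_DIR):
    """Return sorted paths of the changelog-agent logs and the brick logs."""
    patterns = [
        # Matches "changes-<brick>.log" as well as "<name>-changes.log"
        # in every session directory.
        os.path.join(georep_dir, "*", "*changes*.log"),
        os.path.join(brick_dir, "*.log"),
    ]
    found = set()
    for pattern in patterns:
        found.update(glob.glob(pattern))
    return sorted(found)


def bundle_logs(paths, archive_path):
    """Write the given files into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            tar.add(path)
    return archive_path


if __name__ == "__main__":
    logs = find_logs()
    for path in logs:
        print(path)
    if logs:
        # "georep-logs.tar.gz" is a hypothetical archive name.
        print(bundle_logs(logs, "georep-logs.tar.gz"))
```

Run it on each master node, since every node holds the logs for its own bricks only, and check the archive for anything sensitive before attaching it to a reply.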
Regards,
Shwetha

On Thu, Jun 25, 2020 at 1:26 PM Felix Kölzow <felix.koel...@gmx.de> wrote:

> Hey Rob,
>
> Same issue for our third volume. Have a look at the logs from just now (below).
>
> Question: You removed the htime files and the old changelogs. Did you just rm the files, or is there something to pay closer attention to before removing the changelog files and the htime file?
>
> Regards,
> Felix
>
> [2020-06-25 07:51:53.795430] I [resource(worker /gluster/vg00/dispersed_fuse1024/brick):1435:connect_remote] SSH: SSH connection between master and slave established. duration=1.2341
> [2020-06-25 07:51:53.795639] I [resource(worker /gluster/vg00/dispersed_fuse1024/brick):1105:connect] GLUSTER: Mounting gluster volume locally...
> [2020-06-25 07:51:54.520601] I [monitor(monitor):280:monitor] Monitor: worker died in startup phase brick=/gluster/vg01/dispersed_fuse1024/brick
> [2020-06-25 07:51:54.535809] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2020-06-25 07:51:54.882143] I [resource(worker /gluster/vg00/dispersed_fuse1024/brick):1128:connect] GLUSTER: Mounted gluster volume duration=1.0864
> [2020-06-25 07:51:54.882388] I [subcmds(worker /gluster/vg00/dispersed_fuse1024/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> [2020-06-25 07:51:56.911412] E [repce(agent /gluster/vg00/dispersed_fuse1024/brick):121:worker] <top>: call failed:
> Traceback (most recent call last):
>   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 117, in worker
>     res = getattr(self.obj, rmeth)(*in_data[2:])
>   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 40, in register
>     return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level, retries)
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 46, in cl_register
>     cls.raise_changelog_err()
>   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 30, in raise_changelog_err
>     raise ChangelogException(errn, os.strerror(errn))
> ChangelogException: [Errno 2] No such file or directory
> [2020-06-25 07:51:56.912056] E [repce(worker /gluster/vg00/dispersed_fuse1024/brick):213:__call__] RepceClient: call failed call=75086:140098349655872:1593071514.91 method=register error=ChangelogException
> [2020-06-25 07:51:56.912396] E [resource(worker /gluster/vg00/dispersed_fuse1024/brick):1286:service_loop] GLUSTER: Changelog register failed error=[Errno 2] No such file or directory
> [2020-06-25 07:51:56.928031] I [repce(agent /gluster/vg00/dispersed_fuse1024/brick):96:service_loop] RepceServer: terminating on reaching EOF.
> [2020-06-25 07:51:57.886126] I [monitor(monitor):280:monitor] Monitor: worker died in startup phase brick=/gluster/vg00/dispersed_fuse1024/brick
> [2020-06-25 07:51:57.895920] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> [2020-06-25 07:51:58.607405] I [gsyncdstatus(worker /gluster/vg00/dispersed_fuse1024/brick):287:set_passive] GeorepStatus: Worker Status Change status=Passive
> [2020-06-25 07:51:58.607768] I [gsyncdstatus(worker /gluster/vg01/dispersed_fuse1024/brick):287:set_passive] GeorepStatus: Worker Status Change status=Passive
> [2020-06-25 07:51:58.608004] I [gsyncdstatus(worker /gluster/vg00/dispersed_fuse1024/brick):281:set_active] GeorepStatus: Worker Status Change status=Active
>
> On 25/06/2020 09:15, rob.quaglio...@rabobank.com wrote:
>
> > Hi All,
> >
> > We’ve got two six-node RHEL 7.8 clusters and geo-replication would appear to be completely broken between them. I’ve deleted the session, removed & recreated pem files, old changelogs/htime (after removing relevant options from volume) and completely set up geo-rep from scratch, but the new session comes up as Initializing, then goes faulty, and starts looping. Volume (on both sides) is a 4 x 2 disperse, running Gluster v6 (RH latest). Gsyncd reports:
> >
> > [2020-06-25 07:07:14.701423] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Initializing...
> > [2020-06-25 07:07:14.701744] I [monitor(monitor):159:monitor] Monitor: starting gsyncd worker brick=/rhgs/brick20/brick slave_node=bxts470194.eu.rabonet.com
> > [2020-06-25 07:07:14.707997] D [monitor(monitor):230:monitor] Monitor: Worker would mount volume privately
> > [2020-06-25 07:07:14.757181] I [gsyncd(agent /rhgs/brick20/brick):318:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/gsyncd.conf
> > [2020-06-25 07:07:14.758126] D [subcmds(agent /rhgs/brick20/brick):107:subcmd_agent] <top>: RPC FD rpc_fd='5,12,11,10'
> > [2020-06-25 07:07:14.758627] I [changelogagent(agent /rhgs/brick20/brick):72:__init__] ChangelogAgent: Agent listining...
> > [2020-06-25 07:07:14.764234] I [gsyncd(worker /rhgs/brick20/brick):318:main] <top>: Using session config file path=/var/lib/glusterd/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/gsyncd.conf
> > [2020-06-25 07:07:14.779409] I [resource(worker /rhgs/brick20/brick):1386:connect_remote] SSH: Initializing SSH connection between master and slave...
> > [2020-06-25 07:07:14.841793] D [repce(worker /rhgs/brick20/brick):195:push] RepceClient: call 6799:140380783982400:1593068834.84 __repce_version__() ...
> > [2020-06-25 07:07:16.148725] D [repce(worker /rhgs/brick20/brick):215:__call__] RepceClient: call 6799:140380783982400:1593068834.84 __repce_version__ -> 1.0
> > [2020-06-25 07:07:16.148911] D [repce(worker /rhgs/brick20/brick):195:push] RepceClient: call 6799:140380783982400:1593068836.15 version() ...
> > [2020-06-25 07:07:16.149574] D [repce(worker /rhgs/brick20/brick):215:__call__] RepceClient: call 6799:140380783982400:1593068836.15 version -> 1.0
> > [2020-06-25 07:07:16.149735] D [repce(worker /rhgs/brick20/brick):195:push] RepceClient: call 6799:140380783982400:1593068836.15 pid() ...
> > [2020-06-25 07:07:16.150588] D [repce(worker /rhgs/brick20/brick):215:__call__] RepceClient: call 6799:140380783982400:1593068836.15 pid -> 30703
> > [2020-06-25 07:07:16.150747] I [resource(worker /rhgs/brick20/brick):1435:connect_remote] SSH: SSH connection between master and slave established. duration=1.3712
> > [2020-06-25 07:07:16.150819] I [resource(worker /rhgs/brick20/brick):1105:connect] GLUSTER: Mounting gluster volume locally...
> > [2020-06-25 07:07:16.265860] D [resource(worker /rhgs/brick20/brick):879:inhibit] DirectMounter: auxiliary glusterfs mount in place
> > [2020-06-25 07:07:17.272511] D [resource(worker /rhgs/brick20/brick):953:inhibit] DirectMounter: auxiliary glusterfs mount prepared
> > [2020-06-25 07:07:17.272708] I [resource(worker /rhgs/brick20/brick):1128:connect] GLUSTER: Mounted gluster volume duration=1.1218
> > [2020-06-25 07:07:17.272794] I [subcmds(worker /rhgs/brick20/brick):84:subcmd_worker] <top>: Worker spawn successful. Acknowledging back to monitor
> > [2020-06-25 07:07:17.272973] D [master(worker /rhgs/brick20/brick):104:gmaster_builder] <top>: setting up change detection mode mode=xsync
> > [2020-06-25 07:07:17.273063] D [monitor(monitor):273:monitor] Monitor: worker(/rhgs/brick20/brick) connected
> > [2020-06-25 07:07:17.273678] D [master(worker /rhgs/brick20/brick):104:gmaster_builder] <top>: setting up change detection mode mode=changelog
> > [2020-06-25 07:07:17.274224] D [master(worker /rhgs/brick20/brick):104:gmaster_builder] <top>: setting up change detection mode mode=changeloghistory
> > [2020-06-25 07:07:17.276484] D [repce(worker /rhgs/brick20/brick):195:push] RepceClient: call 6799:140380783982400:1593068837.28 version() ...
> > [2020-06-25 07:07:17.276916] D [repce(worker /rhgs/brick20/brick):215:__call__] RepceClient: call 6799:140380783982400:1593068837.28 version -> 1.0
> > [2020-06-25 07:07:17.277009] D [master(worker /rhgs/brick20/brick):777:setup_working_dir] _GMaster: changelog working dir /var/lib/misc/gluster/gsyncd/prd_mx_intvol_bxts470190_prd_mx_intvol/rhgs-brick20-brick
> > [2020-06-25 07:07:17.277098] D [repce(worker /rhgs/brick20/brick):195:push] RepceClient: call 6799:140380783982400:1593068837.28 init() ...
> > [2020-06-25 07:07:17.292944] D [repce(worker /rhgs/brick20/brick):215:__call__] RepceClient: call 6799:140380783982400:1593068837.28 init -> None
> > [2020-06-25 07:07:17.293097] D [repce(worker /rhgs/brick20/brick):195:push] RepceClient: call 6799:140380783982400:1593068837.29 register('/rhgs/brick20/brick', '/var/lib/misc/gluster/gsyncd/prd_mx_intvol_bxts470190_prd_mx_intvol/rhgs-brick20-brick', '/var/log/glusterfs/geo-replication/prd_mx_intvol_bxts470190_prd_mx_intvol/changes-rhgs-brick20-brick.log', 8, 5) ...
> > [2020-06-25 07:07:19.296294] E [repce(agent /rhgs/brick20/brick):121:worker] <top>: call failed:
> > Traceback (most recent call last):
> >   File "/usr/libexec/glusterfs/python/syncdaemon/repce.py", line 117, in worker
> >     res = getattr(self.obj, rmeth)(*in_data[2:])
> >   File "/usr/libexec/glusterfs/python/syncdaemon/changelogagent.py", line 40, in register
> >     return Changes.cl_register(cl_brick, cl_dir, cl_log, cl_level, retries)
> >   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 46, in cl_register
> >     cls.raise_changelog_err()
> >   File "/usr/libexec/glusterfs/python/syncdaemon/libgfchangelog.py", line 30, in raise_changelog_err
> >     raise ChangelogException(errn, os.strerror(errn))
> > ChangelogException: [Errno 2] No such file or directory
> > [2020-06-25 07:07:19.297161] E [repce(worker /rhgs/brick20/brick):213:__call__] RepceClient: call failed call=6799:140380783982400:1593068837.29 method=register error=ChangelogException
> > [2020-06-25 07:07:19.297338] E [resource(worker /rhgs/brick20/brick):1286:service_loop] GLUSTER: Changelog register failed error=[Errno 2] No such file or directory
> > [2020-06-25 07:07:19.315074] I [repce(agent /rhgs/brick20/brick):96:service_loop] RepceServer: terminating on reaching EOF.
> > [2020-06-25 07:07:20.275701] I [monitor(monitor):280:monitor] Monitor: worker died in startup phase brick=/rhgs/brick20/brick
> > [2020-06-25 07:07:20.277383] I [gsyncdstatus(monitor):248:set_worker_status] GeorepStatus: Worker Status Change status=Faulty
> >
> > We’ve done everything we can think of, including an “strace -f” on the pid, and we can’t really find anything. I’m about to lose the last of my hair over this, so does anyone have any ideas at all? We’ve even removed the entire slave vol and rebuilt it.
> >
> > Thanks
> > Rob
> >
> > *Rob Quagliozzi*
> > *Specialised Application Support*
> >
> > ________
> >
> > Community Meeting Calendar:
> >
> > Schedule -
> > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > Bridge: https://bluejeans.com/441850968
> >
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > https://lists.gluster.org/mailman/listinfo/gluster-users