Hi David,
Thanks for your response.
It turns out that the Bucardo server did not have enough memory, so when the
memory filled up, the kid died. I have upgraded that server from 2 GB to 12 GB
of RAM, and Bucardo now seems to keep almost 5 GB in use.
Regards,
Adrian Videanu
From: David Christensen <[email protected]>
To: Videanu Adrian <[email protected]>
Cc: "[email protected]" <[email protected]>
Sent: Wednesday, October 5, 2016 6:13 PM
Subject: Re: [Bucardo-general] Kid is not responding,
Hi Videanu,
> Hi all,
> I have a four-master Bucardo 5.4.1 setup.
> The replication was down for a few days, and now I have almost 8 million rows
> to be moved between servers.
> Because of that, the operation takes more than an hour. Until now I had a
> firewall problem: after about 1 to 1.5 hours the connection was cut and the
> transaction was restarted.
So did you fix the timeout issue by adjusting the tcp_keepalives_* settings in
your postgresql.conf file? I’ve had to do that before with some long-running
Slony operations where there were long periods when no data was being
transferred over the connections. That should keep the connection alive even
if there are long waits in the transfer. (Though I’d be a little surprised if
there were pauses of that length without *any* data transfer.)
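For reference, a minimal sketch of what such a keepalive configuration might look like. The three parameter names are the standard PostgreSQL server settings; the values are illustrative assumptions, not taken from this thread, and would need tuning against the firewall's idle timeout:

```
# postgresql.conf -- example values only (assumed, not from this thread)
tcp_keepalives_idle = 60       # seconds of inactivity before the first keepalive probe
tcp_keepalives_interval = 10   # seconds between probes that go unacknowledged
tcp_keepalives_count = 6       # unacknowledged probes before the connection is dropped
```

The idea is to keep tcp_keepalives_idle well below the firewall's idle cutoff, so that an otherwise quiet connection still carries some traffic.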
> Now I have fixed that, but I get this error:
> (2498) [Wed Oct 5 12:35:20 2016] CTL Warning: Kid 2525 is not responding,
> will respawn
> (2498) [Wed Oct 5 12:35:20 2016] CTL Old syncrun entry removed during
> resurrection, start time was 2016-10-05 11:12:45.165723+03
> (6411) [Wed Oct 5 12:35:20 2016] KID (ccAclSync) New kid, sync "ccAclSync"
> alive=1 Parent=2498 PID=6411 kicked=1
> (6411) [Wed Oct 5 12:35:20 2016] KID (ccAclSync) Overwriting
> /var/run/bucardo/bucardo.kid.sync.ccAclSync.pid: old process was ?
The messages you point out appear to be more informational than indicative of
an ongoing error; this is the message you get if the Kid process no longer
exists. Now, if you are getting this message repeatedly and the Kid process is
never able to stay running, that’s a different story. That would indicate that
the Kid process is dying while trying to do the actual replication. My guess
right now is that it is a leftover from the earlier issue you had.
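One way to check by hand whether the Kid is still there is to test the PID recorded in its pid file. The following is only a rough sketch in Python, not part of Bucardo; it assumes a POSIX system and that the first line of the pid file holds the PID. The helper names are made up for illustration:

```python
import os


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        # Signal 0 delivers nothing; it only checks existence and permissions.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False   # no such process
    except PermissionError:
        return True    # process exists but belongs to another user
    return True


def read_pid(path: str):
    """Return the PID on the first line of a pid file, or None if unreadable.

    Assumption: the pid file stores the PID as the first line.
    """
    try:
        with open(path) as fh:
            return int(fh.readline().strip())
    except (OSError, ValueError):
        return None
```

For example, read_pid("/var/run/bucardo/bucardo.kid.sync.ccAclSync.pid") followed by pid_is_alive() on the result would tell you whether the process named in the log above is still around. If it keeps disappearing, the system logs around that time are the next place to look.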
> Is there any way that I could increase kid/sync timeout ? Maybe kick the sync
> manually with the timeout parameter ?
BTW, there is no timeout setting in Bucardo for the Kid sync. The answer here
is to figure out why the Kid is dying, if the cause is something other than the
timeout issue, and fix that.
HTH,
David
--
David Christensen
End Point Corporation
[email protected]
785-727-1171