Hi,

I'm new to this list so first I will say hello! I'm a new bucardo user and so far I'm enjoying using bucardo a lot. Thanks!

I have two issues and the debugging output is rather long, so I'll keep them in separate mails. Any help is much appreciated.


I'm testing with Bucardo 4.99.7 on Ubuntu 13.04 with Postgres 9.1.9.

I have 4 source databases and one target, with a very simple test schema, and am testing the scenario where one database goes offline.

For now, I test this using:

 - sudo /etc/init.d/postgres stop # (kill the pg server)
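
For anyone reproducing this, it can help to confirm that the node is really refusing connections before looking at Bucardo. The following is only a minimal sketch: the helper name pg_is_reachable is my own, and it checks TCP reachability only, not whether Postgres would accept a login.

```python
import socket


def pg_is_reachable(host, port=5432, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, else False.

    This only tests that something is listening on the port; it does not
    authenticate or run a query.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused", timeouts and unreachable hosts.
        return False
```

For example, pg_is_reachable("192.168.97.93") should return False while the server on that host is stopped.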

Let's start with bucardo running:

PID of Bucardo MCP: 10652
 Name       State    Last good    Time    Last I/D    Last bad Time
==========+========+============+=======+===========+===========+=======
 testsync | Good   | 15:58:21   | 2s    | 0/4       | 15:49:22  | 9m 1s
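
To watch the sync state from a script during a test like this, the status table can be parsed. This is a rough sketch that assumes the column layout shown above for 4.99.7 (other versions may differ); the function name parse_status and the field names are my own choices, not anything defined by Bucardo.

```python
def parse_status(text):
    """Map each sync name to a dict of the fields in its status row."""
    fields = ("state", "last_good", "good_age", "last_id", "last_bad", "bad_age")
    syncs = {}
    for line in text.splitlines():
        # Data rows are the only lines that use "|" as a column separator;
        # the header and the "====+====" rule are skipped.
        if "|" not in line:
            continue
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) != len(fields) + 1:
            continue  # unexpected layout, ignore rather than guess
        syncs[cells[0]] = dict(zip(fields, cells[1:]))
    return syncs
```

With the output above, parse_status(...)["testsync"]["state"] gives "Good".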


And here's my process tree using ps auxf:

root 10652 0.0 0.1 146020 20856 ? S 15:49 0:00 Bucardo Master Control Program v4.99.7. Active syncs: testsync
root 10662 0.0 0.1 147540 21936 ? S 15:49 0:00 \_ Bucardo VAC.
root 10668 0.0 0.1 147564 22048 ? S 15:49 0:00 \_ Bucardo Controller. Sync "testsync" for relgroup "testherd" to dbs "testgroup"
root 10683 0.0 0.1 149376 23688 ? S 15:49 0:00 \_ Bucardo Kid. Sync "testsync"


Now I stop one node, and insert a row on another node to intentionally break the replication:


 Name       State    Last good    Time    Last I/D    Last bad Time
==========+========+============+=======+===========+===========+=======
 testsync | Bad    | 15:58:21   | 3m 6s | 0/4       | 16:01:25  | 2s


Now my process tree looks a bit funny:

root 10662 0.0 0.1 147540 21936 ? S 15:49 0:00 Bucardo VAC.
root 10668 0.0 0.1 147564 22072 ? S 15:49 0:00 Bucardo Controller. Sync "testsync" for relgroup "testherd" to dbs "testgroup"
root 10965 0.0 0.1 145260 20352 ? S 16:02 0:00 Bucardo Master Control Program v4.99.7.

In the log file I see it respawns the child process every 15 seconds, with a sensible error message:

(11061) [Wed May 8 16:03:48 2013] KID Kid 11061 exiting at cleanup_kid. Sync "testsync" Reason: DBI connect('dbname=testa;host=192.168.97.93','bucardo',...) failed: could not connect to server: Connection refused Is the server running on host "192.168.97.93" and accepting TCP/IP connections on port 5432? at /usr/local/share/perl/5.14.2/Bucardo.pm line 4941. Line: 2718
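
To check the respawn interval across many such lines, the KID exit entries can be pulled apart with a regular expression. This is a sketch based only on the one line quoted above; the pattern, the function name parse_kid_exit and the assumption that all exit lines share this shape are mine.

```python
import re
from datetime import datetime

# Pattern modelled on the single log line quoted above.
KID_EXIT = re.compile(
    r'^\((?P<pid>\d+)\) \[(?P<when>[^\]]+)\] KID Kid \d+ exiting at cleanup_kid\. '
    r'Sync "(?P<sync>[^"]+)" Reason: (?P<reason>.*)$'
)


def parse_kid_exit(line):
    """Return pid, timestamp, sync name and reason for a KID exit line, or None."""
    match = KID_EXIT.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    event["pid"] = int(event["pid"])
    # Timestamps look like "Wed May 8 16:03:48 2013".
    event["when"] = datetime.strptime(event["when"], "%a %b %d %H:%M:%S %Y")
    return event
```

Subtracting the "when" values of consecutive events then gives the time between respawns.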

So far so good, so let's restart the offline node. Replication catches up nice and quickly.

But now I seem to have a stray VAC process (PID 10662), left over from the original MCP and no longer attached to the new one?

 Name       State    Last good    Time    Last I/D    Last bad Time
==========+========+============+=======+===========+===========+========
 testsync | Good   | 16:04:54   | 13s   | 0/4       | 16:01:25  | 3m 42s

root 10662 0.0 0.1 147540 21936 ? S 15:49 0:00 Bucardo VAC.
root 11148 0.1 0.1 146020 20852 ? S 16:05 0:00 Bucardo Master Control Program v4.99.7. Active syncs: testsync
root 11159 0.0 0.1 147540 21952 ? S 16:05 0:00 \_ Bucardo VAC.
root 11165 0.0 0.1 147564 22052 ? S 16:05 0:00 \_ Bucardo Controller. Sync "testsync" for relgroup "testherd" to dbs "testgroup"
root 11180 0.0 0.1 149228 23276 ? S 16:05 0:00 \_ Bucardo Kid. Sync "testsync"
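
To spot leftover processes like PID 10662 without reading the tree by eye, something along these lines could work. It is only a sketch: it expects the text output of `ps -eo pid,ppid,args`, relies on the process titles beginning with "Bucardo" as in the listings here, and the function name find_stray_bucardo is my own.

```python
def find_stray_bucardo(ps_text):
    """Return PIDs of Bucardo helper processes that have no running MCP ancestor.

    ps_text is expected to be the output of `ps -eo pid,ppid,args`.
    """
    procs = {}
    for line in ps_text.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue  # header line or something unparseable
        procs[int(parts[0])] = (int(parts[1]), parts[2])

    def is_mcp(pid):
        return procs[pid][1].startswith("Bucardo Master Control Program")

    strays = []
    for pid, (ppid, args) in procs.items():
        if not args.startswith("Bucardo") or is_mcp(pid):
            continue
        # Walk up the parent chain looking for a running MCP.
        seen = set()
        current = ppid
        while current in procs and current not in seen:
            if is_mcp(current):
                break
            seen.add(current)
            current = procs[current][0]
        else:
            # The chain ended without reaching an MCP.
            strays.append(pid)
    return sorted(strays)
```

The walk up the parent chain matters because a Kid may be a child of the Controller rather than a direct child of the MCP.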


I'll now repeat the test, downing the node again:

 Name       State    Last good    Time    Last I/D    Last bad Time
==========+========+============+=======+===========+===========+=======
 testsync | Bad    | 16:07:19   | 26s   | 0/4       | 16:07:41  | 4s

root 10662 0.0 0.1 147540 21936 ? S 15:49 0:00 Bucardo VAC.
root 11148 0.0 0.1 146020 21008 ? S 16:05 0:00 Bucardo Master Control Program v4.99.7. Active syncs: testsync
root 11159 0.0 0.1 147540 21952 ? S 16:05 0:00 \_ Bucardo VAC.
root 11165 0.0 0.1 147564 22076 ? S 16:05 0:00 \_ Bucardo Controller. Sync "testsync" for relgroup "testherd" to dbs "testgroup"


And restart it:

 Name       State    Last good    Time    Last I/D    Last bad Time
==========+========+============+=======+===========+===========+========
 testsync | Good   | 16:08:58   | 8s    | 0/8       | 16:07:41  | 1m 25s

root 10662 0.0 0.1 147540 21936 ? S 15:49 0:00 Bucardo VAC.
root 11159 0.0 0.1 147540 21952 ? S 16:05 0:00 Bucardo VAC.
root 11303 0.4 0.1 146020 20852 ? S 16:09 0:00 Bucardo Master Control Program v4.99.7. Active syncs: testsync
root 11318 0.1 0.1 147540 21940 ? S 16:09 0:00 \_ Bucardo VAC.
root 11324 0.3 0.1 147564 22048 ? S 16:09 0:00 \_ Bucardo Controller. Sync "testsync" for relgroup "testherd" to dbs "testgroup"
root 11342 0.6 0.1 149228 23276 ? S 16:09 0:00 \_ Bucardo Kid. Sync "testsync"




_______________________________________________
Bucardo-general mailing list
[email protected]
https://mail.endcrypt.com/mailman/listinfo/bucardo-general