Hi,

I've been testing swap syncs and conflict resolution in Bucardo 4.8 and found 
that kids quite often die with the following error message during nearly 
concurrent updates (i.e., when I manually update the same row on both source 
and target to simulate a conflict):

[Thu Nov 15 15:26:05 2012]  KID No conflict, target only for public.products.prod_id: 10006
[Thu Nov 15 15:26:05 2012]  KID Action summary: 2:1
[Thu Nov 15 15:26:05 2012]  KID [1/1] public.products UPDATE target to source pk 10006
'Warning! Aborting due to exception for public.products.prod_id: 10006 Error was DBD::Pg::st execute failed: ERROR:  could not serialize access due to concurrent update at /usr/local/share/perl/5.10.1/Bucardo.pm line 5776.'
[Thu Nov 15 15:26:05 2012]  KID Final database backend PID is 27203
[Thu Nov 15 15:26:05 2012]  KID Kid exiting at cleanup_kid. Reason: Died at /usr/local/share/perl/5.10.1/Bucardo.pm line 5835.
[Thu Nov 15 15:26:05 2012]  KID Removed pid file "/var/run/bucardo/bucardo.kid.sync.dellstore2_swap.zen_dellstore2.pid"
[Thu Nov 15 15:26:14 2012]  CTL Rows updated child 27199 to aborted in q: 1
[Thu Nov 15 15:26:14 2012]  CTL Warning! Kid 27199 seems to have died. Sync "dellstore2_swap"
[Thu Nov 15 15:26:24 2012]  CTL Cleaning up aborted sync from q table for "zen_dellstore2". PID was 27199
[Thu Nov 15 15:26:24 2012]  CTL Already an empty slot, so not re-adding


After the sync is kicked, Bucardo finds the delta rows, detects a conflict 
caused by the updates to the same row, and resolves it successfully:

[Thu Nov 15 15:31:21 2012]  KID Total delta count: 2
[Thu Nov 15 15:31:21 2012]  KID Logged details of conflict to bucardo_conflict.log
[Thu Nov 15 15:31:21 2012]  KID Conflict detected for public.products:10006. Using standard conflict "target"
[Thu Nov 15 15:31:21 2012]  KID Action summary: 2:1
[Thu Nov 15 15:31:21 2012]  KID [1/1] public.products UPDATE target to source pk 10006
[Thu Nov 15 15:31:21 2012]  KID Updating bucardo_track for public.products on blade_dellstore2
[Thu Nov 15 15:31:21 2012]  KID Updating bucardo_track for public.products on zen_dellstore2
[Thu Nov 15 15:31:21 2012]  KID Issuing final commit for source and target

The problem is that the kid is not restarted automatically. I'm not sure 
whether this has something to do with the "Already an empty slot, so not 
re-adding" message above. One workaround I found is to set the sync's 
checktime to a non-zero value, so that pending delta rows are eventually 
detected and replicated. Still, shouldn't Bucardo restart the kid 
automatically after such a failure, given that the keepalive flag is set for 
the sync?
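For context: PostgreSQL reports "could not serialize access due to concurrent 
update" with SQLSTATE 40001 (serialization_failure), and the usual remedy is 
to retry the whole transaction. The Python sketch below shows that general 
retry pattern only; it is not Bucardo's code, and DatabaseError, 
run_with_retry and max_attempts are hypothetical names. A real driver 
exception (e.g. from a PostgreSQL client library) is assumed to expose the 
SQLSTATE as a pgcode attribute, which the stand-in class imitates.

```python
import time

# PostgreSQL SQLSTATE for "could not serialize access ..." errors.
SERIALIZATION_FAILURE = "40001"


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception carrying a SQLSTATE."""

    def __init__(self, message, pgcode):
        super().__init__(message)
        self.pgcode = pgcode


def run_with_retry(txn, max_attempts=3, delay=0.0):
    """Run txn(); retry only on serialization failures (hypothetical helper)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DatabaseError as exc:
            # Any other error, or running out of attempts, is re-raised.
            if exc.pgcode != SERIALIZATION_FAILURE or attempt == max_attempts:
                raise
            time.sleep(delay)  # optional back-off before the next attempt


if __name__ == "__main__":
    calls = []

    def flaky_sync():
        # Fails once with a serialization failure, then succeeds.
        calls.append(1)
        if len(calls) == 1:
            raise DatabaseError(
                "could not serialize access due to concurrent update",
                SERIALIZATION_FAILURE,
            )
        return "committed"

    print(run_with_retry(flaky_sync))  # committed
```

Retrying only on 40001 keeps genuine errors (constraint violations and so on) 
fatal, which is the distinction a kid would need to make before re-running a 
sync instead of dying.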

Thank you,
--
Alexey Klyukin        http://www.commandprompt.com
The PostgreSQL Company – Command Prompt, Inc.




_______________________________________________
Bucardo-general mailing list
https://mail.endcrypt.com/mailman/listinfo/bucardo-general