Re: [Bucardo-general] Replication isn't working and status all gives a persistent error that doesn't match the state of the replicated databases

David Christensen Thu, 08 Feb 2018 08:20:01 -0800

> On Feb 8, 2018, at 9:35 AM, Jeff Silverman <[email protected]> wrote:
> 
> Hi, David, thanks for the reply. We were able to resolve this. Turns out the 
> error I posted was a red herring, and had no relevance. Which leads me to a 
> separate question, but I'll describe our resolution, first. I'll post the 
> details for closure's sake.
> 
> So, the problem turned out to be that there were tables that were renamed due 
> to our schema change process. But these changes were not accounted for in our 
> bucardo database, which led to an error. The real issue we struggled with was 
> opaqueness in the way bucardo reports errors.
> 
> The initial hints at this problem were found during the reload, but the 
> reload error didn't have any useful information in it.
> 
>     $ bucardo reload oltpdb_to_olapdw_sync
>     Reloading sync oltpdb_to_olapdw_sync...Reload of sync oltpdb_to_olapdw_sync failed
> 
> `bucardo status` just said "Good", even though the "Last good" column was 
> many hours old at this point.
> 
> We finally stumbled across the error by running `bucardo validate`:
> 
>     # bucardo validate all
>     Validating sync oltpdb_to_olapdw_sync ... WARNING:  Issuing rollback() due to DESTROY without explicit disconnect() of DBD::Pg::db handle dbname=oltpdb;host=oltp01;sslmode=require at line 1018.
>     CONTEXT:  PL/Perl function "validate_sync"
>     ERROR:  Could not find "mid_transaction_types" inside the "dom_merchant" schema on database "oltpdb"!   # <--- HERE; yes, this schema no longer exists in this database
>     CONTEXT:  PL/Perl function "validate_sync" at /usr/local/bin/bucardo line 1266.
> 
> 
> So running `bucardo remove table <tablename>` for all the tables that had 
> been renamed in the master's schema fixed the problem.
> 
> 
> Which leads to some questions:
> 1) Why is the error reporting so poor here? Is there any way this can be 
> improved?
>    - I tried using the '--verbose' flag when running bucardo commands but 
> that didn't add any extra information
>    - I looked at the bucardo log on disk but it didn't mention the underlying 
> issue


Yes, this could (should?) definitely be improved. At the very least, we could 
suggest running “validate” on the sync whenever we get the “reload failed” 
message.
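
Until something like that exists in the program itself, the same idea can be 
approximated from the outside. The following is only a rough sketch, not part 
of Bucardo: `reload_or_validate` and the `BUCARDO` variable are hypothetical 
names, and it assumes that a failed reload prints the word "failed" (as in the 
output quoted above) rather than relying on the exit status alone.

```shell
#!/bin/sh
# Hypothetical wrapper, not part of Bucardo: reload a sync and, if the
# reload reports failure, run "validate" to surface the underlying error.
# BUCARDO may be overridden to point at a different binary.
BUCARDO="${BUCARDO:-bucardo}"

reload_or_validate() {
    sync="$1"
    # Capture the reload output and exit status. Whether bucardo sets a
    # non-zero status on a failed reload is an assumption, so the output
    # text is checked as well.
    if out=$("$BUCARDO" reload "$sync" 2>&1); then rc=0; else rc=$?; fi
    printf '%s\n' "$out"
    case "$out" in
        *failed*) rc=1 ;;
    esac
    if [ "$rc" -ne 0 ]; then
        echo "Reload failed; running validate on $sync:" >&2
        "$BUCARDO" validate "$sync"
        return 1
    fi
    return 0
}
```

Calling `reload_or_validate oltpdb_to_olapdw_sync` would then print the 
validate output right after a failed reload, instead of leaving only the bare 
“reload failed” line.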

> 2) Is there any way to clear the error that persists every time I run 
> `bucardo status all`?
> The error that currently appears is still there, but has no current 
> relevance. That table is gone, and there's no row with that unique id 
> *anywhere* in our oltp database. Also, the error that occurred during 
> `bucardo validate` never appeared anywhere else, so we only figured that out 
> by exhausting all our possibilities.

`bucardo status` actually just returns the latest row from the `syncrun` table. 
I’m not sure offhand if we can clear that through the program or not, but I 
agree that “last error” and “we have no errors” is an important distinction to 
make.

David
--
David Christensen
End Point Corporation
[email protected]
785-727-1171


_______________________________________________
Bucardo-general mailing list
[email protected]
https://mail.endcrypt.com/mailman/listinfo/bucardo-general
