I've been mulling this as well..

Greg Sabino Mullane wrote:
> On Wed, Dec 04, 2013 at 07:38:22AM -0500, Jonathan Brinkman wrote:
>
>> I agree this is a huge problem. I have a master with 20 slaves in a dbgroup,
>> using bucardo 4.5, and if any one of the slaves goes offline then ALL syncs
>> break, even those that have nothing to do with the missing slave or that
>> dbgroup. This seriously undermines the value of bucardo... and it is very
>> uncomfortable explaining to our clients why all the syncs are stopped until
>> someone goes to fix the offline slave server.
>>
>> Is this fixed in bucardo 5?
>>
>
> No, it is not fixed in B5. It's on the radar, but limited resources means I
> will get to it as some point, but I don't know when that will be. Here's a
> quick solution overview to kick around:
>
> * We already handle the cases where the ctl and kid lose connection to a
> server - they just respawn
>
> * If the MCP loses connection to a server, the action depends on a few things:
>
> - If that server is used as a source, we deactivate all syncs using it as a
> source.
> - If the server is only used as a target, we drive on (if a new flag is set),
> assuming there are other targets. If it is the only target, we deactivate the
> sync.
> - Periodically we try to reconnect to the downed servers. Once up, we
> reactivate the sync and/or add the server back as a target.
>
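
For reference, here is how I read the MCP rules quoted above, as a rough Python sketch. None of this is Bucardo code: `Sync`, `evaluate` and the `drive_on` flag are names made up for illustration, and "periodically reconnect" is modelled simply as calling `evaluate` again with an updated set of down servers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sync:
    """Hypothetical stand-in for a sync: a name, its sources and its targets."""
    name: str
    sources: frozenset
    targets: frozenset


def evaluate(sync, down, drive_on=True):
    """Return (active, usable_targets) for one sync.

    down     -- set of server names the MCP currently cannot reach
    drive_on -- the proposed new flag: keep going if some targets are missing
    """
    # A downed source deactivates every sync that uses it as a source.
    if sync.sources & down:
        return False, frozenset()
    live = sync.targets - down
    # If the downed server was the only target, deactivate the sync.
    if not live:
        return False, frozenset()
    # Some targets are missing: only continue when the flag allows it.
    if live != sync.targets and not drive_on:
        return False, frozenset()
    return True, live


fanout = Sync("fanout", frozenset({"master"}), frozenset({"s1", "s2"}))
single = Sync("single", frozenset({"master"}), frozenset({"s3"}))

for down in ({"s1"}, {"s3"}, {"master"}, set()):
    for sync in (fanout, single):
        active, targets = evaluate(sync, down)
        print(sorted(down), sync.name, active, sorted(targets))
```

The point of computing the state from the set of down servers each time is that a sync unrelated to the missing server never changes state, and a server coming back needs no special case.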
Would it be possible to write a delta table for each target host+table?
(Either something similar to the current table but with an additional column
for the target DB in the sync, and therefore duplicating rows for each
target, or a target+host table where there is a new table for each target.
The latter would be useful for separating out master-master syncs.)

Then, instead of having kids and controllers for each target, the target
could run its own 'target daemon' that connects to the master and cleans up
its own rows once it has been able to sync. If the target is part of a
multi-master sync (i.e. master-master-slave in the same sync), the system
would have to operate as it does now, so that the target daemon doesn't get
ahead of the sources' conflict resolution for the changed rows.

That way you could have a multi-master cluster operating as it does now, plus
an additional sync in which one or more of the masters acts as master to
multiple slaves. Each slave would be independent, so if a host dies only its
sync dies, and its rows are left to be either cleaned up later or synced when
the host returns.

One thought for handling most of the locking/deltas, and also for solving the
master-master-slave issue: give every slave its own table, and have the
multi-master sync provide the deltas after each sync is complete. That is,
the sync process (the master kid) would create the deltas for the target
instead of the triggers, so that conflict resolution is done and the final
rows are replicated to all DBs before the target (slave) reads the deltas and
row data to replicate to itself.

You could also probably include a flag on the DB table to say 'let the MCP
take care of this sync' or 'let the remote DB pull the data'. Vacuuming of
the delta table would be done by whichever process is performing the
replication.

Thoughts?

Michelle

-- 
Michelle Sullivan
http://www.mhix.org/
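
P.S. To make the per-target delta idea a bit more concrete, here is a minimal sketch in Python, using an in-memory SQLite database as a stand-in for the master. It shows the "additional column for the target" variant only. Everything in it is an assumption for illustration: the `item` and `delta` tables, and the `write` and `pull` helpers, are invented names and have nothing to do with Bucardo's real schema or code, and a real version would need proper transaction isolation between writers and pulling targets.

```python
import sqlite3


def make_master():
    """Create a toy master with one data table and one per-target delta table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, val TEXT)")
    # One delta row per (target, changed row): the 'additional column' variant.
    db.execute("CREATE TABLE delta (target TEXT NOT NULL, id INTEGER NOT NULL)")
    return db


def write(master, targets, row_id, val):
    """Change a row on the master and queue a delta for every target."""
    with master:  # commit both statements together
        master.execute(
            "INSERT OR REPLACE INTO item (id, val) VALUES (?, ?)", (row_id, val)
        )
        master.executemany(
            "INSERT INTO delta (target, id) VALUES (?, ?)",
            [(t, row_id) for t in targets],
        )


def pull(master, target, replica):
    """What a 'target daemon' might do: copy changed rows, then clean up
    only its own delta rows. `replica` is a dict standing in for the
    target's copy of the table. Returns the number of rows synced."""
    with master:
        ids = [
            r[0]
            for r in master.execute(
                "SELECT DISTINCT id FROM delta WHERE target = ?", (target,)
            )
        ]
        for row_id in ids:
            row = master.execute(
                "SELECT val FROM item WHERE id = ?", (row_id,)
            ).fetchone()
            if row is None:
                replica.pop(row_id, None)  # row no longer exists on master
            else:
                replica[row_id] = row[0]
        master.execute("DELETE FROM delta WHERE target = ?", (target,))
    return len(ids)


master = make_master()
targets = ["s1", "s2"]
write(master, targets, 1, "a")
write(master, targets, 2, "b")

s1, s2 = {}, {}
pull(master, "s1", s1)          # s1 is up and syncs straight away
write(master, targets, 1, "c")  # s2 is still 'offline' while this happens
pull(master, "s2", s2)          # s2 returns later and catches up on its own
```

While s2 is away its delta rows simply sit in the table, and s1 carries on regardless, which is the independence I'm after.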
