One thing that’s conspicuously missing from this discussion is any historical 
context for how the version numbers are *supposed* to be handled. It seems like 
most of these problems are recent, or at least recent-ish.

IIUC the deal is (should be? used to be? Please correct!):

1) On initial creation, the log contains a version 0 no-op, making the db 
version 1.

2) On connection, the slave tells the master what version it has. If it doesn’t 
match what the master has then the master sends updates to bring them in sync.
2a) If the master’s change log is insufficient (or the difference is “too 
big”), then it sends the whole DB.
2b) If the difference is small enough, then the master just replays the change 
log from where the slave is.

3) Seems to me that the handling of the heartbeat messages ought to mirror the 
initial connection logic, or else make no attempt to do anything to the DB at 
all. Anything else is clearly risky and unnecessarily complex. (I never worried 
about them because I had already implemented external processes to deal with 
the issue. Somebody else should write this bullet.)
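
For concreteness, here is how I read points 1 through 2b, as a rough Python sketch. None of these names (sync_action, log_first_version, max_replay) come from the real code; they are made up for illustration. I am also assuming that the log entry numbered v is the one that takes the DB from version v to v+1 (which is what point 1 implies), and that "too big" is a simple count threshold. Corrections welcome.

```python
def sync_action(master_version, log_first_version, slave_version, max_replay=1000):
    """Decide what the master does when a slave reports its version.

    Hypothetical model: the master's log holds entries numbered
    log_first_version .. master_version - 1, and entry v takes a DB
    from version v to version v + 1.
    """
    if slave_version == master_version:
        # Point 2: versions match, nothing to send.
        return ("nothing", [])

    behind = master_version - slave_version
    # The log is sufficient only if it still contains the entry the slave
    # needs next. A slave that claims to be ahead of the master is never
    # covered (assumption: treat that case like an insufficient log).
    log_covers = log_first_version <= slave_version < master_version

    if not log_covers or behind > max_replay:
        # Point 2a: log insufficient or difference "too big".
        return ("send_whole_db", [])

    # Point 2b: replay the log from where the slave is.
    return ("replay", list(range(slave_version, master_version)))


if __name__ == "__main__":
    print(sync_action(5, 0, 3))   # small gap, log covers it
    print(sync_action(5, 4, 1))   # log was truncated past the slave's version
```

The one design point worth noticing is that the two "send the whole DB" conditions are independent: a truncated log forces a full transfer even when the slave is only a few versions behind.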

A new DB (on a slave) is guaranteed to have a smaller version number than the 
master (if the master is actually populated), so will always get a complete 

Truncation, preserving the version number, is safe and periodically necessary.
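
To show what I mean by "preserving the version number", here is a small Python sketch. The representation is my own assumption, not the real on-disk format: the log is a starting entry number plus a list of entries, and the DB version is the starting number plus the entry count. The function name truncate_log is likewise hypothetical.

```python
def truncate_log(first_version, entries, keep):
    """Drop all but the newest `keep` entries.

    Returns the new (first_version, entries). The DB version, modeled
    here as first_version + len(entries), is unchanged by construction.
    """
    drop = max(len(entries) - keep, 0)
    return first_version + drop, entries[drop:]


if __name__ == "__main__":
    first, entries = 0, ["no-op", "a", "b", "c"]
    before = first + len(entries)
    first, entries = truncate_log(first, entries, keep=1)
    after = first + len(entries)
    print(before, after)  # the two versions are equal
```

Slaves that were already in sync see the same version as before, so nothing happens to them; only a slave that has fallen behind the retained entries needs the whole DB.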

I do not remember the --reset option, but it’s clearly dangerous. How can it be 
used safely, knowing only the above?

(Where is Love when you need him?)

Personal email.  hbh...@oxy.edu
