One thing that’s conspicuously missing from this discussion is any historical
context for how the version numbers are *supposed* to be handled. It seems like
most of these problems are recent, or at least recent-ish.
IIUC the deal is (should be? used to be? Please correct!):
1) On initial creation, the log contains a version 0 no-op, making the db
2) On connection, the slave tells the master what version it has. If it doesn’t
match what the master has, then the master sends updates to bring them in sync.
2a) If the master’s change log is insufficient (or the difference is “too
big”), then it sends the whole DB.
2b) If the difference is small enough, then the master just replays the change
log from where the slave is.
3) Seems to me that the handling of the heartbeat messages ought to mirror the
initial connection logic, or else make no attempt to do anything to the DB at
all. Anything else is clearly risky and unnecessarily complex. (I never worried
about them because I had already implemented external processes to deal with
the issue. Somebody else should write this bullet.)
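A minimal sketch of the connection logic in steps 1 through 2b, as I understand it. It assumes the change log is a list of (version, change) pairs in increasing order. `ChangeLog`, `decide_sync` and the `MAX_REPLAY` threshold for “too big” are all hypothetical names, not the real tool's API. Two further assumptions: replay only happens when the slave's own version is still in the log, and a slave claiming to be ahead of the master gets the whole DB.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# Hypothetical threshold for a difference that is "too big" to replay (2a).
MAX_REPLAY = 1000

@dataclass
class ChangeLog:
    # (version, change) pairs in increasing version order.
    # Step 1: a freshly created log starts with a version-0 no-op.
    entries: List[Tuple[int, str]] = field(default_factory=lambda: [(0, "no-op")])

    @property
    def newest(self) -> int:
        return self.entries[-1][0]

    @property
    def oldest(self) -> int:
        return self.entries[0][0]

    def append(self, change: str) -> int:
        """Record a change under the next version number and return it."""
        version = self.newest + 1
        self.entries.append((version, change))
        return version

def decide_sync(log: ChangeLog, slave_version: int) -> Union[str, List[Tuple[int, str]]]:
    """Return "in-sync", "full" (send whole DB), or the entries to replay."""
    master_version = log.newest
    if slave_version == master_version:
        return "in-sync"
    # Assumption: a slave that claims to be ahead cannot be fixed by replay.
    if slave_version > master_version:
        return "full"
    # 2a: the log no longer contains the slave's version (conservative
    # assumption), or the gap is too big to be worth replaying.
    if slave_version < log.oldest or master_version - slave_version > MAX_REPLAY:
        return "full"
    # 2b: replay every logged change newer than what the slave has.
    return [(v, c) for v, c in log.entries if v > slave_version]
```

For example, after five `append` calls the master is at version 5, and a slave reporting version 3 is sent only versions 4 and 5. Following item 3, a heartbeat handler could call the same `decide_sync`, so that connection and heartbeat share one code path.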
A new DB (on a slave) is guaranteed to have a smaller version number than the
master (if the master is actually populated), so it will always get a complete DB.
Truncating the log while preserving the version number is safe and periodically necessary.
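A minimal sketch of why truncation has to preserve the version number, assuming the log is modeled as a list of (version, change) pairs. `truncate` and `needs_full_db` are hypothetical names, not the real tool's API. The idea is that the newest entry always survives, so the master still reports its current version. Any slave older than what is left in the log falls through to the whole-DB path instead of being replayed.

```python
from typing import List, Tuple

# Hypothetical log representation: (version, change) pairs, increasing version.
Entry = Tuple[int, str]

def truncate(entries: List[Entry], keep: int) -> List[Entry]:
    """Drop old entries but never lose the newest version number."""
    if not entries:
        raise ValueError("log must contain at least one entry")
    # Keep at least one entry so the log still reports the current version.
    keep = max(keep, 1)
    return entries[-keep:]

def needs_full_db(entries: List[Entry], slave_version: int) -> bool:
    """Conservative check: replay only if the slave's version is still logged."""
    return slave_version < entries[0][0]

log = [(0, "no-op")] + [(v, f"change {v}") for v in range(1, 11)]
short = truncate(log, keep=3)
```

Here `short` holds versions 8 through 10. The master still reports version 10, a slave at version 8 can be replayed, and a brand-new slave at version 0 gets the whole DB.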
I do not remember the --reset option, but it’s clearly dangerous. How can it be
used safely, knowing only the above?
(Where is Love when you need him?)
Personal email. hbh...@oxy.edu