One thing that’s conspicuously missing from this discussion is any historical context for how the version numbers are *supposed* to be handled. It seems like most of these problems are recent, or at least recent-ish.
IIUC the deal is (should be? used to be? Please correct!):

1) On initial creation, the log contains a version 0 no-op, making the DB version 1.
2) On connection, the slave tells the master what version it has. If it doesn’t match what the master has, then the master sends updates to bring them in sync.
2a) If the master’s change log is insufficient (or the difference is “too big”), then it sends the whole DB.
2b) If the difference is small enough, then the master just replays the change log from where the slave is.
3) Seems to me that the handling of the heartbeat messages ought to mirror the initial connection logic, or else make no attempt to do anything to the DB at all. Anything else is clearly risky and unnecessarily complex. (I never worried about them because I had already implemented external processes to deal with the issue. Somebody else should write this bullet.)

A new DB (on a slave) is guaranteed to have a smaller version number than the master (if the master is actually populated), so it will always get a complete download.

Truncation that preserves the version number is safe and periodically necessary. I do not remember the --reset option, but it’s clearly dangerous. How can it be used safely, knowing only the above?

(Where is Love when you need him?)
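To make the connection-time logic in 2), 2a) and 2b) concrete, here is a rough Python sketch of how I picture the master deciding what to send. It is not taken from the actual code. The names (`Master`, `plan_sync`, `max_replay`) are made up for illustration, and so are the assumptions that log entry n takes the DB from version n to version n+1 and that a slave claiming a newer version than the master gets a full copy.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Entry = Tuple[int, str]  # (sequence number, change); hypothetical layout


@dataclass
class Master:
    version: int
    # Change log, oldest first. ASSUMPTION: entry n takes the DB from
    # version n to version n + 1, so the version 0 no-op yields version 1.
    log: List[Entry] = field(default_factory=list)
    # ASSUMPTION: "too big" is a simple cap on the number of replayed entries.
    max_replay: int = 1000

    def plan_sync(self, slave_version: int) -> Tuple[str, List[Entry]]:
        """Decide what to send to a slave that reports slave_version."""
        if slave_version == self.version:
            return ("in_sync", [])
        if slave_version > self.version:
            # Not covered by the description above; a full copy is assumed here.
            return ("full", [])
        needed = [e for e in self.log if e[0] >= slave_version]
        wanted = list(range(slave_version, self.version))
        # 2a) The log is insufficient if any needed entry was truncated away.
        log_is_sufficient = [seq for seq, _ in needed] == wanted
        if not log_is_sufficient or len(needed) > self.max_replay:
            return ("full", [])
        # 2b) Replay the log from where the slave is.
        return ("replay", needed)


# Master at version 5 whose log was truncated: entries 0 and 1 are gone.
master = Master(version=5, log=[(2, "c"), (3, "d"), (4, "e")])
print(master.plan_sync(3))  # replay of entries 3 and 4
print(master.plan_sync(1))  # full copy, because entry 1 is no longer in the log
```

Under these assumptions, truncating the log never changes `version`, which is why it looks safe: a slave that falls behind the truncation point just gets the whole DB.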
