Does "!completeList" do anything necessary in the line: if (!completeList && Math.abs(otherVersion) < ourLowThreshold) break;
I think the line should simply be:

if (Math.abs(otherVersion) < ourLowThreshold) break;

Including "!completeList" in this conditional would seem to cause only a minor performance penalty: replaying a bunch of ADDs that the syncing replica has already ADDed. BUT: in our set-up this is causing a noticeable problem. In particular, we use a large value of nUpdates and we have an hourly DBQ for garbage collection. If we do rolling restarts of our replicas, then the second restart can leave us leaderless for a long span of time. This happens as follows:

* Replica1 is leader. Replica1 goes down.
* Leadership goes to Replica2. It resyncs with all replicas except Replica1.
* Replica1 returns and resyncs.
* Replica2 is leader. Replica2 goes down.
* Leadership goes to Replica3. It resyncs with all replicas except Replica2.

At this point, Replica1 has a longer update log (less trimmed -- more old updates) than the other replicas. We will refer to these as the "ancient" updates. Replica3 does a getVersion from Replica1 and Replica4 and receives replies from them. The ancient updates will not be contained in ourUpdateSet. Although the ancient updates are older than ourLowThreshold, the check is skipped because of the "completeList" term, which makes no sense to me. So Replica3 replays the ancient ADDs.

Say that 1000 of these ADDs are older than a DBQ in Replica3's update log. Then the DBQ gets replayed 1000 times, once after each ADD is replayed. Fixing the replay mechanism to replay the DBQ only once looks hard because of the code structure. However, these ADDs (and hence the DBQ) shouldn't have been replayed at all!

After the leader Replica3 is synced, it asks Replica1 and Replica4 to sync to it. The ancient ADDs have now been merged back into Replica3's update log, so when Replica4 syncs with Replica3, Replica4 also ends up replaying the ancient ADDs and replaying the DBQ 1000 times.
Only when all of this finally completes can Replica3 finally perform its role as leader and accept new updates.
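
To make the difference between the two conditionals concrete, here is a minimal stand-alone sketch of the version-selection loop. It is not the actual PeerSync source. The class name PeerSyncSketch, the method versionsToRequest, and the dropCompleteListTerm flag are hypothetical, and the sketch assumes that the other replica's versions arrive newest first and that completeList is true whenever the other replica returned its whole update log (which I would expect to be the usual case with a large nUpdates).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-alone sketch; NOT the actual Solr PeerSync code.
final class PeerSyncSketch {

    /**
     * Returns the versions we would ask the other replica to send us.
     * Assumption: otherVersions is ordered newest first (largest |version| first).
     */
    static List<Long> versionsToRequest(List<Long> otherVersions,
                                        Set<Long> ourUpdateSet,
                                        long ourLowThreshold,
                                        boolean completeList,
                                        boolean dropCompleteListTerm) {
        List<Long> toRequest = new ArrayList<>();
        for (long otherVersion : otherVersions) {
            boolean olderThanOurLog = Math.abs(otherVersion) < ourLowThreshold;
            // Current conditional: the cutoff applies only when !completeList.
            // Proposed conditional: the cutoff always applies.
            boolean stop = dropCompleteListTerm
                    ? olderThanOurLog
                    : (!completeList && olderThanOurLog);
            if (stop) break;
            if (ourUpdateSet.contains(otherVersion)) continue; // we already have it
            toRequest.add(otherVersion);
        }
        return toRequest;
    }

    public static void main(String[] args) {
        // Replica3's log holds versions 100..103, so its low threshold is 100.
        Set<Long> ours = new HashSet<>(Arrays.asList(103L, 102L, 101L, 100L));
        // Replica1 reports the same versions plus three "ancient" ones.
        List<Long> fromReplica1 =
                Arrays.asList(103L, 102L, 101L, 100L, 52L, 51L, 50L);

        // Current conditional with completeList == true: ancient versions are requested.
        System.out.println(versionsToRequest(fromReplica1, ours, 100L, true, false));
        // Proposed conditional: nothing below the threshold is requested.
        System.out.println(versionsToRequest(fromReplica1, ours, 100L, true, true));
    }
}
```

With these made-up version numbers, the first call prints [52, 51, 50] and the second prints []. In the scenario above, the first list is the set of ancient ADDs that Replica3 replays, each followed by a replay of the DBQ.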
