Does "!completeList" do anything necessary in the line:

if (!completeList && Math.abs(otherVersion) < ourLowThreshold) break;

I think the line should simply be:

if (Math.abs(otherVersion) < ourLowThreshold) break;

-----
The inclusion of "!completeList" in this conditional would seem to
only cause some minor performance penalty: replaying a bunch of ADDs
that the syncing replica already has ADDed.
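To make the difference between the two forms concrete, here is a minimal sketch of how I read the loop. It is not the actual Solr code: the class, the method versionsToRequest and the flag keepCompleteListTerm are hypothetical names for illustration, and I assume the other replica's versions arrive sorted newest first by absolute value. Only otherVersion, ourUpdateSet, ourLowThreshold and completeList come from the line quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative stand-in for the version-comparison loop; NOT the real Solr code.
class PeerSyncSketch {

    // Returns the versions we would ask the other replica to send us.
    // otherVersions is assumed to be sorted newest first by absolute value.
    // keepCompleteListTerm selects between the current conditional (true)
    // and the simplified one proposed above (false).
    static List<Long> versionsToRequest(List<Long> otherVersions,
                                        Set<Long> ourUpdateSet,
                                        long ourLowThreshold,
                                        boolean completeList,
                                        boolean keepCompleteListTerm) {
        List<Long> toRequest = new ArrayList<>();
        for (long otherVersion : otherVersions) {
            boolean belowThreshold = Math.abs(otherVersion) < ourLowThreshold;
            boolean stop = keepCompleteListTerm
                    ? (!completeList && belowThreshold)
                    : belowThreshold;
            if (stop) break;
            // Anything we do not already have gets requested and replayed.
            if (!ourUpdateSet.contains(otherVersion)) toRequest.add(otherVersion);
        }
        return toRequest;
    }

    public static void main(String[] args) {
        // The other replica holds the recent updates plus three much older ones.
        List<Long> otherVersions = List.of(105L, 104L, 103L, 102L, 101L, 3L, 2L, 1L);
        Set<Long> ourUpdateSet = Set.of(105L, 104L, 103L, 102L, 101L);
        long ourLowThreshold = 101L;

        // Current conditional with completeList == true: the old versions get through.
        System.out.println(versionsToRequest(otherVersions, ourUpdateSet, ourLowThreshold, true, true));  // prints [3, 2, 1]
        // Proposed conditional: the loop stops at the threshold.
        System.out.println(versionsToRequest(otherVersions, ourUpdateSet, ourLowThreshold, true, false)); // prints []
    }
}
```

In this toy model, when completeList is true the threshold never stops the loop, so every version below ourLowThreshold that we do not hold is requested.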

BUT: in our set-up this is causing a noticeable problem. In
particular, we use a large value of nUpdates and we have an hourly DBQ
for garbage collection. If we do rolling restarts of our replicas,
then the second restart can leave us leaderless for a long span of
time.

This happens as follows:
* Replica1 is leader. Replica1 goes down.
* Leadership goes to Replica2. It resyncs with all replicas except Replica1.
* Replica1 returns and resyncs.
* Replica2 is leader. Replica2 goes down.
* Leadership goes to Replica3. It resyncs with all replicas except Replica2.

At this point, Replica1 has a longer updatelog (less trimmed -- more
old updates) than the other replicas. We will refer to these as the
"ancient" updates.
Replica3 does a getVersion from Replica1 and Replica4 and receives
replies from them. The ancient updates will not be contained in
ourUpdateSet. Although the ancient updates are older than
ourLowThreshold, the check is skipped because of the "completeList"
term, which makes no sense to me. So Replica3 replays the ancient ADDs.
Say that 1000 of these ADDs are older than a DBQ in Replica3's update
log. Then the DBQ gets replayed 1000 times: once after each ADD is
replayed. Fixing the replay mechanism to only replay the DBQ once
looks hard because of the code structure. However, these ADDs (and
hence the DBQ) shouldn't have even been replayed at all!
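A rough way to see the cost, as a sketch only: the class and method below are hypothetical and merely model the behaviour described above (the DBQ is re-applied after every replayed ADD that is older than it). They are not taken from the Solr replay code, and the version numbers are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the replay cost described above; NOT the real Solr replay code.
class DbqReplaySketch {

    // Counts how many times a single DBQ would be re-applied if it is
    // re-run once after each replayed ADD whose version is older than the DBQ's.
    static int dbqReplays(List<Long> replayedAddVersions, long dbqVersion) {
        int replays = 0;
        for (long addVersion : replayedAddVersions) {
            if (Math.abs(addVersion) < Math.abs(dbqVersion)) replays++;
        }
        return replays;
    }

    public static void main(String[] args) {
        // 1000 "ancient" ADDs, all older than the DBQ (made-up version numbers).
        List<Long> ancientAdds = new ArrayList<>();
        for (long v = 1; v <= 1000; v++) ancientAdds.add(v);
        long dbqVersion = 5000L;

        System.out.println(dbqReplays(ancientAdds, dbqVersion));     // prints 1000
        // If the ancient ADDs are never requested, nothing is re-applied.
        System.out.println(dbqReplays(new ArrayList<>(), dbqVersion)); // prints 0
    }
}
```

So the number of DBQ executions grows linearly with the number of ancient ADDs, and with an expensive DBQ that adds up quickly.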

After the leader, Replica3, is synced, it asks Replica1 and Replica4 to
sync to it. The ancient ADDs have now been merged back into Replica3's
update log, so when Replica4 syncs with Replica3, Replica4 also ends up
replaying the ancient ADDs and replaying the DBQ 1000 times.

Only when all of this completes can Replica3 finally perform its role
as leader and accept new updates.
