Adar Dembo created KUDU-1556:
--------------------------------

             Summary: Potential for data loss when recovering master from 
permanent failure
                 Key: KUDU-1556
                 URL: https://issues.apache.org/jira/browse/KUDU-1556
             Project: Kudu
          Issue Type: Bug
          Components: consensus, master
    Affects Versions: 0.10.0
            Reporter: Adar Dembo


Imagine the following scenario:
- Masters A, B, and C are alive and well. A is the leader.
- A replicates op 1.
- B persists and acknowledges op 1, as does A. Op 1 is now committed, even 
though C has yet to persist it.
- B dies, permanently.
- A new node is brought in to replace B. Not knowing any better, it copies the 
master tablet from C.
- The new node B starts up.
- A dies, permanently.

Now we have a problem: B and C constitute a healthy majority, but neither of 
them is aware of op 1. If node A is replaced in the same manner, it too would 
not be aware of op 1. Even though op 1 was committed, it is now gone, and 
whatever data it carried is lost.
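
To make the sequence concrete, here is a toy model of the scenario. It is not 
Kudu code: each replica is reduced to the set of op indexes it has persisted, 
and every name in it is made up for illustration.

```python
# Toy model of the scenario above; not Kudu code. Each replica is just the
# set of op indexes it has persisted.
replicas = {"A": set(), "B": set(), "C": set()}

# A replicates op 1. A and B persist it; C has not yet.
replicas["A"].add(1)
replicas["B"].add(1)
committed = {1}  # persisted by a majority (A and B) of the three masters

# B dies permanently. Its replacement copies the master tablet from C.
del replicas["B"]
replicas["B"] = set(replicas["C"])

# A dies permanently.
del replicas["A"]

# B and C are a majority of the three-node config, yet neither has op 1.
survivors = sorted(replicas)
lost = {op for op in committed
        if not any(op in ops for ops in replicas.values())}
print(survivors, lost)  # prints: ['B', 'C'] {1}
```

Had the replacement copied from A instead of C, `lost` would be empty, which is 
why the choice of copy source matters.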

Replacing nodes via Raft config change fixes this issue, but that's yet to be 
implemented for the master tablet. Some other potential fixes:
# Issue a NOOP quorum write before copying tablets. This ensures that all ops 
preceding the NOOP are present on both A and C, and B's replacement can copy 
either replica.
# Wait for A and C to converge before copying tablets by interrogating their 
last committed op IDs, remembering the highest one, and waiting for them to 
both commit that op ID. It's just like pushing a quorum write but done via 
passive observation instead.
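
A minimal sketch of the second option, assuming some way to ask each surviving 
master for its last committed op ID. The callables passed in, the `FakeMaster` 
class, and the (term, index) tuple representation are all hypothetical 
stand-ins for illustration, not an existing Kudu API.

```python
import time
from typing import Callable, Dict, Tuple

OpId = Tuple[int, int]  # assumed (term, index); tuples compare lexicographically


def wait_for_convergence(
    get_last_committed: Dict[str, Callable[[], OpId]],
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.1,
) -> OpId:
    """Block until every surviving master has committed the highest op ID
    observed at the start, then return that op ID."""
    # Interrogate each master once and remember the highest committed op ID.
    target = max(fn() for fn in get_last_committed.values())
    deadline = time.monotonic() + timeout_s
    while True:
        # Passive observation: poll until all masters have reached the target.
        if all(fn() >= target for fn in get_last_committed.values()):
            return target
        if time.monotonic() >= deadline:
            raise TimeoutError(f"masters did not converge on {target}")
        time.sleep(poll_interval_s)


class FakeMaster:
    """Hypothetical stand-in that replays a fixed sequence of op IDs."""

    def __init__(self, op_ids):
        self._op_ids = list(op_ids)

    def last_committed(self) -> OpId:
        # Advance one step per poll, then stay on the final value.
        if len(self._op_ids) > 1:
            return self._op_ids.pop(0)
        return self._op_ids[0]


a = FakeMaster([(1, 1)])                   # A already has op 1
c = FakeMaster([(1, 0), (1, 0), (1, 1)])   # C catches up after two polls
result = wait_for_convergence(
    {"A": a.last_committed, "C": c.last_committed}, poll_interval_s=0.0)
print(result)  # prints: (1, 1)
```

Only once this returns would B's replacement copy the tablet; at that point 
either A or C is a safe source for every op up to the returned op ID.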

After a discussion, we've concluded that we'll put off fixing this (incredibly 
rare) issue for now. Proper config change support for masters is on the 
horizon, and it'll obviate the need for a stopgap.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
