Adar Dembo created KUDU-1556:
--------------------------------
Summary: Potential for data loss when recovering master from permanent failure
Key: KUDU-1556
URL: https://issues.apache.org/jira/browse/KUDU-1556
Project: Kudu
Issue Type: Bug
Components: consensus, master
Affects Versions: 0.10.0
Reporter: Adar Dembo
Imagine the following scenario:
- Masters A, B, and C are alive and well. A is the leader.
- A replicates op 1.
- B persists and acknowledges op 1, as does A. Op 1 is now committed, even
though C has yet to persist it.
- B dies, permanently.
- A new node is brought in to replace B. Not knowing any better, it copies the
master tablet from C.
- The new node B starts up.
- A dies, permanently.
Now we have a problem: B and C constitute a healthy majority, but neither of
them is aware of op 1. If node A is replaced in the same way, it too would
not be aware of op 1. Even though op 1 was committed, it is now gone, and
whatever data it carried is lost.
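The scenario above can be replayed with a toy model. This is only an illustrative sketch, not Kudu code: each master's log is reduced to a set of op indexes, and the simulate() function and its copy_source parameter are hypothetical names introduced here.

```python
def simulate(copy_source):
    """Replay the scenario; return (committed_ops, lost_ops).

    copy_source is the surviving master ("A" or "C") that B's replacement
    copies the master tablet from. All names here are hypothetical.
    """
    # Each master's log is modeled as the set of op indexes it has persisted.
    logs = {"A": set(), "B": set(), "C": set()}
    majority = len(logs) // 2 + 1

    # A replicates op 1; A and B persist it, C has not yet done so.
    logs["A"].add(1)
    logs["B"].add(1)
    acks = sum(1 in log for log in logs.values())
    committed = {1} if acks >= majority else set()

    # B dies permanently; its replacement copies the tablet from copy_source.
    logs["B"] = set(logs[copy_source])

    # A dies permanently, leaving B and C as the majority.
    del logs["A"]

    # Any committed op that no surviving master holds has been lost.
    surviving_ops = set().union(*logs.values())
    return committed, committed - surviving_ops


print(simulate("C"))  # replacement copied the lagging replica
print(simulate("A"))  # replacement copied the up-to-date replica
```

In this model, copying from C reports op 1 as lost, while copying from A does not. The outcome depends entirely on which replica the new node happens to copy.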
Replacing nodes via Raft config change fixes this issue, but that's yet to be
implemented for the master tablet. Some other potential fixes:
# Issue a NOOP quorum write before copying tablets. This ensures that all ops
preceding the NOOP are present on both A and C, and B's replacement can copy
either replica.
# Wait for A and C to converge before copying tablets by interrogating their
last committed op IDs, remembering the highest one, and waiting for them to
both commit that op ID. This achieves the same result as a quorum write, but
through passive observation.
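The second option might look roughly like the sketch below. It assumes a hypothetical get_last_committed(replica) callable that returns a replica's last committed op ID as a comparable value (for example a (term, index) tuple). That callable, wait_for_convergence(), and the timeout and polling parameters are all illustrative and not part of any existing Kudu API.

```python
import time


def wait_for_convergence(get_last_committed, replicas,
                         timeout_s=30.0, poll_s=0.1):
    """Block until every replica has committed the highest op ID seen.

    get_last_committed is a hypothetical callable: replica -> comparable
    op ID. Returns the op ID that all replicas converged on.
    """
    # Interrogate every surviving replica once and remember the highest
    # committed op ID.
    target = max(get_last_committed(r) for r in replicas)

    deadline = time.monotonic() + timeout_s
    while True:
        # Only once every replica has reached the target is it safe to
        # copy the tablet from any of them.
        if all(get_last_committed(r) >= target for r in replicas):
            return target
        if time.monotonic() >= deadline:
            raise TimeoutError("replicas did not converge on %r" % (target,))
        time.sleep(poll_s)
```

Applied to the scenario, the replacement for B would call this on A and C and copy the master tablet only after it returns, by which point C holds op 1.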
After a discussion, we've concluded that we'll put off fixing this (incredibly
rare) issue for now. Proper config change support for masters is on the
horizon, and it'll obviate the need for a stopgap.