hangc0276 opened a new pull request, #3329: URL: https://github.com/apache/bookkeeper/pull/3329
### Motivation

We found a case where the bookie may lose data even with fsync turned on for the journal.

Conditions:
- One journal disk, with fsync turned on for the journal
- Two ledger disks configured: ledger1 and ledger2

Assume we write 100MB of data into one bookie: 70MB goes into ledger1's write cache and 30MB goes into ledger2's write cache. Ledger1's write cache fills up and triggers a flush. While flushing the write cache, it triggers a checkpoint that marks the journal's lastMark position (the 100MB offset) and writes that position into the lastMark files of both ledger1 and ledger2.

At this point, the bookie shuts down without flushing the write cache, for example because of a `kill -9`, so ledger2's write cache (30MB) is never flushed to the ledger disk. However, ledger2's lastMark position, already persisted in its lastMark file, has been updated to the 100MB offset.

When the bookie starts up, the journal replay position will be `min(ledger1's lastMark, ledger2's lastMark)`, which is the 100MB offset. Ledger2's 30MB of data won't be replayed, and that data will be lost.

Discussion thread: https://lists.apache.org/thread/zz5vvv2yd80vqy22fv8wg5s2lqtkrzh9

### Changes

1. Add support for checkpoints tied to a specific ledgerDirManager. When a checkpoint is triggered by a specific ledgerDirManager, we only write `lastMark` into that manager's ledgerDirs.
2. On bookie startup, read the minimal `lastMark` instead of the maximal `lastMark` as the current last mark.
3. I will add a test soon.
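
The failure scenario and the effect of the proposed change can be illustrated with a small, self-contained simulation. This is only a sketch under simplifying assumptions, not BookKeeper code: `LastMarkSketch`, `LedgerDir`, `replayStart`, and `simulate` are hypothetical names, offsets are plain numbers in MB, and the lastMark files are modeled as in-memory fields.

```java
import java.util.Arrays;
import java.util.List;

public class LastMarkSketch {
    /** Hypothetical stand-in for one ledger directory and its lastMark file. */
    static final class LedgerDir {
        final String name;
        long lastMark = 0; // journal offset (MB) persisted in this directory's lastMark file

        LedgerDir(String name) {
            this.name = name;
        }
    }

    /** Replay starts from the smallest lastMark found across all ledger directories. */
    static long replayStart(List<LedgerDir> dirs) {
        long min = Long.MAX_VALUE;
        for (LedgerDir d : dirs) {
            min = Math.min(min, d.lastMark);
        }
        return min;
    }

    /**
     * Simulates: 100MB journaled, ledger1 flushes and checkpoints, then the bookie is killed
     * before ledger2 flushes. Returns the journal offset (MB) from which replay starts.
     *
     * @param perDirCheckpoint true = only the flushing directory's lastMark is updated
     *                         (the proposed change); false = every directory's lastMark is updated
     */
    static long simulate(boolean perDirCheckpoint) {
        LedgerDir ledger1 = new LedgerDir("ledger1");
        LedgerDir ledger2 = new LedgerDir("ledger2");
        long journalOffset = 100; // all 100MB are fsynced to the journal

        // ledger1's write cache is full: it flushes, then the checkpoint records the journal offset.
        ledger1.lastMark = journalOffset;
        if (!perDirCheckpoint) {
            // Old behavior: the same mark is also written to ledger2, whose 30MB are still in memory.
            ledger2.lastMark = journalOffset;
        }

        // kill -9 happens here: ledger2's write cache is gone. On restart, compute the replay position.
        return replayStart(Arrays.asList(ledger1, ledger2));
    }

    public static void main(String[] args) {
        System.out.println("lastMark written to all dirs -> replay from " + simulate(false) + "MB");
        System.out.println("lastMark written per dir     -> replay from " + simulate(true) + "MB");
    }
}
```

In this model, writing the mark to every directory makes replay start at the 100MB offset, which skips ledger2's unflushed entries. Writing it only to the flushing directory leaves ledger2's mark at 0, so replay covers the entries that were lost from its write cache.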
