jvrao commented on a change in pull request #927: BP-24: BookieScanner: Enhance Data Integrity
URL: https://github.com/apache/bookkeeper/pull/927#discussion_r159816765
##########
File path: site/bps/BP-24-BookieScanner.md
##########
@@ -0,0 +1,92 @@
+---
+title: "BP-24: BookieScanner: Enhance Data Integrity"
+issue: https://github.com/apache/bookkeeper/<issue-number>
+state: "Under Discussion"
+release: "N/A"
+---
+
+
+### Motivation
+
+
+Currently Bookie can't deal entry losing gracefully, the AutoRecovery is restricted to the bookie level, which means the AutoRecovery takes effect only after bookie is down. However when a disk fails, either or both the ledger index files and entry log files could potentially become corrupt. BookKeeper needs to provide mechanisms to identify and handle these problems.
+
+
+### Proposed Changes
+
+
+We introduce Bookie Scanner, which is a background task, to scan index files and entry log files to detect possible corruptions. Since data corruption may happen at any time on any block on any Bookie, it is important to identify these errors in a timely manner. This way, the bookie can remove/compact corrupted entries and re-replicate entries from other replicas, to maintain data integrity and reduce client errors.
+
+
+The Bookie Scanner needs to detect and cover following conditions:
+
+
+- a ledger is missing local (no index file found for a given ledger), we can do this by looking into the ledger metadata.
+- a ledger exists, but some entries are missing (no index entries found in the index file), we can check fragment's metadata to verify this.
+- a ledger exists, entries are found in index file, but the entries in entry log files are corrupted, we can use entry's checksum to verify this.
+
+
+A Bookie Scanner is integrated and run as part of compaction thread which already scans the entry log files.

Review comment:
High level is fine, but for the design, please have a google doc. @sijie we have decided to have google doc for bigger initiatives so everyone will get a chance to read/comment and iterate before we move forward with the code/pull request.
