smengcl opened a new pull request, #3786: URL: https://github.com/apache/ozone/pull/3786
This is a WIP.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-6860

## What changes were proposed in this pull request?

- Track RocksDB compaction with a custom `onCompactionCompleted` event listener that is attached when the RDB is opened on OM startup.
- The `onCompactionCompleted` listener saves copies of the compaction input files (by hardlinking them into a separate directory) that are about to be deleted by the RDB compaction logic.
- The `onCompactionCompleted` listener also writes the input and output file pairs to a plain text file (the compaction log) on each compaction. The compaction log is renamed to the checkpoint every time an Ozone snapshot is taken, i.e. when a
- `RocksDBCheckpointDiffer#printSnapdiffSSTFiles` can now:
  1. Locate the `from (src)` and `to (dest)` snapshot locations in the chain;
  2. Follow the chain, read each checkpoint's compaction log, and load them into a DAG;
  3. Return the list of differing SST files to be processed by RocksDiff.

### TODO

- [ ] Refactor `RocksDBCheckpointDiffer` to reduce duplication between the two `setRocksDBForCompactionTracking` methods (one for standalone testing and one for OM, since they use different RDB options right now).
- [ ] Use https://github.com/apache/ozone/pull/3658 to locate the `from (src)` and `to (dest)` snapshots in the snapshot chain. Locating the snapshots this way should be much more efficient.
- [ ] Compaction logs would have to be merged into the previous snapshot's during snapshot deletion.
- [ ] Add a checksum to the compaction log?
- [ ] Optimization: compaction logs should be persisted only after the **first** snapshot is taken (on a given bucket).
- [ ] All TODOs added in this PR.

## How was this patch tested?

- [x] Standalone UT `TestRocksDBCheckpointDiffer` (which tests with its own RDB) works as expected.
- [ ] Need to work on the integration test `TestOMSnapshotDAG`. Currently `testZeroSizeKey` can trigger the listener.
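
For reviewers who want a mental model of the compaction-log DAG described under "What changes were proposed", here is a minimal, standalone sketch. It is not the code in this PR: the class name `CompactionDagSketch`, the `in1,in2:out1,out2` log line format, and the traversal rule are all assumptions made for illustration, and `RocksDBCheckpointDiffer` may differ in each of them. The idea is to walk back from the newer snapshot's SST files through logged compactions until reaching either a file the older snapshot already has or a file that no logged compaction produced.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch only; names and log format are hypothetical. */
final class CompactionDagSketch {

  // Maps each compaction output file to the input files it was built from.
  private final Map<String, Set<String>> inputsOf = new HashMap<>();

  /** Adds one compaction, using the assumed format "in1,in2:out1,out2". */
  void addLogLine(String line) {
    String[] sides = line.trim().split(":");
    if (sides.length != 2) {
      throw new IllegalArgumentException("Malformed log line: " + line);
    }
    String[] inputs = sides[0].split(",");
    for (String output : sides[1].split(",")) {
      Set<String> known = inputsOf.computeIfAbsent(output, k -> new HashSet<>());
      for (String input : inputs) {
        known.add(input);
      }
    }
  }

  /**
   * Returns the SST files that carry data present in the newer snapshot
   * but not in the older one, in sorted order.
   */
  Set<String> diff(Set<String> olderFiles, Set<String> newerFiles) {
    Set<String> different = new TreeSet<>();
    Set<String> visited = new HashSet<>();
    Deque<String> pending = new ArrayDeque<>(newerFiles);
    while (!pending.isEmpty()) {
      String file = pending.pop();
      if (!visited.add(file) || olderFiles.contains(file)) {
        continue; // already handled, or shared with the older snapshot
      }
      Set<String> inputs = inputsOf.get(file);
      if (inputs == null) {
        different.add(file); // not a logged compaction output: new data
      } else {
        pending.addAll(inputs); // compaction output: expand to its inputs
      }
    }
    return different;
  }
}
```

In this sketch, a file such as `000004.sst` that was flushed after the older snapshot and then compacted away is still reported, which is why the listener needs to keep hardlinked copies of compaction inputs.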
