hemantk-12 opened a new pull request, #5236: URL: https://github.com/apache/ozone/pull/5236
## What changes were proposed in this pull request?

[SstFilteringService](https://github.com/apache/ozone/blob/master/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/SstFilteringService.java) and the [SST pruning service](https://github.com/apache/ozone/blob/master/hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java#L1483) work independently, and each tries to reclaim space by deleting unnecessary SST files. SstFilteringService deletes files that don't belong to the snapshotted bucket, and the SST pruning service deletes files that are not required for diff calculation. The compaction DAG, on the other hand, is global at the Ozone level and is not aware of either cleanup.

The problem arises when calculating the delta files for two snapshots, when the traversal reaches this [condition](https://github.com/apache/ozone/blob/c801c02455982d3488cb099942f86912a492dc89/hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java#L1049). Graph traversal adds a node because it is not present in the toSnapshot (it might have been deleted by SstFilteringService), and the node is later added to the diff files because of this [condition](https://github.com/apache/ozone/blob/c801c02455982d3488cb099942f86912a492dc89/hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java#L1024). Before [returning delta files to SnapshotDiffManager](https://github.com/apache/ozone/blob/c801c02455982d3488cb099942f86912a492dc89/hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java#L877), we look for the files in either the [active DB dir or the SST backup dir](https://github.com/apache/ozone/blob/c801c02455982d3488cb099942f86912a492dc89/hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java#L832).
The active DB dir doesn't have these files because they were compacted, and the SST backup dir doesn't have them because of the SST pruning service. See the detailed [explanation](https://issues.apache.org/jira/browse/HDDS-8940?focusedCommentId=17755663&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17755663) and [example](https://issues.apache.org/jira/browse/HDDS-8940?focusedCommentId=17755668&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17755668).

This PR proposes keeping the key range in each DAG node and using it to return early during traversal. A new DAO class, CompactionLogEntry, is added; it persists compaction file information to the compaction log.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-8940

## How was this patch tested?

* Existing unit tests.
* New tests are in progress.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
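
For illustration, the key-range early return proposed in this PR could look roughly like the sketch below. It is a simplified stand-in, not the code in this patch: the class `KeyRangePruningSketch`, its `Node` type, and the methods `mayContainPrefix` and `deltaFiles` are all hypothetical names, keys are modeled as Java strings rather than RocksDB byte arrays, and the DAG is a plain successor list. The idea shown is that a node whose key range cannot contain the bucket prefix is skipped during traversal, so a file that the filtering service already removed is never reported as a delta file.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch only; these are not the Ozone classes. */
final class KeyRangePruningSketch {

  /** Stand-in for a compaction DAG node that also records its key range. */
  static final class Node {
    final String fileName;
    final String startKey;
    final String endKey;
    final List<Node> successors = new ArrayList<>();

    Node(String fileName, String startKey, String endKey) {
      this.fileName = fileName;
      this.startKey = startKey;
      this.endKey = endKey;
    }
  }

  /** True if some key starting with prefix could lie in [startKey, endKey]. */
  static boolean mayContainPrefix(String prefix, String startKey, String endKey) {
    // Assumption: String ordering stands in for RocksDB's bytewise ordering.
    // Only the first prefix.length() characters of each boundary matter.
    String lo = startKey.substring(0, Math.min(prefix.length(), startKey.length()));
    String hi = endKey.substring(0, Math.min(prefix.length(), endKey.length()));
    return lo.compareTo(prefix) <= 0 && prefix.compareTo(hi) <= 0;
  }

  /** Walks the DAG from the source snapshot's files and collects delta files. */
  static Set<String> deltaFiles(List<Node> srcFiles, Set<String> destFiles,
      String bucketPrefix) {
    Set<String> delta = new HashSet<>();
    Set<String> visited = new HashSet<>();
    Deque<Node> stack = new ArrayDeque<>(srcFiles);
    while (!stack.isEmpty()) {
      Node node = stack.pop();
      if (!visited.add(node.fileName)) {
        continue; // already handled via another path
      }
      if (destFiles.contains(node.fileName)) {
        continue; // file is shared by both snapshots
      }
      if (!mayContainPrefix(bucketPrefix, node.startKey, node.endKey)) {
        continue; // early return: node holds no keys of this bucket
      }
      if (node.successors.isEmpty()) {
        delta.add(node.fileName); // leaf that is absent from the destination
      } else {
        node.successors.forEach(stack::push);
      }
    }
    return delta;
  }

  public static void main(String[] args) {
    Node merged = new Node("000010", "/vol/b1/a", "/vol/b2/z");
    merged.successors.add(new Node("000005", "/vol/b1/a", "/vol/b1/z"));
    merged.successors.add(new Node("000006", "/vol/b2/a", "/vol/b2/z"));
    merged.successors.add(new Node("000007", "/vol/b1/m", "/vol/b1/q"));
    // 000006 only holds keys of bucket b2, so it is pruned for prefix /vol/b1/.
    System.out.println(deltaFiles(Collections.singletonList(merged),
        Collections.singleton("000005"), "/vol/b1/"));
  }
}
```

Without the key-range check, `000006` in this example would be collected as a delta file even though it holds no keys for the bucket, which is the situation described above where the file can no longer be found in either directory.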
