hemantk-12 opened a new pull request, #4045:
URL: https://github.com/apache/ozone/pull/4045

   ## What changes were proposed in this pull request?
   To generate diffs between snapshots faster, we maintain a compaction DAG in 
memory. Whenever a compaction happens, the related SST file nodes are added to 
the DAG. Over time the DAG keeps growing and may cause memory pressure or 
become a bottleneck. To solve this, we can prune the unnecessary SST file nodes 
from the DAG, since we have a concept of the oldest snapshot with compaction 
history.
   
   This change proposes the traversal and pruning of the DAG.
   The idea is to first remove the nodes and arcs that were created before the 
snapshot to be deleted was created, because they are no longer needed to 
generate the diff.
   `pruneDownstreamDag` does that pruning and removes nodes and arcs from both 
the forward and backward DAGs by going over the successors of the forward 
DAG's current level. Once the older nodes and arcs have been deleted from the 
oldest snapshot's compaction history, the next step is to remove the nodes and 
arcs that are not needed to generate diffs for newer snapshots.
   `pruneUpstreamDag` does the remaining pruning and removes nodes and arcs 
from both the forward and backward DAGs by going over the backward DAG 
successors of each node in the current level. If a node in the current level 
doesn't have any successors in the forward DAG, that node and the arcs to its 
backward DAG successors can be deleted.
   
   Let's take an example of the following diagram (Backward DAG). 
   
   
![reverseGraph](https://user-images.githubusercontent.com/6820020/205718964-c92e71bb-edf9-4d11-8661-65973e1f76a7.png)
   
   Snapshots were taken at level-1, level-3 and level-5:
   Snapshot-1: 000015.sst, 000013.sst, 000011.sst, 000009.sst 
   Snapshot-2: 000027.sst, 000030.sst, 000028.sst, 000031.sst, 000029.sst, 
000039.sst, 000037.sst, 000035.sst, 000033.sst
   Snapshot-3: 000059.sst, 000055.sst, 000056.sst, 000060.sst, 000057.sst, 
000058.sst
   
   If Snapshot-1 and Snapshot-2 need to be pruned, we can simply prune 
downstream of level-3 and then upstream of level-3 in the forward DAG.
   
   1. `pruneDownstreamDag` will remove nodes of level-1 and level-2 and arcs 
between level-1 & level-2 and level-2 & level-3.
   2. `pruneUpstreamDag` will remove nodes from level-3 and arcs between 
level-3 & level-4.
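
   The two steps above can be sketched in plain Java on a small made-up DAG. 
This is only an illustration, not the code in this patch: the class 
`CompactionDagSketch`, its method names, and the file names (`L1-a`, `L3-b`, 
...) are hypothetical, and it assumes that forward arcs point from a newer SST 
file to the older files it was compacted from, with the backward DAG holding 
the same arcs reversed. The upstream step here applies the rule to the start 
level only, which is enough to reproduce the example; the actual change may 
traverse further.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the compaction DAG; not the Ozone classes. */
final class CompactionDagSketch {
  // Assumption: forward arcs go from a newer file to the older files it was
  // compacted from; backward arcs are the same arcs reversed.
  final Map<String, Set<String>> forward = new HashMap<>();
  final Map<String, Set<String>> backward = new HashMap<>();

  void addArc(String newer, String older) {
    forward.computeIfAbsent(newer, k -> new HashSet<>()).add(older);
    forward.computeIfAbsent(older, k -> new HashSet<>());
    backward.computeIfAbsent(older, k -> new HashSet<>()).add(newer);
    backward.computeIfAbsent(newer, k -> new HashSet<>());
  }

  /** Removes a node and every arc touching it from both DAGs. */
  private void removeNode(String node) {
    for (String older : forward.remove(node)) {
      backward.get(older).remove(node);
    }
    for (String newer : backward.remove(node)) {
      forward.get(newer).remove(node);
    }
  }

  /** Removes everything reachable from startLevel via forward arcs. */
  Set<String> pruneDownstream(Set<String> startLevel) {
    Set<String> removed = new HashSet<>();
    Deque<String> queue = new ArrayDeque<>();
    for (String start : startLevel) {
      if (forward.containsKey(start)) {
        queue.addAll(forward.get(start));
      }
    }
    while (!queue.isEmpty()) {
      String node = queue.poll();
      // Skip nodes that were already removed and never remove a start node.
      if (!forward.containsKey(node) || startLevel.contains(node)) {
        continue;
      }
      queue.addAll(forward.get(node));
      removeNode(node);
      removed.add(node);
    }
    return removed;
  }

  /** Removes start-level nodes that have no forward successors left. */
  Set<String> pruneUpstream(Set<String> startLevel) {
    Set<String> removed = new HashSet<>();
    for (String node : startLevel) {
      if (forward.containsKey(node) && forward.get(node).isEmpty()) {
        removeNode(node); // also drops the arcs to its backward successors
        removed.add(node);
      }
    }
    return removed;
  }

  /** Five-level example; snapshots are assumed at level-1, -3 and -5. */
  static CompactionDagSketch example() {
    CompactionDagSketch dag = new CompactionDagSketch();
    dag.addArc("L2-a", "L1-a");
    dag.addArc("L2-a", "L1-b");
    dag.addArc("L3-a", "L2-a");
    dag.addArc("L3-b", "L2-a");
    dag.addArc("L4-a", "L3-a");
    dag.addArc("L4-a", "L3-b");
    dag.addArc("L5-a", "L4-a");
    return dag;
  }

  public static void main(String[] args) {
    CompactionDagSketch dag = example();
    Set<String> level3 = new HashSet<>(Arrays.asList("L3-a", "L3-b"));
    System.out.println("downstream removed: " + dag.pruneDownstream(level3));
    System.out.println("upstream removed:   " + dag.pruneUpstream(level3));
    System.out.println("remaining nodes:    " + dag.forward.keySet());
  }
}
```

   In this sketch, pruning downstream of level-3 drops the level-1 and level-2 
nodes together with their arcs, and pruning upstream then drops the level-3 
nodes and the arcs between level-3 and level-4, leaving only the level-4 and 
level-5 nodes. Every removal goes through one helper that updates both maps, 
so the forward and backward DAGs stay consistent with each other.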
   
   ## What is the link to the Apache JIRA
   * https://issues.apache.org/jira/browse/HDDS-7524
   
   ## How was this patch tested?
   * Unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

