[GitHub] [ozone] smengcl commented on a diff in pull request #3786: HDDS-7281. [Snapshot] Handle RocksDB compaction DAG persistence and reconstruction

GitBox Tue, 11 Oct 2022 19:57:26 -0700


smengcl commented on code in PR #3786:
URL: https://github.com/apache/ozone/pull/3786#discussion_r992935645



##########
hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdiff/RocksDBCheckpointDiffer.java:
##########
@@ -262,57 +336,34 @@ public void onCompactionCompleted(
                   e.printStackTrace();
                 }
               }
+              sb.append('\n');
+
               LOG.warn("List of output files:");
               for (String file : compactionJobInfo.outputFiles()) {
+                final String fn = file.substring(filenameBegin + 1);
+                sb.append(fn).append('\t');
                 LOG.warn(file + ",");
               }
-              // Let us also build the graph
-              for (String outFilePath : compactionJobInfo.outputFiles()) {
-                String outfile =
-                    Paths.get(outFilePath).getFileName().toString();
-                CompactionNode outfileNode = compactionNodeTable.get(outfile);
-                if (outfileNode == null) {
-                  long numKeys = 0;
-                  try {
-                    numKeys = getSSTFileSummary(outfile);
-                  } catch (Exception e) {
-                    LOG.warn(e.getMessage());
-                  }
-                  outfileNode = new CompactionNode(outfile,
-                      lastSnapshotPrefix, numKeys,
-                      currentCompactionGen);
-                  compactionDAGFwd.addNode(outfileNode);
-                  compactionDAGReverse.addNode(outfileNode);
-                  compactionNodeTable.put(outfile, outfileNode);
-                }
-                for (String inFilePath : compactionJobInfo.inputFiles()) {
-                  String infile =
-                      Paths.get(inFilePath).getFileName().toString();
-                  CompactionNode infileNode = compactionNodeTable.get(infile);
-                  if (infileNode == null) {
-                    long numKeys = 0;
-                    try {
-                      numKeys = getSSTFileSummary(infile);
-                    } catch (Exception e) {
-                      LOG.warn(e.getMessage());
-                    }
-                    infileNode = new CompactionNode(infile,
-                        lastSnapshotPrefix,
-                        numKeys, UNKNOWN_COMPACTION_GEN);
-                    compactionDAGFwd.addNode(infileNode);
-                    compactionDAGReverse.addNode(infileNode);
-                    compactionNodeTable.put(infile, infileNode);
-                  }
-                  if (outfileNode.fileName.compareToIgnoreCase(
-                      infileNode.fileName) != 0) {
-                    compactionDAGFwd.putEdge(outfileNode, infileNode);
-                    compactionDAGReverse.putEdge(infileNode, outfileNode);
-                  }
-                }
-              }
-              if (debugEnabled(DEBUG_DAG_BUILD_UP)) {
-                printMutableGraph(null, null, compactionDAGFwd);
+              sb.append('\n');
+
+              // Persist infile/outfile to file
+              try (BufferedWriter bw = Files.newBufferedWriter(
+                  Paths.get(CURRENT_COMPACTION_LOG_FILENAME),

Review Comment:
   Good question. We did consider persisting the input/output file pairs 
in the RDB during discussions.
   
   The problem with RDB is that keys are sorted by default when flushed to 
SSTs. We can't effectively use the RDB key for the input/output file pairs 
because the sorting messes up the entry order. (This shouldn't matter if we 
load the **entire** DAG into memory on OM startup, but that would leave no 
room for optimization, such as loading only the chunk of the compaction log 
that is relevant to the snapshot SST diff calculation.) There might be a way 
to turn off sorting, but I believe that takes effect on the entire RDB 
instance, not just one column family.
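
To illustrate the ordering concern, here is a minimal sketch. It does not use RocksDB: a `TreeMap` stands in for a key-sorted store, and the class name and file names are hypothetical. It only shows that iterating a sorted key space returns entries in key order, while an append-only log replays them in arrival order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration only; TreeMap is a stand-in for a key-sorted store.
class SortedVsAppendOrder {

  /** Returns the keys in the order a key-sorted store would iterate them. */
  static List<String> sortedOrder(List<String> arrival) {
    TreeMap<String, Boolean> sorted = new TreeMap<>();
    for (String key : arrival) {
      sorted.put(key, Boolean.TRUE);
    }
    return new ArrayList<>(sorted.keySet());
  }

  /** Returns the keys in the order an append-only log would replay them. */
  static List<String> appendOrder(List<String> arrival) {
    return new ArrayList<>(arrival);
  }

  public static void main(String[] args) {
    // Hypothetical SST file names, listed in the order the compactions happened.
    List<String> arrival = Arrays.asList("000021", "000012", "000030");
    System.out.println("sorted: " + sortedOrder(arrival));
    System.out.println("append: " + appendOrder(arrival));
  }
}
```

With keys that do not encode the compaction order, the sorted view cannot be used to read back "the last N compactions" without extra bookkeeping, which is the optimization mentioned above.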
   
   We also didn't think it was worth starting a new RDB instance just to 
persist the compaction log, unless there are other compelling reasons. Let me 
know if you think otherwise.
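
For reference, a rough sketch of the append-only log idea, under stated assumptions. The two-lines-per-compaction layout (tab-separated input files, then tab-separated output files) is inferred from the visible part of the hunk. The class name, method names, and the `CREATE`/`APPEND` open options are hypothetical, since the hunk is cut off before the writer's arguments end.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the code in this PR.
class CompactionLogSketch {

  /** Appends one compaction entry: an input-file line, then an output-file line. */
  static void append(Path log, List<String> inputs, List<String> outputs)
      throws IOException {
    StringBuilder sb = new StringBuilder();
    for (String f : inputs) {
      sb.append(f).append('\t');
    }
    sb.append('\n');
    for (String f : outputs) {
      sb.append(f).append('\t');
    }
    sb.append('\n');
    // Assumed open options: create the file if missing, always append.
    try (BufferedWriter bw = Files.newBufferedWriter(log, StandardCharsets.UTF_8,
        StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
      bw.write(sb.toString());
    }
  }

  /** Replays the log in write order; each element is one line's file names. */
  static List<List<String>> replay(Path log) throws IOException {
    List<List<String>> lines = new ArrayList<>();
    try (BufferedReader br = Files.newBufferedReader(log, StandardCharsets.UTF_8)) {
      String line;
      while ((line = br.readLine()) != null) {
        // String.split drops the trailing empty token left by the final tab.
        lines.add(Arrays.asList(line.split("\t")));
      }
    }
    return lines;
  }

  public static void main(String[] args) throws IOException {
    Path log = Files.createTempFile("compaction-sketch", ".log");
    append(log, Arrays.asList("000021", "000019"), Arrays.asList("000030"));
    append(log, Arrays.asList("000030", "000012"), Arrays.asList("000041"));
    for (List<String> line : replay(log)) {
      System.out.println(line);
    }
    Files.delete(log);
  }
}
```

Because entries come back in the order they were written, a reader could in principle stop early or skip ahead, which a key-sorted layout would not give us for free.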



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

