swamirishi commented on code in PR #4633:
URL: https://github.com/apache/ozone/pull/4633#discussion_r1182845915


##########
hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdb/util/ManagedSstFileReader.java:
##########
@@ -49,16 +49,37 @@ public class ManagedSstFileReader {
 
   private final Collection<String> sstFiles;
 
+  private volatile long estimatedTotalKeys = -1;
+
   public ManagedSstFileReader(final Collection<String> sstFiles) {
     this.sstFiles = sstFiles;
   }
 
   public static <T> Stream<T> getStreamFromIterator(ClosableIterator<T> itr) {
     final Spliterator<T> spliterator =
         Spliterators.spliteratorUnknownSize(itr, 0);
-    return StreamSupport.stream(spliterator, false).onClose(() -> {
-      itr.close();
-    });
+    return StreamSupport.stream(spliterator, false).onClose(itr::close);
+  }
+
+  public long getEstimatedTotalKeys() throws RocksDBException {
+    if (estimatedTotalKeys != -1) {
+      return estimatedTotalKeys;
+    }
+
+    long estimatedSize = 0;
+    synchronized (this) {
+      try (ManagedOptions options = new ManagedOptions()) {
+        for (String sstFile : sstFiles) {
+          SstFileReader fileReader = new SstFileReader(options);
+          fileReader.open(sstFile);
+          estimatedSize += fileReader.getTableProperties().getNumEntries();

Review Comment:
   estimated Size here should be just getNumEntries() don't add the two. 
https://github.com/facebook/rocksdb/blob/main/include/rocksdb/table_properties.h#:~:text=//%20the%20number%20of,operands%20in%20the%20table



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -842,6 +853,20 @@ private Set<String> getDeltaFiles(OmSnapshot fromSnapshot,
     return deltaFiles;
   }
 
+  private void validateEstimatedKeyChangesAreInLimits(
+      ManagedSstFileReader sstFileReader
+  ) throws RocksDBException, IOException {
+    if (sstFileReader.getEstimatedTotalKeys() >
+        maxAllowedKeyChangesForASnapDiff) {
+      throw new IOException(

Review Comment:
   Nit: We can do this as part of a different PR, should we have a separate 
exception for Snapshots?



##########
hadoop-hdds/rocksdb-checkpoint-differ/src/main/java/org/apache/ozone/rocksdb/util/ManagedSstFileReader.java:
##########
@@ -49,16 +49,37 @@ public class ManagedSstFileReader {
 
   private final Collection<String> sstFiles;
 
+  private volatile long estimatedTotalKeys = -1;
+
   public ManagedSstFileReader(final Collection<String> sstFiles) {
     this.sstFiles = sstFiles;
   }
 
   public static <T> Stream<T> getStreamFromIterator(ClosableIterator<T> itr) {
     final Spliterator<T> spliterator =
         Spliterators.spliteratorUnknownSize(itr, 0);
-    return StreamSupport.stream(spliterator, false).onClose(() -> {
-      itr.close();
-    });
+    return StreamSupport.stream(spliterator, false).onClose(itr::close);
+  }
+
+  public long getEstimatedTotalKeys() throws RocksDBException {
+    if (estimatedTotalKeys != -1) {
+      return estimatedTotalKeys;
+    }
+
+    long estimatedSize = 0;
+    synchronized (this) {
+      try (ManagedOptions options = new ManagedOptions()) {
+        for (String sstFile : sstFiles) {
+          SstFileReader fileReader = new SstFileReader(options);
+          fileReader.open(sstFile);
+          estimatedSize += fileReader.getTableProperties().getNumEntries();
+          estimatedSize += fileReader.getTableProperties().getNumDeletions();
+        }
+      }
+    }
+
+    estimatedTotalKeys = estimatedSize;

Review Comment:
   Shouldn't we have this inside the synchronized block? And the synchronized 
block should also having the same if (estimatedTotalKeys != -1) {
         return estimatedTotalKeys;
       } to avoid redundant operations.



##########
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/snapshot/SnapshotDiffManager.java:
##########
@@ -155,9 +158,8 @@ public class SnapshotDiffManager implements AutoCloseable {
    */
   private final String sstBackupDirForSnapDiffJobs;
 
-  private boolean isNativeRocksToolsLoaded;
-
-  private ManagedSSTDumpTool sstDumpTool;
+  private final boolean isNativeRocksToolsLoaded;

Review Comment:
   Nit: In hindsight we could have just gotten away with just an Optional.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to