pos delete files

GitBox Wed, 21 Sep 2022 07:12:26 -0700


gaborkaszab commented on code in PR #5809:
URL: https://github.com/apache/iceberg/pull/5809#discussion_r976532071



##########
core/src/main/java/org/apache/iceberg/DeleteFileIndex.java:
##########
@@ -509,6 +509,20 @@ DeleteFileIndex build() {
         }
       }
 
+      scanMetrics.indexedDeleteFiles().increment(deleteEntries.size());
+      scanMetrics
+          .equalityDeleteFiles()
+          .increment(
+              deleteFilesByPartition.values().stream()
+                  .filter(e -> e.file().content() == 
FileContent.EQUALITY_DELETES)
+                  .count());
+      scanMetrics
+          .positionalDeleteFiles()
+          .increment(
+              deleteFilesByPartition.values().stream()
+                  .filter(e -> e.file().content() == 
FileContent.POSITION_DELETES)
+                  .count());

Review Comment:
   I think in case the number of delete files are high it would be good for 
performance considerations to reduce the number of iterations as much as 
possible even though this is O(N) complexity. I'm not sure how much it matters 
but here we have basically the same 2 iterations just to use a different 
filter. On the other hand I agree, code readability is much better this way.
   
   My proposal would be to increment these counters right where 
deleteFilesByPartition is populated.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #5809: API,Core: Add scan planning metrics for indexed/eq/pos delete files

Reply via email to