rdblue commented on code in PR #5268:
URL: https://github.com/apache/iceberg/pull/5268#discussion_r933636831


##########
core/src/main/java/org/apache/iceberg/ManifestGroup.java:
##########
@@ -184,12 +192,14 @@ public <T extends ScanTask> CloseableIterable<T> plan(CreateTasksFunction<T> cre
                 specId -> {
                   PartitionSpec spec = specsById.get(specId);
                   ResidualEvaluator residuals = residualCache.get(specId);
-                  return new TaskContext(spec, deleteFiles, residuals, dropStats);
+                  return new TaskContext(spec, deleteFiles, residuals, dropStats, scanMetrics);
                 });
 
     Iterable<CloseableIterable<T>> tasks =
         entries(
             (manifest, entries) -> {
+              scanMetrics.totalDataFiles().increment(manifest.addedFilesCount());
+              scanMetrics.totalDeleteFiles().increment(manifest.deletedFilesCount());

Review Comment:
   Delete files and deleted files are distinct: `manifest.deletedFilesCount()` 
is the number of deleted entries in the manifest, not the number of delete 
files.
   
   In addition, existing data files are used in the scan, not just added data 
files. Since this is counting inside `ManifestGroup`, we need to make sure 
that every returned file is accounted for. There are two options:
   1. Put these updates in `entries` where `ignoreDeleted` and `ignoreExisting` 
are checked
   2. Add the total data files counter to the manifest reader, like you 
currently have for skipped files
   
   I prefer option 2. I would rather have simple hooks than try to count in 
bulk from metadata counts at just the right place. Counting each file as it 
is returned is probably the most reliable way to get this data.
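
   To illustrate the difference between bulk metadata counts and a per-entry 
hook, here is a minimal, self-contained sketch. It does not use Iceberg 
classes: `ScanCounterSketch`, `Status`, `Counter`, and `readEntries` are all 
hypothetical stand-ins, and only the `ignoreDeleted` / `ignoreExisting` flag 
names come from the discussion above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; none of these types are Iceberg APIs.
public class ScanCounterSketch {
  // Stand-in for the status of a manifest entry.
  enum Status { EXISTING, ADDED, DELETED }

  // Stand-in for a metrics counter.
  static final class Counter {
    private final AtomicLong value = new AtomicLong();

    void increment() {
      value.incrementAndGet();
    }

    long value() {
      return value.get();
    }
  }

  // Returns the entries that survive the filters and increments the counter
  // once per returned entry, so the count always matches what is returned.
  static List<Status> readEntries(
      List<Status> entries, boolean ignoreDeleted, boolean ignoreExisting, Counter counter) {
    List<Status> returned = new ArrayList<>();
    for (Status status : entries) {
      if (ignoreDeleted && status == Status.DELETED) {
        continue; // a removed entry, which is not the same thing as a delete file
      }
      if (ignoreExisting && status == Status.EXISTING) {
        continue;
      }
      counter.increment();
      returned.add(status);
    }
    return returned;
  }

  public static void main(String[] args) {
    List<Status> entries = List.of(Status.ADDED, Status.EXISTING, Status.DELETED);
    Counter totalDataFiles = new Counter();
    readEntries(entries, true, false, totalDataFiles);
    // A bulk "added files" count for this manifest would be 1 and would miss
    // the existing file that the scan still reads.
    System.out.println("returned data files: " + totalDataFiles.value());
  }
}
```

   Because the counter is incremented at the point where an entry is 
returned, it stays correct regardless of which filters are enabled.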



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
