Re: [PR] API, Core, Spark: Scan API for partition stats [iceberg]

via GitHub Mon, 10 Nov 2025 08:02:39 -0800


gaborkaszab commented on code in PR #14508:
URL: https://github.com/apache/iceberg/pull/14508#discussion_r2511131964



##########
core/src/main/java/org/apache/iceberg/PartitionStatsHandler.java:
##########
@@ -225,7 +223,7 @@ public static PartitionStatisticsFile 
computeAndWriteStatsFile(Table table, long
       }
 
       try {
-        stats = computeAndMergeStatsIncremental(table, snapshot, 
partitionType, statisticsFile);
+        stats = computeAndMergeStatsIncremental(table, snapshot, 
statisticsFile.snapshotId());

Review Comment:
   The reason this param is removed is that `computeAndMergeStatsIncremental` 
is changed to use the new Scan API instead of `readPartitionStatsFile`. Since 
the partitionType is deliberately made internal to the Scan, this is no longer 
needed in this function.
   What Do you mean we recalculate it later for every file? Whenever we 
calculate stats, we produce this partitionType once for scanning the latest 
existing partition stats in case it's an incremental computation, and then we 
produce it one more time when we sort the stats. I think this is cheap enough 
to worth keeping it internal to the Scan API and not giving a way for users to 
inject it into the scan in order to save one computation of this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] API, Core, Spark: Scan API for partition stats [iceberg]

Reply via email to