Re: [PR] Detect the case to identify missing column from the file using file's max field id in StrictMetricsEvaluator #13397 [iceberg]

via GitHub Thu, 03 Jul 2025 11:17:46 -0700


manirajv06 commented on code in PR #13398:
URL: https://github.com/apache/iceberg/pull/13398#discussion_r2183470286



##########
api/src/main/java/org/apache/iceberg/expressions/StrictMetricsEvaluator.java:
##########
@@ -69,13 +71,26 @@ public StrictMetricsEvaluator(Schema schema, Expression unbound, boolean caseSen
    *     otherwise.
    */
   public boolean eval(ContentFile<?> file) {
-    // TODO: detect the case where a column is missing from the file using file's max field id.
+    if (file.valueCounts() != null) {
+      int maxFieldId = file.valueCounts().keySet().stream().mapToInt(i -> i).max().orElse(0);

Review Comment:
   The schema might have 10 columns while the file has only 5 (say, field
ids 1 to 5). In that case, we need to set 5 as the max field id on that
specific file. Only then can we use this max field id to skip files whose
max field id is less than the id of the column that comes through the
`Expression` passed to the *MetricsEvaluator. Is my understanding correct?
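   
   To make the question concrete, here is a rough sketch of the check I have
in mind. It is standalone Java rather than Iceberg code: the class and both
helper names are hypothetical, and a plain `Map<Integer, Long>` stands in for
`file.valueCounts()`. It also assumes the map keys reflect the columns
actually written to the file, which is exactly the part I am unsure about.

```java
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical sketch, not part of Iceberg.
class MaxFieldIdSketch {

  // Largest field id present in the file's value counts, or empty when
  // the file carries no value counts at all.
  static OptionalInt maxFieldId(Map<Integer, Long> valueCounts) {
    if (valueCounts == null) {
      return OptionalInt.empty();
    }
    return valueCounts.keySet().stream().mapToInt(Integer::intValue).max();
  }

  // True when the referenced field id is greater than the file's max field
  // id, i.e. the column could not have been written to this file
  // (under the assumption stated above).
  static boolean isBeyondMaxFieldId(Map<Integer, Long> valueCounts, int fieldId) {
    OptionalInt max = maxFieldId(valueCounts);
    return max.isPresent() && fieldId > max.getAsInt();
  }

  public static void main(String[] args) {
    // File written with field ids 1 to 5; the table schema has since grown to 10.
    Map<Integer, Long> counts = Map.of(1, 100L, 2, 100L, 3, 100L, 4, 100L, 5, 100L);
    System.out.println(isBeyondMaxFieldId(counts, 7)); // column added after the file was written
    System.out.println(isBeyondMaxFieldId(counts, 3)); // column present in the file
  }
}
```

   Returning an empty `OptionalInt` instead of defaulting to 0 keeps "no
metrics available" distinct from "file has no columns", so a file without
value counts is never treated as missing the column.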
   
   After writing the file with `DataWriter`, the list of columns can be
fetched from `getFileMetaData().getSchema()`, and the max field id among them
can be set on `DataFile` or `ContentFile`. That value could then be used to
skip the file as described earlier.
   
   Thoughts?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


