Re: [PR] Detect the case to identify missing column from the file using file's max field id in StrictMetricsEvaluator #13397 [iceberg]

via GitHub Wed, 02 Jul 2025 10:20:02 -0700


manirajv06 commented on code in PR #13398:
URL: https://github.com/apache/iceberg/pull/13398#discussion_r2180593177



##########
api/src/main/java/org/apache/iceberg/expressions/StrictMetricsEvaluator.java:
##########
@@ -69,13 +71,26 @@ public StrictMetricsEvaluator(Schema schema, Expression unbound, boolean caseSen
    *     otherwise.
    */
   public boolean eval(ContentFile<?> file) {
-    // TODO: detect the case where a column is missing from the file using file's max field id.
+    if (file.valueCounts() != null) {
+      int maxFieldId = file.valueCounts().keySet().stream().mapToInt(i -> i).max().orElse(0);

Review Comment:
   @amogh-jahagirdar @Fokko After tracing the code further, the read path goes
all the way down to ParquetReader.java (for Parquet data files). There, the
fields can be obtained with reader.getFileMetaData().getSchema().getFields(),
and the max field id can be derived from them. That derived max would then
have to be set on the content file/data file. We could do the same for the
other file formats. Thoughts?
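
To make the proposal concrete, here is a minimal sketch of the derivation
described above. It does not use the Parquet or Iceberg APIs: `Field`,
`maxFieldId`, and `isMissingFromFile` are hypothetical names, and `Field` is
only a stand-in for whatever getSchema().getFields() returns. The sketch
assumes every field carries an id and that ids are assigned in increasing
order, so a column whose id is greater than the file's max cannot be present
in that file; a real implementation would have to handle fields without ids.

```java
import java.util.Arrays;
import java.util.List;

final class MaxFieldIdSketch {
  /** Hypothetical stand-in for a file-schema field: an id plus nested children. */
  static final class Field {
    final int id;
    final List<Field> children;

    Field(int id, Field... children) {
      this.id = id;
      this.children = Arrays.asList(children);
    }
  }

  /** Largest field id in the schema, descending into nested fields; 0 if there are none. */
  static int maxFieldId(List<Field> fields) {
    int max = 0;
    for (Field field : fields) {
      max = Math.max(max, field.id);
      max = Math.max(max, maxFieldId(field.children));
    }
    return max;
  }

  /** Assumption: ids only grow, so an id above the file's max was added after the file was written. */
  static boolean isMissingFromFile(int fieldId, int fileMaxFieldId) {
    return fieldId > fileMaxFieldId;
  }
}
```

Unlike the max taken over valueCounts() keys in the diff, this walks the file
schema itself, so it would not depend on which columns have metrics recorded.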



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


