Re: [PR] Detect the case to identify missing column from the file using file's max field id in StrictMetricsEvaluator #13397 [iceberg]

via GitHub Wed, 02 Jul 2025 12:13:49 -0700


Fokko commented on code in PR #13398:
URL: https://github.com/apache/iceberg/pull/13398#discussion_r2180800745



##########
api/src/main/java/org/apache/iceberg/expressions/StrictMetricsEvaluator.java:
##########
@@ -69,13 +71,26 @@ public StrictMetricsEvaluator(Schema schema, Expression unbound, boolean caseSen
    *     otherwise.
    */
   public boolean eval(ContentFile<?> file) {
-    // TODO: detect the case where a column is missing from the file using file's max field id.
+    if (file.valueCounts() != null) {
+      int maxFieldId = file.valueCounts().keySet().stream().mapToInt(i -> i).max().orElse(0);

Review Comment:
   Thanks for taking the suggestion into consideration. Since you already have 
the schema, you could also build a set of the field IDs and use it to check 
whether the column is missing.
   
   Keep in mind that not all Parquet files have field IDs set. If you convert 
an existing Hive table into an Iceberg table, Iceberg uses 
[name-mapping](https://iceberg.apache.org/spec/#column-projection) to map the 
column names to Iceberg field IDs.
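
   To illustrate the set-based check, here is a minimal sketch of one way to read the suggestion. It uses plain Java collections as stand-ins: `schemaFieldIds` plays the role of the IDs collected from the schema, and `valueCounts` plays the role of the map returned by `file.valueCounts()`. The class and method names are hypothetical and not part of Iceberg. Note that a field with no value count is not necessarily absent from the file, since metrics may simply not have been collected for it, so the result is only a hint.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: the names below are illustrative, not Iceberg API.
public class MissingColumnSketch {

  /**
   * Returns true when fieldId is part of the schema but the file reports no
   * value count for it. With no metrics at all, nothing can be concluded,
   * so the method returns false.
   */
  public static boolean mayBeMissing(
      int fieldId, Set<Integer> schemaFieldIds, Map<Integer, Long> valueCounts) {
    if (valueCounts == null) {
      return false;
    }
    return schemaFieldIds.contains(fieldId) && !valueCounts.containsKey(fieldId);
  }

  public static void main(String[] args) {
    // Assumed example data: the schema has fields 1, 2 and 5,
    // while the file only carries value counts for fields 1 and 5.
    Set<Integer> schemaFieldIds = new HashSet<>(Set.of(1, 2, 5));
    Map<Integer, Long> valueCounts = Map.of(1, 100L, 5, 100L);

    System.out.println(mayBeMissing(2, schemaFieldIds, valueCounts));
    System.out.println(mayBeMissing(5, schemaFieldIds, valueCounts));
  }
}
```

   In this example, field 2 is below the largest ID seen in the file (5), so a comparison against the max field ID alone would not flag it, while the membership check does.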



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

