Re: [PR] API: Improve StrictMetricsEvaluator handling of missing columns using maxFieldId [iceberg]

via GitHub Tue, 17 Feb 2026 05:21:50 -0800


varun-lakhyani commented on PR #15252:
URL: https://github.com/apache/iceberg/pull/15252#issuecomment-3914675603


   > @varun-lakhyani: Adding a new field to DataFile would require writing it 
to the manifest files. This constitutes a change to the Iceberg specification, 
so we would need a strong justification and a discussion on the dev mailing 
list.
   
   Got it. I have already started a thread on the dev mailing list and included 
the justification and the core improvements there. I would appreciate any views 
from the community.
   
   Justification:
   Before schema evolution: 6 fields in the schema (file 1 is written with this schema)
   After schema evolution: 18 fields in the schema
   
   When querying field 10 (added after file 1 was written, so absent from that 
file) with isNull or notNaN:
   - Existing behavior in StrictMetricsEvaluator: ROWS_MIGHT_NOT_MATCH
   - With maxFieldId: returns ROWS_MUST_MATCH
   
   For other operations on field 10:
   - Existing behavior: ROWS_MIGHT_NOT_MATCH
   - With maxFieldId: same result, but with early exit
   
   Similar behavior applies to InclusiveMetricsEvaluator.
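
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `MaxFieldIdSketch`, `FileStats`, and `definitelyMissing` are hypothetical names, NaN counts and bounds are left out, and it assumes field IDs are only ever assigned in increasing order, so an ID greater than a file's `maxFieldId` belongs to a column added after the file was written.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: these types are hypothetical stand-ins, not Iceberg's classes.
class MaxFieldIdSketch {
  enum Result { ROWS_MUST_MATCH, ROWS_MIGHT_NOT_MATCH }

  // Per-file metadata. maxFieldId is the proposed value; null models "not recorded".
  static final class FileStats {
    final Integer maxFieldId;
    final Map<Integer, Long> valueCounts;
    final Map<Integer, Long> nullValueCounts;

    FileStats(Integer maxFieldId, Map<Integer, Long> valueCounts, Map<Integer, Long> nullValueCounts) {
      this.maxFieldId = maxFieldId;
      this.valueCounts = valueCounts;
      this.nullValueCounts = nullValueCounts;
    }

    // Assumption: IDs only grow, so an ID above the file's maximum cannot be in the file.
    boolean definitelyMissing(int fieldId) {
      return maxFieldId != null && fieldId > maxFieldId;
    }

    // True only when metrics prove that every value of the column is null.
    boolean containsNullsOnly(int fieldId) {
      Long values = valueCounts.get(fieldId);
      Long nulls = nullValueCounts.get(fieldId);
      return values != null && nulls != null && values.longValue() == nulls.longValue();
    }
  }

  // Strict isNull: ROWS_MUST_MATCH only if every row is known to be null.
  static Result isNull(FileStats file, int fieldId) {
    if (file.definitelyMissing(fieldId) || file.containsNullsOnly(fieldId)) {
      return Result.ROWS_MUST_MATCH;
    }
    return Result.ROWS_MIGHT_NOT_MATCH;
  }

  // Strict notNaN: a null value is not NaN, so an all-null column always matches.
  // (A fuller version would also consult NaN counts; omitted here.)
  static Result notNaN(FileStats file, int fieldId) {
    if (file.definitelyMissing(fieldId) || file.containsNullsOnly(fieldId)) {
      return Result.ROWS_MUST_MATCH;
    }
    return Result.ROWS_MIGHT_NOT_MATCH;
  }

  public static void main(String[] args) {
    // File 1 was written when the schema had fields 1..6; it has no metrics for field 10.
    FileStats withMax = new FileStats(6, new HashMap<>(), new HashMap<>());
    FileStats withoutMax = new FileStats(null, new HashMap<>(), new HashMap<>());

    System.out.println(isNull(withMax, 10));    // ROWS_MUST_MATCH
    System.out.println(isNull(withoutMax, 10)); // ROWS_MIGHT_NOT_MATCH
  }
}
```

For the other predicates, the same `definitelyMissing` check would return ROWS_MIGHT_NOT_MATCH immediately, before any bounds are consulted, which is the early exit mentioned above.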
   
   If maxFieldId can instead be derived from the schema used to write the file, 
without adding it to DataFile, I would be happy to explore that direction.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


