I have made changes that serve as a POC implementation to verify the improvement.

PR: https://github.com/apache/iceberg/pull/15252

Core improvement / justification:

* Before schema evolution: 6 fields in the schema (file 1 was written with this schema)
* After schema evolution: 18 fields in the schema

Field 10 did not exist when file 1 was written, so every row in file 1 is null for it. When querying field 10 with isNull or notNaN:

* Existing behavior in StrictMetricsEvaluator: ROWS_MIGHT_NOT_MATCH
* With maxFieldId: ROWS_MUST_MATCH

For other operations on field 10:

* Existing behavior: ROWS_MIGHT_NOT_MATCH
* With maxFieldId: the same result, but with an early exit

Similar behavior applies to InclusiveMetricsEvaluator.

I understand this would require adding a new field to manifest files (an Iceberg specification change). I'd appreciate the community's view on whether this improvement justifies that. If maxFieldId can instead be derived from the schema used to write the file, without adding it to DataFile, I would be happy to explore that direction.
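To make the field 10 case concrete, here is a minimal standalone Java sketch of the early-exit check for the strict evaluator. It does not use Iceberg classes. The Op enum, the evalMissingColumn method, and passing maxFieldId as a plain int are hypothetical, and the two boolean constants only mirror the result names used in StrictMetricsEvaluator. It assumes field IDs are only ever assigned in increasing order, so a field whose ID exceeds a file's maxFieldId was added after that file was written and is null in every row.

```java
import java.util.Optional;

// Hypothetical, standalone sketch; not Iceberg's actual StrictMetricsEvaluator code.
final class MaxFieldIdSketch {

  // Mirrors the result names used by StrictMetricsEvaluator (true = every row matches).
  static final boolean ROWS_MUST_MATCH = true;
  static final boolean ROWS_MIGHT_NOT_MATCH = false;

  // Hypothetical subset of predicate operations.
  enum Op { IS_NULL, NOT_NULL, NOT_NAN, EQ, LT }

  /**
   * Early-exit check, assuming field IDs only grow as columns are added.
   * Returns a result when fieldId > maxFieldId (the column cannot exist in the file),
   * or Optional.empty() to fall back to the normal metrics-based evaluation.
   */
  static Optional<Boolean> evalMissingColumn(int fieldId, int maxFieldId, Op op) {
    if (fieldId <= maxFieldId) {
      return Optional.empty(); // column may be in the file: use column metrics as today
    }
    switch (op) {
      case IS_NULL:
      case NOT_NAN:
        // Every row is null for this column, and null is not NaN, so all rows match.
        return Optional.of(ROWS_MUST_MATCH);
      default:
        // Same answer as the existing behavior, but returned without reading any stats.
        return Optional.of(ROWS_MIGHT_NOT_MATCH);
    }
  }

  public static void main(String[] args) {
    int maxFieldIdOfFile1 = 6; // file 1 written with the 6-field schema (IDs 1..6 assumed)
    System.out.println(evalMissingColumn(10, maxFieldIdOfFile1, Op.IS_NULL)); // Optional[true]
    System.out.println(evalMissingColumn(10, maxFieldIdOfFile1, Op.EQ));      // Optional[false]
    System.out.println(evalMissingColumn(3, maxFieldIdOfFile1, Op.IS_NULL));  // Optional.empty
  }
}
```

Returning Optional.empty() for fields that may exist in the file keeps the existing metrics-based path unchanged; the guard itself costs a single integer comparison per predicate.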
