varun-lakhyani commented on PR #15252: URL: https://github.com/apache/iceberg/pull/15252#issuecomment-3914675603
> @varun-lakhyani: Adding a new field to DataFile would require writing it to the manifest files. This constitutes a change to the Iceberg specification, so we would need a strong justification and a discussion on the dev mailing list.

Got it. I have already started a thread on the dev mailing list with a justification and a summary of the core improvements, and I would welcome the community's views on it.

Justification:

- Before schema evolution: 6 fields in the schema (file 1 is written with this schema)
- After schema evolution: 18 fields in the schema

Because field 10 was added after file 1 was written, file 1 contains no values for it, so every row in that file reads field 10 as null.

When querying field 10 with `isNull` or `notNaN`:

- Existing behavior in `StrictMetricsEvaluator`: `ROWS_MIGHT_NOT_MATCH`
- With `maxFieldId`: `ROWS_MUST_MATCH`

For other operations on field 10:

- Existing behavior: `ROWS_MIGHT_NOT_MATCH`
- With `maxFieldId`: the same result, but with an early exit

Similar behavior applies to `InclusiveMetricsEvaluator`.

If `maxFieldId` can instead be derived from the schema used to write the file, without adding it to `DataFile`, I would be happy to explore that direction.
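To make the strict-evaluation argument concrete, here is a minimal, self-contained sketch. It is not Iceberg's API. `FileStats`, `maxFieldId`, `absentFromFile`, `strictIsNull`, `strictNotNaN` and `strictLessThan` are hypothetical names made up for illustration, and the evaluator logic is heavily simplified. The sketch assumes the 12 columns added during schema evolution received field ids 7 to 18, so any id above a file's `maxFieldId` belongs to a column that did not exist when the file was written.

```java
import java.util.Map;

class MaxFieldIdSketch {

  // Hypothetical stand-ins for the two strict outcomes named in the comment.
  static final boolean ROWS_MUST_MATCH = true;
  static final boolean ROWS_MIGHT_NOT_MATCH = false;

  // Hypothetical per-file stats: maxFieldId is the proposed new value;
  // valueCounts / nullCounts are keyed by field id, like Iceberg column metrics.
  record FileStats(int maxFieldId, Map<Integer, Long> valueCounts, Map<Integer, Long> nullCounts) {}

  // Assumption: a field id above maxFieldId means the column was added after
  // the file was written, so every row in the file reads it as null.
  static boolean absentFromFile(FileStats file, int fieldId) {
    return fieldId > file.maxFieldId();
  }

  // Strict isNull: ROWS_MUST_MATCH only if every row is provably null.
  static boolean strictIsNull(FileStats file, int fieldId) {
    if (absentFromFile(file, fieldId)) {
      return ROWS_MUST_MATCH; // early exit: the column is all nulls in this file
    }
    Long values = file.valueCounts().get(fieldId);
    Long nulls = file.nullCounts().get(fieldId);
    if (values != null && nulls != null && values.equals(nulls)) {
      return ROWS_MUST_MATCH;
    }
    return ROWS_MIGHT_NOT_MATCH; // no stats for the column: cannot prove a match
  }

  // Strict notNaN: a null is not NaN, so an absent column satisfies it.
  static boolean strictNotNaN(FileStats file, int fieldId) {
    if (absentFromFile(file, fieldId)) {
      return ROWS_MUST_MATCH;
    }
    return ROWS_MIGHT_NOT_MATCH; // simplified: NaN-count checks omitted
  }

  // Strict lessThan: null < x is never true, so an absent column still gives
  // ROWS_MIGHT_NOT_MATCH, but without consulting any bounds (the early exit).
  static boolean strictLessThan(FileStats file, int fieldId, long literal) {
    if (absentFromFile(file, fieldId)) {
      return ROWS_MIGHT_NOT_MATCH;
    }
    return ROWS_MIGHT_NOT_MATCH; // simplified: bound comparisons omitted
  }

  public static void main(String[] args) {
    // File 1 was written with the 6-field schema (ids 1..6); field 1 has no nulls.
    FileStats file1 = new FileStats(6, Map.of(1, 100L), Map.of(1, 0L));
    System.out.println(strictIsNull(file1, 10));       // true
    System.out.println(strictNotNaN(file1, 10));       // true
    System.out.println(strictLessThan(file1, 10, 5L)); // false
    System.out.println(strictIsNull(file1, 1));        // false
  }
}
```

The design choice illustrated here is that a single integer comparison replaces a lookup in the per-column metric maps: for columns newer than the file, the evaluator can answer immediately instead of falling back to the conservative `ROWS_MIGHT_NOT_MATCH`.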
