nssalian opened a new pull request, #3381: URL: https://github.com/apache/iceberg-python/pull/3381
Continued work on https://github.com/apache/iceberg-python/issues/3100

## PR Description

Follow-up to #3119. Implements `ParquetFormatWriter` and `ParquetFormatModel`, registers Parquet in the `FileFormatFactory`, and rewrites `write_file` to dispatch through the factory using the `write.format.default` table property. Future formats can be added in a similar way.

## Rationale for this change

The `write.format.default` table property was never read - the write path was hardcoded to Parquet. This PR makes the property functional.

Also threads `file_format` through `_to_requested_schema` / `ArrowProjectionVisitor` / `_construct_field` so field ID metadata keys are correct per format (`PARQUET:field_id` for Parquet, `iceberg.id` plus `iceberg.required` for ORC), preparing the write path for ORC support without changing default behavior.

## Are these changes tested?

- `tests/io/test_format_writers.py` adds parametrized tests modeled after Java's `BaseFormatModelTests` covering round-trip, statistics, null handling, context manager caching, close idempotency, close-without-write, and ORC vs Parquet field ID dispatch.
- `tests/io/test_pyarrow.py` adds `test_write_file_parquet_round_trip` and `test_write_file_dispatches_on_write_format_default` exercising the full `write_file` path.

## Are there any user-facing changes?

No. Default behavior is unchanged. Setting `write.format.default` to an unregistered format now raises a `ValueError`.
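
For readers skimming the description, the dispatch it outlines could look roughly like the sketch below. This is an illustration only, not the code in this PR: `SketchFormatFactory`, `resolve_format`, and `field_id_metadata` are hypothetical names, the string-valued metadata is an assumption, and the `parquet` fallback is assumed from the statement that default behavior is unchanged. Only the property name `write.format.default`, the metadata keys, and the `ValueError` on an unregistered format come from the description.

```python
from typing import Callable, Dict, Mapping

WRITE_FORMAT_DEFAULT = "write.format.default"  # table property named in the PR
DEFAULT_FORMAT = "parquet"  # assumption: fallback when the property is unset


class SketchFormatFactory:
    """Hypothetical stand-in for a format registry; not the pyiceberg class."""

    def __init__(self) -> None:
        # Maps a lowercase format name to a callable that builds a writer.
        self._models: Dict[str, Callable[[], str]] = {}

    def register(self, name: str, model: Callable[[], str]) -> None:
        self._models[name.lower()] = model

    def get(self, name: str) -> Callable[[], str]:
        try:
            return self._models[name.lower()]
        except KeyError:
            # An unregistered format surfaces as a ValueError.
            raise ValueError(f"No writer registered for file format: {name}") from None


def resolve_format(properties: Mapping[str, str]) -> str:
    """Read the write format from table properties, falling back to Parquet."""
    return properties.get(WRITE_FORMAT_DEFAULT, DEFAULT_FORMAT).lower()


def field_id_metadata(file_format: str, field_id: int, required: bool) -> Dict[str, str]:
    """Pick field ID metadata keys per format (values as strings: an assumption)."""
    if file_format == "parquet":
        return {"PARQUET:field_id": str(field_id)}
    if file_format == "orc":
        return {"iceberg.id": str(field_id), "iceberg.required": str(required).lower()}
    raise ValueError(f"Unsupported file format: {file_format}")


# Only Parquet is registered, mirroring the scope of this PR.
factory = SketchFormatFactory()
factory.register("parquet", lambda: "parquet-writer")  # placeholder writer object

writer = factory.get(resolve_format({}))()  # no property set, so Parquet is used
```

With only Parquet registered, `factory.get(resolve_format({"write.format.default": "orc"}))` raises `ValueError` in this sketch, which matches the behavior described for unregistered formats.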
