kosiew opened a new pull request, #1169:
URL: https://github.com/apache/datafusion-python/pull/1169
## Which issue does this PR close?
- Closes #1162
## Rationale for this change
This change lets users specify both the compression type and the
compression level for Parquet output through a single `ParquetWriterOptions`
object. It also rejects conflicting configurations when an options object is
passed explicitly.
## What changes are included in this PR?
- Added `compression_level` parameter to `ParquetWriterOptions`.
- Enhanced `DataFrame.write_parquet()` to accept a `ParquetWriterOptions`
object.
- Added logic to prevent conflicting use of `compression_level` when using a
full options object.
- Introduced two new tests:
  - `test_write_parquet_options`: Verifies functionality with custom
    compression and level.
  - `test_write_parquet_options_error`: Ensures the proper error is raised
    for misconfiguration.
## Are these changes tested?
Yes, two new tests have been added:
- `test_write_parquet_options`: Confirms Parquet output matches expected
data.
- `test_write_parquet_options_error`: Validates error handling when
conflicting options are provided.
## Are there any user-facing changes?
Yes:
- Users can now pass a `ParquetWriterOptions` object directly to
`DataFrame.write_parquet()`, allowing more granular control.
- A `ValueError` will be raised if `compression_level` is used with an
already configured options object.
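
For reviewers, the conflict rule described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this PR: `WriterOptionsSketch` and `resolve_write_options` are hypothetical stand-ins, and whether the options object is passed through the `compression` argument is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class WriterOptionsSketch:
    """Hypothetical stand-in for ParquetWriterOptions (not the real class)."""

    compression: str
    compression_level: Optional[int] = None


def resolve_write_options(
    compression: Union[str, WriterOptionsSketch],
    compression_level: Optional[int] = None,
) -> WriterOptionsSketch:
    """Hypothetical helper: merge the two ways of configuring compression.

    A full options object already carries its own level, so passing a
    separate compression_level alongside it is treated as a conflict.
    """
    if isinstance(compression, WriterOptionsSketch):
        if compression_level is not None:
            raise ValueError(
                "compression_level must be set on the options object, "
                "not passed separately"
            )
        return compression
    # Plain codec name: build the options object from the loose arguments.
    return WriterOptionsSketch(
        compression=compression, compression_level=compression_level
    )
```

Under these assumptions, `resolve_write_options("gzip", 6)` builds an options object from the loose arguments, while passing a prebuilt options object together with `compression_level=3` raises `ValueError`.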
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]