Smotrov opened a new pull request, #18954:
URL: https://github.com/apache/datafusion/pull/18954

   ## Which issue does this PR close?
   
   Closes #18947
   
   ## Rationale for this change
   
   Currently, DataFusion uses default compression levels when writing 
compressed JSON and CSV files. For ZSTD, this means level 3, which prioritizes 
speed over compression ratio. Users working with large datasets who want to 
optimize for storage costs or network transfer have no way to increase the 
compression level.
   
   This is particularly important for cloud data lake scenarios where storage 
and egress costs can be significant.
   
   ## What changes are included in this PR?
   
   - Add `compression_level: Option<u32>` field to `JsonOptions` and 
`CsvOptions` in `config.rs`
   - Add `convert_async_writer_with_level()` method to `FileCompressionType` 
(non-breaking API extension)
   - Keep original `convert_async_writer()` as a convenience wrapper for 
backward compatibility
   - Update `JsonWriterOptions` and `CsvWriterOptions` with a 
`compression_level` field
   - Update `ObjectWriterBuilder` to support compression level
   - Update JSON and CSV sinks to pass compression level through the write 
pipeline
   - Update proto definitions and conversions for serialization support
   - Fix unrelated unused import warning in `udf.rs` (conditional compilation 
for debug-only imports)
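   
   The non-breaking extension described in the list above can be sketched 
as a delegating wrapper. This is only an illustration of the pattern, not the 
code in this PR: `Compression`, `EncoderConfig`, `convert_writer()` and 
`convert_writer_with_level()` are hypothetical, simplified stand-ins for 
`FileCompressionType` and its async writer methods, and no real encoder is 
built.
   
   ```rust
   // Hypothetical stand-in for `FileCompressionType` (not the real type).
   #[derive(Debug, Clone, Copy, PartialEq)]
   enum Compression {
       Uncompressed,
       Gzip,
       Zstd,
   }
   
   // Hypothetical description of the encoder that would be constructed.
   // `level: None` means "let the codec pick its default level".
   #[derive(Debug, PartialEq)]
   struct EncoderConfig {
       codec: Compression,
       level: Option<u32>,
   }
   
   impl Compression {
       // New, level-aware entry point added alongside the old one.
       fn convert_writer_with_level(self, level: Option<u32>) -> EncoderConfig {
           EncoderConfig { codec: self, level }
       }
   
       // Original entry point: the signature is unchanged, and it simply
       // delegates with `None`, so existing callers keep the default level.
       fn convert_writer(self) -> EncoderConfig {
           self.convert_writer_with_level(None)
       }
   }
   ```
   
   Under this assumption, `Compression::Zstd.convert_writer()` and 
`Compression::Zstd.convert_writer_with_level(None)` produce the same 
configuration, which is what keeps existing call sites working.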
   
   ## Are these changes tested?
   
   The changes follow the existing patterns used throughout the codebase. The 
implementation was verified by:
   - Building successfully with `cargo build`
   - Running existing tests with `cargo test --package datafusion-proto`
   - All 131 proto integration tests pass
   
   ## Are there any user-facing changes?
   
   Yes, users can now specify a compression level when writing compressed 
JSON/CSV files:
   
   ```rust
   use datafusion::common::config::JsonOptions;
   use datafusion::common::parsers::CompressionTypeVariant;
   
   let json_opts = JsonOptions {
       compression: CompressionTypeVariant::ZSTD,
       compression_level: Some(9),  // Higher compression
       ..Default::default()
   };
   ```
   
   **Supported compression levels:**
   - ZSTD: 1-22 (default: 3)
   - GZIP: 0-9 (default: 6)  
   - BZIP2: 1-9 (default: 9)
   - XZ: 0-9 (default: 6)
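   
   As a rough illustration of how the ranges and defaults listed above 
could be applied to an `Option<u32>` setting, here is a minimal sketch. The 
`Codec` enum and `resolve_level()` helper are hypothetical and not part of 
this PR; the PR description does not say whether out-of-range values are 
rejected, clamped, or passed through to the encoder, so rejecting them here is 
an assumption.
   
   ```rust
   use std::ops::RangeInclusive;
   
   // Hypothetical codec enum mirroring the list of supported levels.
   #[derive(Debug, Clone, Copy, PartialEq)]
   enum Codec {
       Zstd,
       Gzip,
       Bzip2,
       Xz,
   }
   
   impl Codec {
       // Supported level range, taken from the list in this description.
       fn level_range(self) -> RangeInclusive<u32> {
           match self {
               Codec::Zstd => 1..=22,
               Codec::Gzip => 0..=9,
               Codec::Bzip2 => 1..=9,
               Codec::Xz => 0..=9,
           }
       }
   
       // Default level, taken from the list in this description.
       fn default_level(self) -> u32 {
           match self {
               Codec::Zstd => 3,
               Codec::Gzip => 6,
               Codec::Bzip2 => 9,
               Codec::Xz => 6,
           }
       }
   }
   
   // `None` falls back to the default; an out-of-range value is an error
   // (assumed behaviour, see the note above).
   fn resolve_level(codec: Codec, requested: Option<u32>) -> Result<u32, String> {
       match requested {
           None => Ok(codec.default_level()),
           Some(level) if codec.level_range().contains(&level) => Ok(level),
           Some(level) => Err(format!(
               "compression level {} is outside {:?} for {:?}",
               level,
               codec.level_range(),
               codec
           )),
       }
   }
   ```
   
   With this helper, `resolve_level(Codec::Zstd, Some(9))` yields `Ok(9)`, 
`resolve_level(Codec::Zstd, None)` yields `Ok(3)`, and 
`resolve_level(Codec::Zstd, Some(0))` yields an error.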
   
   **This is a non-breaking change** - the original `convert_async_writer()` 
method signature is preserved for backward compatibility.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

