adriangb opened a new pull request, #22347:
URL: https://github.com/apache/datafusion/pull/22347

   ## Which issue does this PR close?
   
   Relates to the discussion in #22024 about the Parquet datasource crate 
becoming hard to navigate. Split out of #22156, which bundled several 
code moves into one PR. This is one of three smaller, 
independently reviewable PRs that replace it.
   
   ## Rationale for this change
   
   `file_format.rs` had grown to ~2,000 LOC, bundling several distinct 
responsibilities into one file. That makes it hard to read and hard to review 
changes in isolation. This PR is **pure code motion**: no behavior change and 
no public API change.
   
   ## What changes are included in this PR?
   
   Extracts two responsibilities from `file_format.rs` into focused modules 
(`file_format.rs` drops to ~660 LOC):
   
   - `sink.rs` — `ParquetSink` and the parallel-write machinery 
(`column_serializer_task`, `spawn_column_parallel_row_group_writer`, 
`output_single_parquet_file_parallelized`, `concatenate_parallel_row_groups`, 
etc.).
   - `schema_coercion.rs` — the Arrow-schema coercion utilities 
(`apply_file_schema_type_coercions`, `coerce_int96_to_resolution`, 
`coerce_file_schema_to_view_type`, `coerce_file_schema_to_string_type`, 
`transform_schema_to_view`, `transform_binary_to_string`, 
`field_with_new_type`) and their tests.
   
   Every previously-public item is still reachable at the same path: the crate 
root re-exports `sink::ParquetSink` and the `schema_coercion::*` functions, and 
the historical `file_format::ParquetSink` path is preserved via `pub use` 
(datafusion-proto depends on it).
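
A minimal sketch of the re-export pattern described above, assuming a simplified layout. This is not the actual crate source: `parquet_crate` is a hypothetical module standing in for the crate root, `ParquetSink` is an empty placeholder, and `coerce_example` is an invented function standing in for the `schema_coercion::*` utilities.

```rust
use std::any::TypeId;

// Hypothetical stand-in for the crate root of `datafusion-datasource-parquet`.
mod parquet_crate {
    pub mod sink {
        /// Placeholder for the real `ParquetSink` (the real type has fields and methods).
        #[derive(Debug, Default)]
        pub struct ParquetSink;
    }

    pub mod schema_coercion {
        /// Invented placeholder for the coercion utilities; name and signature
        /// are assumptions made for this illustration only.
        pub fn coerce_example(type_name: &str) -> String {
            format!("{type_name}View")
        }
    }

    pub mod file_format {
        // Keeps the historical `file_format::ParquetSink` path alive after the move.
        pub use super::sink::ParquetSink;
    }

    // Crate-root re-exports, so callers need not know about the new modules.
    pub use self::schema_coercion::*;
    pub use self::sink::ParquetSink;
}

/// Returns true when the root, legacy, and new paths all name the same type.
fn sink_paths_agree() -> bool {
    let root = TypeId::of::<parquet_crate::ParquetSink>();
    let legacy = TypeId::of::<parquet_crate::file_format::ParquetSink>();
    let moved = TypeId::of::<parquet_crate::sink::ParquetSink>();
    root == legacy && legacy == moved
}
```

Because `pub use` re-exports the item itself rather than a copy, downstream code that imports through the old `file_format` path or the crate root keeps compiling and refers to the same type as code that imports from `sink` directly.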
   
   ## Are these changes tested?
   
   Yes, covered by existing tests (the `coerce_int96_to_resolution_*` tests 
moved with the function to `schema_coercion.rs`). `cargo test -p 
datafusion-datasource-parquet --all-features` (122 passing) and `cargo clippy 
-p datafusion-datasource-parquet --all-targets --all-features -- -D warnings` 
both pass. `datafusion-proto` (a downstream `ParquetSink` consumer) builds 
cleanly.
   
   ## Are there any user-facing changes?
   
   No. Public API is unchanged — every previously-public item is still 
reachable at the same crate-root path. The only difference is the file 
organization inside the crate.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

