adriangb opened a new pull request, #22347: URL: https://github.com/apache/datafusion/pull/22347
## Which issue does this PR close?

Relates to the discussion in #22024 about the Parquet datasource crate becoming hard to navigate. Split out of #22156, which bundled several code-motion moves into one PR; this is one of three smaller, independently reviewable PRs that replace it.

## Rationale for this change

`file_format.rs` had grown to ~2,000 LOC, bundling several distinct responsibilities into one file. That makes it hard to read and hard to review changes in isolation.

This PR is **pure code motion**: no behavior change and no public API change.

## What changes are included in this PR?

Extracts two responsibilities from `file_format.rs` into focused modules (`file_format.rs` drops to ~660 LOC):

- `sink.rs`: `ParquetSink` and the parallel-write machinery (`column_serializer_task`, `spawn_column_parallel_row_group_writer`, `output_single_parquet_file_parallelized`, `concatenate_parallel_row_groups`, etc.).
- `schema_coercion.rs`: the Arrow-schema coercion utilities (`apply_file_schema_type_coercions`, `coerce_int96_to_resolution`, `coerce_file_schema_to_view_type`, `coerce_file_schema_to_string_type`, `transform_schema_to_view`, `transform_binary_to_string`, `field_with_new_type`) and their tests.

Every previously-public item is still reachable at the same path: the crate root re-exports `sink::ParquetSink` and the `schema_coercion::*` functions, and the historical `file_format::ParquetSink` path is preserved via `pub use` (datafusion-proto depends on it).

## Are these changes tested?

Yes, covered by existing tests (the `coerce_int96_to_resolution_*` tests moved with the function to `schema_coercion.rs`).

`cargo test -p datafusion-datasource-parquet --all-features` (122 passing) and `cargo clippy -p datafusion-datasource-parquet --all-targets --all-features -- -D warnings` both pass. `datafusion-proto` (a downstream `ParquetSink` consumer) builds clean.

## Are there any user-facing changes?

No.
Public API is unchanged: every previously-public item is still reachable at the same crate-root path. The only difference is the file organization inside the crate.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
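
For readers unfamiliar with how a `pub use` re-export keeps old paths working after code motion, here is a minimal sketch of the pattern the description refers to. It is not the crate's actual code: the `ParquetSink` body, its `rows_written` field, and the `example_coercion` and `rows` functions are hypothetical stand-ins, and only the module names `sink`, `schema_coercion`, and `file_format` mirror the PR description.

```rust
// Miniature, hypothetical stand-in for the layout described in this PR.
// None of the bodies below come from DataFusion; only the module and
// type names mirror the description.

mod sink {
    /// Placeholder for the real `ParquetSink`; the field is invented.
    #[derive(Debug, Default, PartialEq)]
    pub struct ParquetSink {
        pub rows_written: usize,
    }
}

mod schema_coercion {
    /// Hypothetical helper standing in for the coercion utilities.
    pub fn example_coercion(type_name: &str) -> &str {
        if type_name == "Binary" { "Utf8" } else { type_name }
    }
}

mod file_format {
    // The type now lives in `sink`, but the historical path still resolves.
    pub use crate::sink::ParquetSink;
}

// Crate-root re-exports, so callers never need the internal module names.
pub use schema_coercion::example_coercion;
pub use sink::ParquetSink;

/// Takes the type by its new path; callers may name it through any alias.
fn rows(value: &sink::ParquetSink) -> usize {
    value.rows_written
}

fn main() {
    // Both paths name the same type, so the values are interchangeable.
    let old_path = file_format::ParquetSink { rows_written: 3 };
    let root_path = ParquetSink { rows_written: 3 };
    assert_eq!(old_path, root_path);
    assert_eq!(rows(&old_path), 3);
    assert_eq!(example_coercion("Binary"), "Utf8");
}
```

Because a re-export creates an alias rather than a new type, downstream code that imports the old path compiles unchanged, which is consistent with the claim above that `datafusion-proto` builds clean.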
