adriangb opened a new pull request, #22398:
URL: https://github.com/apache/datafusion/pull/22398

   ## Which issue does this PR close?
   
   - Closes #.
   
   ## Rationale for this change
   
   `RowGroupsPrunedParquetOpen::build_stream` inlines the
   `build_projection_read_plan` + `reassign_expr_columns` + `make_projector` + 
`replace_schema` quartet right next to the decoder / stream wiring. This makes 
the opener's main orchestration body harder to follow and mixes two concerns: 
building the per-file projection and wiring it through the push-decoder stream.
   
   This PR isolates that block behind a small `DecoderProjection` type whose 
public surface is just "give me the projection mask" and "project this decoded 
batch onto the output schema."
   
   ## What changes are included in this PR?
   
   * New `decoder_projection` module with a `DecoderProjection` type:
     * `DecoderProjection::build(projection, physical_file_schema, 
parquet_schema, output_schema)` constructs the per-file projection in one call.
     * `projection_mask()` returns the mask installed on every decoder run.
     * `map(&batch)` applies the projector and, when needed, rebuilds the batch 
with `output_schema` to recover metadata / nullability that the file schema 
does not carry.
     * Fields are private.
   * `PushDecoderStreamState` collapses three fields (`projector`, 
`output_schema`, `replace_schema`) into a single `decoder_projection: 
DecoderProjection`. `project_batch` becomes a one-line delegate to 
`DecoderProjection::map`.
   * `replace_schema` is now derived from the projector's *output* schema 
(rather than the read plan's projected schema), so it stays correct if the 
decoder mask is widened in the future.
   * `DecoderBuilderConfig` carries the projection mask directly 
(`projection_mask: &ProjectionMask`) instead of the full `ParquetReadPlan`, 
since the read plan's `projected_schema` is no longer needed in this layer.
   
   No behaviour change.
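
As a rough illustration of the shape described in the list above, here is a minimal, std-only sketch. It is not the code from this PR: `Field`, `Schema`, `Batch`, and `Mask` are hypothetical stand-ins for the arrow / parquet types, the `parquet_schema` argument of `build` is omitted, and the internals (index-based projector, schema comparison) are assumptions made only to show how the mask, the projector, and `replace_schema` could fit together.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for arrow's Field / Schema / RecordBatch.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
}

#[derive(Debug, Clone)]
struct Batch {
    schema: Arc<Schema>,
    columns: Vec<Vec<i64>>,
}

/// Hypothetical stand-in for a parquet projection mask:
/// the file column indices the decoder should read, in file order.
#[derive(Debug, Clone, PartialEq)]
struct Mask(Vec<usize>);

/// Sketch of a per-file projection with private fields.
struct DecoderProjection {
    mask: Mask,
    /// For each output column, its position in the decoded batch.
    projector: Vec<usize>,
    output_schema: Arc<Schema>,
    /// True when the projector's output schema differs from `output_schema`.
    replace_schema: bool,
}

impl DecoderProjection {
    /// Builds the projection in one call; returns `None` if a projected
    /// column is missing from the file schema.
    fn build(
        projection: &[&str],
        file_schema: &Schema,
        output_schema: Arc<Schema>,
    ) -> Option<Self> {
        let mut file_indices = Vec::with_capacity(projection.len());
        for name in projection {
            let idx = file_schema.fields.iter().position(|f| f.name == *name)?;
            file_indices.push(idx);
        }

        // The decoder yields columns in file order, so the mask is sorted.
        let mut mask = file_indices.clone();
        mask.sort_unstable();
        mask.dedup();

        // Map each output column to its slot in the decoded batch.
        let projector: Vec<usize> = file_indices
            .iter()
            .map(|i| mask.iter().position(|m| m == i).unwrap())
            .collect();

        // Derive `replace_schema` from the projector's *output* schema.
        let projected = Schema {
            fields: file_indices
                .iter()
                .map(|&i| file_schema.fields[i].clone())
                .collect(),
        };
        let replace_schema = projected != *output_schema;

        Some(Self { mask: Mask(mask), projector, output_schema, replace_schema })
    }

    /// The mask to install on every decoder run.
    fn projection_mask(&self) -> &Mask {
        &self.mask
    }

    /// Projects a decoded batch and, when needed, swaps in `output_schema`.
    fn map(&self, batch: &Batch) -> Batch {
        let columns = self.projector.iter().map(|&i| batch.columns[i].clone()).collect();
        let schema = if self.replace_schema {
            Arc::clone(&self.output_schema)
        } else {
            let fields = self.projector.iter().map(|&i| batch.schema.fields[i].clone()).collect();
            Arc::new(Schema { fields })
        };
        Batch { schema, columns }
    }
}
```

In this sketch a caller would only ever touch `build`, `projection_mask`, and `map`, which mirrors the narrow surface the list above describes; everything else stays private to the type.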
   
   ## Are these changes tested?
   
   Covered by existing tests:
   
   * `cargo test -p datafusion-datasource-parquet` — 123 pass.
   * `cargo test -p datafusion --test parquet_integration` — 202 pass.
   * `cargo clippy -p datafusion-datasource-parquet --all-targets 
--all-features -- -D warnings` — clean.
   
   ## Are there any user-facing changes?
   
   No. All affected types are `pub(crate)`.
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

