schenksj opened a new pull request, #4532:
URL: https://github.com/apache/datafusion-comet/pull/4532

   ## Which issue does this PR close?
   
   Closes #4527.
   
   ## Rationale for this change
   
   Spark wraps file-source partition columns and other per-batch constants in 
`ConstantColumnVector`. When such a batch reached Comet's serialization path 
(`Utils.getBatchFieldVectors`, used by broadcast/shuffle) or the FFI export 
path (`NativeUtil.exportBatch`), it was rejected with:
   
   ```
   Comet execution only takes Arrow Arrays, but got ...ConstantColumnVector
   ```
   
   This is a standalone fix. It surfaced while working on the Delta Lake 
contrib integration, where the OPTIMIZE / deletion-vector rewrite paths pull 
constants through a Comet operator, so prioritizing it helps that effort. It 
applies equally to any plan that routes a constant column through a Comet operator.
   
   ## What changes are included in this PR?
   
   - `ConstantColumnVectors.materialize` (in the 
`org.apache.spark.sql.comet.execution.arrow` package) builds a fresh Arrow 
`FieldVector` holding the constant repeated `numRows` times. It reuses the 
existing per-type `ArrowFieldWriter`s, so it covers every type -- scalars, 
decimals, timestamps, and complex struct/array/map -- and stays in sync with 
Spark's type handling instead of relying on a hand-rolled per-type switch.
   - `Utils.materializeConstantColumnVector` exposes it to the serialization 
path.
   - New match arms in `Utils.getBatchFieldVectors` and 
`NativeUtil.exportBatch` materialize a `ConstantColumnVector` instead of 
throwing. The existing `CometVector` path is untouched.
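
The idea behind the new match arms can be sketched in a dependency-free way. The example below is illustrative only: `Column`, `DenseColumn`, `ConstantColumn`, `materialize`, and `toDense` are hypothetical stand-ins, not Spark's `ConstantColumnVector`, Arrow's `FieldVector`, or Comet's actual helpers. It assumes only what the description states: a constant column holds one value (possibly null) and is expanded to `numRows` slots before it reaches the path that previously threw.

```java
import java.util.ArrayList;
import java.util.List;

public class ConstantMaterializeSketch {
    /** Hypothetical stand-in for a column; not a Spark or Comet type. */
    interface Column {}

    /** Stand-in for an Arrow-backed column: one slot per row. */
    static final class DenseColumn implements Column {
        final List<Object> values;
        DenseColumn(List<Object> values) { this.values = values; }
    }

    /** Stand-in for a constant column: one value (null allowed) shared by all rows. */
    static final class ConstantColumn implements Column {
        final Object value;
        ConstantColumn(Object value) { this.value = value; }
    }

    /** Builds a fresh dense column holding the constant repeated numRows times. */
    static DenseColumn materialize(ConstantColumn c, int numRows) {
        List<Object> out = new ArrayList<>(numRows);
        for (int i = 0; i < numRows; i++) {
            out.add(c.value);
        }
        return new DenseColumn(out);
    }

    /** Dispatch analogous to the new match arm: materialize instead of throwing. */
    static DenseColumn toDense(Column col, int numRows) {
        if (col instanceof DenseColumn) {
            return (DenseColumn) col;            // existing path, unchanged
        }
        if (col instanceof ConstantColumn) {
            return materialize((ConstantColumn) col, numRows);
        }
        throw new IllegalArgumentException(
            "Unsupported column type: " + col.getClass().getName());
    }

    public static void main(String[] args) {
        DenseColumn part = toDense(new ConstantColumn("2024-01-01"), 3);
        DenseColumn nulls = toDense(new ConstantColumn(null), 2);
        System.out.println(part.values);   // [2024-01-01, 2024-01-01, 2024-01-01]
        System.out.println(nulls.values);  // [null, null]
    }
}
```

The sketch stores boxed values in a list for brevity; the PR itself is described as delegating to the per-type `ArrowFieldWriter`s, which is what lets it handle decimals, timestamps, and nested types without a separate switch.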
   
   ## How are these changes tested?
   
   New test in `UtilsSuite` round-trips a batch containing a non-null 
`ConstantColumnVector` and a null `ConstantColumnVector` through 
`serializeBatches` / `decodeBatches` and asserts that the materialized values (and 
nulls) survive. The test fails on `main` with the "only takes Arrow Arrays" 
exception and passes with this change. `UtilsSuite` (3/3) and `CometExecSuite` 
(126/0) pass. The FFI `exportBatch` arm shares the same 
`materializeConstantColumnVector` helper.
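
The shape of that round-trip check can be illustrated with a toy, self-contained analogue. Everything here is an assumption made for illustration: `materialize`, `serialize`, and `decode` are hypothetical helpers using a simple validity-flag encoding, not Comet's `serializeBatches` / `decodeBatches` or the Arrow IPC format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class RoundTripSketch {
    /** Expands one constant (null allowed) into numRows slots. */
    static Integer[] materialize(Integer constant, int numRows) {
        Integer[] out = new Integer[numRows];
        Arrays.fill(out, constant);
        return out;
    }

    /** Writes the row count, then per row a validity flag and, if valid, the value. */
    static byte[] serialize(Integer[] column) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(column.length);
            for (Integer v : column) {
                out.writeBoolean(v != null);
                if (v != null) {
                    out.writeInt(v);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads back the layout written by serialize, restoring nulls. */
    static Integer[] decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            Integer[] column = new Integer[in.readInt()];
            for (int i = 0; i < column.length; i++) {
                column[i] = in.readBoolean() ? Integer.valueOf(in.readInt()) : null;
            }
            return column;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Integer[] values = decode(serialize(materialize(42, 3)));
        Integer[] nulls = decode(serialize(materialize(null, 3)));
        System.out.println(Arrays.toString(values)); // [42, 42, 42]
        System.out.println(Arrays.toString(nulls));  // [null, null, null]
    }
}
```

As in the described test, the assertion of interest is that both the repeated value and the repeated null come back unchanged after encoding and decoding.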


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

