schenksj opened a new pull request, #4532: URL: https://github.com/apache/datafusion-comet/pull/4532
## Which issue does this PR close?

Closes #4527.

## Rationale for this change

Spark wraps file-source partition columns and other per-batch constants in `ConstantColumnVector`. When such a batch reaches Comet's serialization path (`Utils.getBatchFieldVectors`, used by broadcast/shuffle) or the FFI export path (`NativeUtil.exportBatch`), it was rejected with:

```
Comet execution only takes Arrow Arrays, but got ...ConstantColumnVector
```

This is a standalone fix; it was surfaced while working on the Delta Lake contrib integration (the OPTIMIZE / deletion-vector rewrite paths pull constants through a Comet operator), so prioritizing it helps that effort, but it applies to any plan that routes a constant column through a Comet operator.

## What changes are included in this PR?

- `ConstantColumnVectors.materialize` (in the `org.apache.spark.sql.comet.execution.arrow` package) builds a fresh Arrow `FieldVector` holding the constant repeated `numRows` times. It reuses the existing per-type `ArrowFieldWriter`s, so it covers every type -- scalars, decimal, timestamps, and complex struct/array/map -- and stays in sync with Spark's type handling, rather than a hand-rolled per-type switch.
- `Utils.materializeConstantColumnVector` exposes it to the serialization path.
- New match arms in `Utils.getBatchFieldVectors` and `NativeUtil.exportBatch` materialize a `ConstantColumnVector` instead of throwing. The existing `CometVector` path is untouched.

## How are these changes tested?

New test in `UtilsSuite` round-trips a batch with a value `ConstantColumnVector` and a null `ConstantColumnVector` through `serializeBatches` / `decodeBatches` and asserts the materialized values (and nulls) survive. The test fails on `main` with the "only takes Arrow Arrays" exception and passes with this change.

`UtilsSuite` (3/3) and `CometExecSuite` (126/0) pass. The FFI `exportBatch` arm shares the same `materializeConstantColumnVector` helper.
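For readers unfamiliar with the dispatch described above, here is a minimal, dependency-free Java model of the idea. It is not the PR's code: `Column`, `DenseColumn`, `ConstantColumn`, and `getBatchColumns` are hypothetical stand-ins for Arrow's `FieldVector`, Spark's `ConstantColumnVector`, and `Utils.getBatchFieldVectors`. It only illustrates "repeat the constant `numRows` times instead of throwing".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ConstantMaterializeSketch {
    /** Hypothetical stand-in for any column in a batch. */
    interface Column {}

    /** Stand-in for an Arrow-backed vector: one slot per row, null allowed. */
    static final class DenseColumn implements Column {
        final List<Object> values;
        DenseColumn(List<Object> values) { this.values = values; }
    }

    /** Stand-in for a constant column: a single value shared by every row. */
    static final class ConstantColumn implements Column {
        final Object value; // null models a null constant
        ConstantColumn(Object value) { this.value = value; }
    }

    /** Build a fresh dense column holding the constant repeated numRows times. */
    static DenseColumn materialize(ConstantColumn c, int numRows) {
        return new DenseColumn(new ArrayList<>(Collections.nCopies(numRows, c.value)));
    }

    /** Model of the new dispatch: constants are materialized rather than rejected. */
    static List<DenseColumn> getBatchColumns(List<Column> columns, int numRows) {
        List<DenseColumn> out = new ArrayList<>();
        for (Column col : columns) {
            if (col instanceof DenseColumn) {
                out.add((DenseColumn) col); // existing path, unchanged
            } else if (col instanceof ConstantColumn) {
                out.add(materialize((ConstantColumn) col, numRows)); // new arm
            } else {
                throw new IllegalArgumentException(
                    "only dense columns are supported, but got " + col.getClass().getName());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Column> batch = new ArrayList<>();
        batch.add(new DenseColumn(Arrays.<Object>asList(1, 2, 3)));
        batch.add(new ConstantColumn(7));     // value constant
        batch.add(new ConstantColumn(null));  // null constant
        for (DenseColumn c : getBatchColumns(batch, 3)) {
            System.out.println(c.values);
        }
    }
}
```

The real implementation writes through per-type `ArrowFieldWriter`s rather than a generic list, which is what lets it cover decimals, timestamps, and nested types without a separate branch for each.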
