schenksj opened a new issue, #4527:
URL: https://github.com/apache/datafusion-comet/issues/4527

   ### Problem
   
   Spark wraps per-batch constants, such as file-source partition values and 
synthetic columns, in `ConstantColumnVector`. When such a batch flows through 
a Comet operator into `NativeUtil.exportBatch` / 
`Utils.getBatchFieldVectors`, the export path only handles `CometVector` and 
throws:
   
   ```
   org.apache.spark.SparkException: Comet execution only takes Arrow Arrays, 
but got ...ConstantColumnVector
   ```
   
   This is notably reproducible with `OPTIMIZE` on a table carrying 
constant/partition columns.
   
   ### Proposed fix
   
   Materialize the constant into a fresh Arrow `FieldVector` (the constant 
repeated `numRows` times) inline on the export path. Implement this via the 
existing per-type `ArrowFieldWriter`s rather than a hand-rolled per-type 
switch. The writers cover scalars, decimals, timestamps, and complex 
struct/array/map types, and they stay in sync with Spark's type handling. The 
change consists of:
   
   - `ConstantColumnVectors.materialize` in the 
`org.apache.spark.sql.comet.execution.arrow` package
   - exposed via `Utils.materializeConstantColumnVector`
   - with cases added to `NativeUtil.exportBatch` and 
`Utils.getBatchFieldVectors`
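
The shape of the proposed change can be sketched without any Spark, Arrow, or 
Comet dependencies. The block below is only a minimal model under stated 
assumptions: `ConstantExportSketch`, `Column`, `ArrowBacked`, `Constant`, 
`materialize`, and `export` are all hypothetical stand-ins invented for 
illustration, and the model handles a single nullable `Long` type where the 
actual fix is described as delegating to the per-type `ArrowFieldWriter`s.

```java
import java.util.Arrays;

// Dependency-free model of the export dispatch. Every type here is a
// hypothetical stand-in; none of them are Comet, Spark, or Arrow classes.
final class ConstantExportSketch {

    /** Stand-in for any column in a batch. */
    interface Column {
        int numRows();
    }

    /** Stand-in for an Arrow-backed column, which the export path already accepts. */
    static final class ArrowBacked implements Column {
        final Long[] values; // a null element models a NULL row

        ArrowBacked(Long[] values) {
            this.values = values;
        }

        @Override
        public int numRows() {
            return values.length;
        }
    }

    /** Stand-in for a constant column: one value that logically repeats numRows times. */
    static final class Constant implements Column {
        final Long value; // null models a constant NULL
        final int numRows;

        Constant(Long value, int numRows) {
            this.value = value;
            this.numRows = numRows;
        }

        @Override
        public int numRows() {
            return numRows;
        }
    }

    /** Expands the constant into a fresh column with the value repeated numRows times. */
    static ArrowBacked materialize(Constant c) {
        Long[] out = new Long[c.numRows];
        Arrays.fill(out, c.value);
        return new ArrowBacked(out);
    }

    /** Export dispatch: pass Arrow-backed columns through, materialize constants inline. */
    static ArrowBacked export(Column column) {
        if (column instanceof ArrowBacked) {
            return (ArrowBacked) column;
        }
        if (column instanceof Constant) {
            return materialize((Constant) column); // the newly added case
        }
        throw new IllegalArgumentException(
            "only Arrow-backed columns are supported, but got "
                + column.getClass().getSimpleName());
    }
}
```

In this model, `export(new Constant(7L, 3))` returns a three-row column whose 
rows all hold 7, while an already Arrow-backed column is returned unchanged. 
Materializing inline costs one allocation per constant column per batch, but 
it leaves everything downstream of the export path untouched.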
   
   ### Relationship to the Delta integration
   
   This is a standalone fix for any plan that pushes constants through a 
Comet operator. It is also **required for** the in-progress Delta Lake contrib 
integration (the OPTIMIZE / deletion-vector rewrite paths hit it), so it would 
help to prioritize it accordingly. A PR will follow shortly.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

