schenksj opened a new issue, #4527: URL: https://github.com/apache/datafusion-comet/issues/4527
### Problem

Spark wraps file-source partition columns and other per-batch constants (partition values, synthetic columns) in `ConstantColumnVector`. When such a batch flows through a Comet operator into `NativeUtil.exportBatch` / `Utils.getBatchFieldVectors`, the export path only handles `CometVector` and throws:

```
org.apache.spark.SparkException: Comet execution only takes Arrow Arrays, but got ...ConstantColumnVector
```

Notably reproducible with `OPTIMIZE` on a table carrying constant/partition columns.

### Proposed fix

Materialize the constant into a fresh Arrow `FieldVector` (the constant repeated `numRows` times) inline on the export path. Implement via the existing per-type `ArrowFieldWriter`s rather than a hand-rolled per-type switch; this covers scalars, decimal, timestamps, and complex struct/array/map, and stays in sync with Spark's type handling:

- `ConstantColumnVectors.materialize` in the `org.apache.spark.sql.comet.execution.arrow` package
- exposed via `Utils.materializeConstantColumnVector`
- with cases added to `NativeUtil.exportBatch` and `Utils.getBatchFieldVectors`

### Relationship to the Delta integration

Standalone fix for any plan pushing constants through a Comet operator. It is **required for** the in-progress Delta Lake contrib integration (the OPTIMIZE / deletion-vector rewrite paths hit it), so it would help to prioritize it accordingly. A PR will follow shortly.
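
For illustration only, the shape of the proposed fix can be sketched in plain Java without any Spark or Arrow dependency. Everything here is a hypothetical stand-in and not the Comet API: `ConstantColumn` plays the role of `ConstantColumnVector`, a `List` plays the role of an already-dense Arrow vector, and `exportColumn` mimics the extra case that would be added on the export path. The real fix would presumably write through the per-type `ArrowFieldWriter`s instead of copying values into a list.

```java
import java.util.ArrayList;
import java.util.List;

public class ConstantMaterializeSketch {

    /** Hypothetical stand-in for a constant column: one value shared by every row. */
    static final class ConstantColumn<T> {
        final T value; // may be null, meaning every row is null

        ConstantColumn(T value) {
            this.value = value;
        }
    }

    /** Expand the constant into a dense column holding the value numRows times. */
    static <T> List<T> materialize(ConstantColumn<T> column, int numRows) {
        if (numRows < 0) {
            throw new IllegalArgumentException("numRows must be non-negative: " + numRows);
        }
        List<T> dense = new ArrayList<>(numRows);
        for (int i = 0; i < numRows; i++) {
            dense.add(column.value);
        }
        return dense;
    }

    /**
     * Hypothetical export step: dense columns pass through unchanged, constant
     * columns are materialized first, anything else is still rejected.
     */
    static List<?> exportColumn(Object column, int numRows) {
        if (column instanceof List) {
            return (List<?>) column;
        }
        if (column instanceof ConstantColumn) {
            return materialize((ConstantColumn<?>) column, numRows);
        }
        throw new IllegalArgumentException(
            "unsupported column type: " + column.getClass().getName());
    }

    public static void main(String[] args) {
        // A partition value repeated for a three-row batch.
        List<?> exported = exportColumn(new ConstantColumn<>("2024-01-01"), 3);
        System.out.println(exported); // prints [2024-01-01, 2024-01-01, 2024-01-01]
    }
}
```

The point of the sketch is only the dispatch: the constant case is handled inline where the batch is exported, so downstream code keeps seeing dense columns and needs no changes.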
