andygrove opened a new issue, #3215:
URL: https://github.com/apache/datafusion-comet/issues/3215
## Description
When a native Comet operator receives data from a Spark scan that produces
`OnHeapColumnVector` instead of Arrow arrays, Comet fails with:
```
org.apache.spark.SparkException: Comet execution only takes Arrow Arrays,
but got class org.apache.spark.sql.execution.vectorized.OnHeapColumnVector
```
This can happen when:
1. The native scan (e.g., `native_comet`) doesn't support certain data types
(like complex types)
2. The scan falls back to Spark's Parquet reader
3. A downstream native operator (like the native Parquet writer) receives
the non-Arrow data
## Reproduction
```scala
// With native Parquet write enabled but without
COMET_SCAN_ALLOW_INCOMPATIBLE
withSQLConf(
"spark.comet.parquet.write.enabled" -> "true",
"spark.comet.exec.enabled" -> "true") {
// Create data with complex types
val df = Seq((1, Seq(1, 2, 3))).toDF("id", "values")
// Write to parquet (without Comet)
df.write.parquet("/tmp/input")
// Read and write - this fails because native_comet scan doesn't support
// complex types, falls back to Spark reader, but downstream native writer
// expects Arrow arrays
spark.read.parquet("/tmp/input").write.parquet("/tmp/output")
}
```
## Expected Behavior
Comet should either:
1. **Fall back the entire query to Spark** when native operators would
receive non-Arrow data
2. **Automatically insert conversion** from `OnHeapColumnVector` to Arrow
(using the existing `spark.comet.convert.parquet.enabled` mechanism)
3. **Provide a clearer error message** explaining why this happened and how
to fix it (e.g., "Enable spark.comet.scan.allowIncompatible to use
native_iceberg_compat scan which supports complex types")
## Current Workarounds
1. Enable `spark.comet.scan.allowIncompatible=true` so that
`native_iceberg_compat` scan is used (which supports complex types)
2. Enable `spark.comet.convert.parquet.enabled=true` to convert Spark
columnar data to Arrow
## Context
This was discovered while adding complex type support to the native Parquet
writer (#3214). The fix there uses `COMET_SCAN_ALLOW_INCOMPATIBLE`, but the
underlying issue of ungraceful failure should be addressed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]