andygrove opened a new issue, #3215:
URL: https://github.com/apache/datafusion-comet/issues/3215

   ## Description
   
   When a native Comet operator receives data from a Spark scan that produces 
`OnHeapColumnVector` instead of Arrow arrays, Comet fails with:
   
   ```
   org.apache.spark.SparkException: Comet execution only takes Arrow Arrays, but got class org.apache.spark.sql.execution.vectorized.OnHeapColumnVector
   ```
   
   This can happen when:
   1. The native scan (e.g., `native_comet`) doesn't support certain data types 
(like complex types)
   2. The scan falls back to Spark's Parquet reader
   3. A downstream native operator (like the native Parquet writer) receives 
the non-Arrow data
   
   ## Reproduction
   
   ```scala
   // Run inside a Spark test suite that provides withSQLConf (e.g. SQLTestUtils),
   // with the session's implicits in scope for toDF.
   // Native Parquet write is enabled, but COMET_SCAN_ALLOW_INCOMPATIBLE is not set.
   withSQLConf(
     "spark.comet.parquet.write.enabled" -> "true",
     "spark.comet.exec.enabled" -> "true") {

     // Create data with a complex type (an array column)
     val df = Seq((1, Seq(1, 2, 3))).toDF("id", "values")

     // Write to Parquet (without Comet)
     df.write.parquet("/tmp/input")

     // Read and write back. This fails: the native_comet scan doesn't support
     // complex types, so the scan falls back to Spark's Parquet reader, but the
     // downstream native Parquet writer expects Arrow arrays.
     spark.read.parquet("/tmp/input").write.parquet("/tmp/output")
   }
   ```
   
   ## Expected Behavior
   
   Comet should do one of the following:
   1. **Fall back to Spark for the entire query** when native operators would receive non-Arrow data
   2. **Automatically insert a conversion** from `OnHeapColumnVector` to Arrow (using the existing `spark.comet.convert.parquet.enabled` mechanism)
   3. **Provide a clearer error message** explaining why this happened and how to fix it (e.g., "Enable spark.comet.scan.allowIncompatible to use the native_iceberg_compat scan, which supports complex types"); a rough sketch of such a check follows this list
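   
   As a non-authoritative sketch of option 3, the check below shows the shape such an error could take. The helper `checkArrowBacked` and the `isArrowBacked` predicate are hypothetical stand-ins for however Comet identifies its Arrow-backed vectors; the message text is assembled from the configs mentioned in this issue.
   
   ```scala
   import org.apache.spark.SparkException
   import org.apache.spark.sql.vectorized.{ColumnarBatch, ColumnVector}
   
   // Hypothetical helper sketching option 3; isArrowBacked is a placeholder
   // predicate for recognizing Comet's Arrow-backed vectors.
   def checkArrowBacked(batch: ColumnarBatch, isArrowBacked: ColumnVector => Boolean): Unit = {
     (0 until batch.numCols()).foreach { i =>
       val col = batch.column(i)
       if (!isArrowBacked(col)) {
         throw new SparkException(
           s"Comet execution only takes Arrow Arrays, but got ${col.getClass}. " +
             "The scan likely fell back to Spark's Parquet reader (e.g. for complex " +
             "types that native_comet does not support). Set " +
             "spark.comet.scan.allowIncompatible=true to use the native_iceberg_compat " +
             "scan, or spark.comet.convert.parquet.enabled=true to convert Spark " +
             "columnar data to Arrow.")
       }
     }
   }
   ```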
   
   ## Current Workarounds
   
   1. Enable `spark.comet.scan.allowIncompatible=true` so that the `native_iceberg_compat` scan (which supports complex types) is used
   2. Enable `spark.comet.convert.parquet.enabled=true` to convert Spark columnar data to Arrow (both settings are shown in the snippet after this list)
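   
   A minimal sketch of applying either workaround at runtime, assuming an active `SparkSession` named `spark` (the config keys are the ones listed above):
   
   ```scala
   // Workaround 1: allow the native_iceberg_compat scan, which supports complex types
   spark.conf.set("spark.comet.scan.allowIncompatible", "true")
   
   // Workaround 2 (alternative): convert Spark columnar data to Arrow
   spark.conf.set("spark.comet.convert.parquet.enabled", "true")
   ```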
   
   ## Context
   
   This was discovered while adding complex type support to the native Parquet writer (#3214). The fix there relies on `COMET_SCAN_ALLOW_INCOMPATIBLE`, but the underlying problem of failing ungracefully should still be addressed.

