boudica-dev-eng opened a new issue, #3255:
URL: https://github.com/apache/datafusion-comet/issues/3255

   ### Describe the bug
   
   I am encountering a `CometNativeException` when performing standard date 
transformations (`to_date` or `datediff`) on a Timestamp column read from an 
Iceberg table.
   
   The error message `Cannot perform binary operation on arrays of different length` 
occurs even though the table schema contains no ArrayType columns (only 
scalar types).
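
For readers unfamiliar with columnar engines: the "arrays" in this message most likely refer to the column batches that a vectorised kernel operates on, not to `ArrayType` values in the schema. The sketch below is a toy illustration of that failure mode under that assumption. `binary_kernel` is a hypothetical helper written for this example; it is not Comet's or Arrow's actual code.

```python
from typing import Callable, List, Sequence


def binary_kernel(left: Sequence[int], right: Sequence[int],
                  op: Callable[[int, int], int]) -> List[int]:
    """Toy element-wise kernel over two column batches (hypothetical helper)."""
    if len(left) != len(right):
        # Reuses the wording of the reported error for illustration only.
        raise ValueError(
            "Cannot perform binary operation on arrays of different length"
        )
    return [op(a, b) for a, b in zip(left, right)]


# Two batches of plain scalar values with matching row counts work fine.
same = binary_kernel([10, 20, 30], [1, 2, 3], lambda a, b: a - b)

# A row-count mismatch fails even though every value is a plain scalar.
try:
    binary_kernel([10, 20, 30], [1, 2], lambda a, b: a - b)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
```

If this reading is right, the failure would point to two inputs of the date expression arriving with different row counts, which is consistent with the dictionary-encoding suspicion but does not confirm it.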
   
   The issue appears to be related to how Comet handles the vectorisation of 
the Timestamp column, possibly involving dictionary encoding in the underlying 
Parquet files.
   
   ```
   org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage X failed 4 times...
   Caused by: org.apache.comet.CometNativeException: Compute error: Cannot 
perform binary operation on arrays of different length
       at org.apache.comet.Native.executePlan(Native Method)
       ...
   ```
   
   When Comet is disabled, Spark executes the job flawlessly.
   
   ### Steps to reproduce
   
   1. Read an Iceberg table containing a `TIMESTAMPTZ` column.
   2. Apply `F.to_date()` to the timestamp column.
   3. Trigger an action (e.g., `.count()` or a write).
   
   ```
   # Assumes PySpark-style API; `spark` is an existing session
   from pyspark.sql import functions as F
   
   # Schema is simple: id (String), ts (Timestamp) - No Arrays present
   df = spark.read.format("iceberg").load("db.table")
   
   # This crashes Comet:
   df.withColumn("date_col", F.to_date(F.col("ts"))).count()
   
   # This ALSO crashes Comet:
   df.withColumn("diff", F.datediff(F.current_date(), F.col("ts"))).count()
   ```
   
   ### Expected behavior
   
   The new column is added as a date and the job completes, as it does when 
Comet is disabled.
   
   ### Additional context
   
   - Comet version: built from 
https://github.com/apache/datafusion-comet/commit/ea26629049aa3e748e16eb6170793f2ecde045be
   - Spark version: 4.0.1_2.13 (Spark Connect, Kubernetes, no Python/PySpark)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

