boudica-dev-eng opened a new issue, #3255:
URL: https://github.com/apache/datafusion-comet/issues/3255
### Describe the bug
I am encountering a `CometNativeException` when performing standard date
transformations (`to_date` or `datediff`) on a Timestamp column read from an
Iceberg table.
The error message `Cannot perform binary operation on arrays of different
length` occurs even though the table schema contains no `ArrayType` columns
(only scalar columns).
The issue appears to be related to how Comet handles the vectorisation of
the Timestamp column, possibly involving dictionary encoding in the underlying
Parquet files.
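One way to test the dictionary-encoding hypothesis is to inspect the Iceberg data files directly with pyarrow. This is a minimal sketch, not part of the original report; the file path and the `ts` column name are placeholders:
```
# Sketch: print the Parquet encodings used for the timestamp column in
# one data file. Path and column name are illustrative placeholders.
import pyarrow.parquet as pq

pf = pq.ParquetFile("/path/to/warehouse/db/table/data/part-00000.parquet")
meta = pf.metadata
for rg in range(meta.num_row_groups):
    row_group = meta.row_group(rg)
    for i in range(row_group.num_columns):
        col = row_group.column(i)
        if col.path_in_schema == "ts":  # assumed column name
            print(f"row group {rg}: {col.encodings}")
```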
```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in
stage X failed 4 times...
Caused by: org.apache.comet.CometNativeException: Compute error: Cannot
perform binary operation on arrays of different length
at org.apache.comet.Native.executePlan(Native Method)
...
```
When Comet is disabled, Spark executes the job flawlessly.
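For reference, Comet was turned off at runtime to verify this; a minimal sketch of the toggle, assuming an active `spark` session:
```
# Sketch: disable Comet for the current session to confirm the failure
# is Comet-specific. spark.comet.enabled is Comet's top-level enable flag.
spark.conf.set("spark.comet.enabled", "false")
```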
### Steps to reproduce
1. Read an Iceberg table containing a `TIMESTAMPTZ` column.
2. Apply `F.to_date()` to the timestamp column.
3. Trigger an action (e.g., `.count()` or a write).
```
# Schema is simple: id (String), ts (Timestamp) - no arrays present
from pyspark.sql import functions as F

df = spark.read.format("iceberg").load("db.table")

# This crashes Comet:
df.withColumn("date_col", F.to_date(F.col("ts"))).count()

# This ALSO crashes Comet:
df.withColumn("diff", F.datediff(F.current_date(), F.col("ts"))).count()
```
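For anyone reproducing this without an existing table, a self-contained sketch follows. The local Hadoop catalog, warehouse path, and table name are illustrative assumptions rather than the setup from this report; Spark's `TIMESTAMP` type maps to Iceberg `TIMESTAMPTZ` by default.
```
# Sketch of a self-contained reproduction; catalog, warehouse path, and
# table name are assumptions for illustration.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .config("spark.plugins", "org.apache.spark.CometPlugin")
    .config("spark.comet.enabled", "true")
    .config("spark.comet.exec.enabled", "true")
    .config("spark.sql.catalog.local", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.local.type", "hadoop")
    .config("spark.sql.catalog.local.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

# Spark TIMESTAMP is written as Iceberg TIMESTAMPTZ by default.
spark.sql("CREATE TABLE local.db.t (id STRING, ts TIMESTAMP) USING iceberg")
spark.sql("INSERT INTO local.db.t VALUES ('a', current_timestamp())")

df = spark.table("local.db.t")
df.withColumn("date_col", F.to_date(F.col("ts"))).count()  # fails with Comet
```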
### Expected behavior
A `DateType` column is added without error, matching the result Spark produces
when Comet is disabled.
### Additional context
- Comet version: built from
https://github.com/apache/datafusion-comet/commit/ea26629049aa3e748e16eb6170793f2ecde045be
- Spark version: 4.0.1_2.13 (Spark Connect, Kubernetes, no Python/PySpark)