andygrove opened a new pull request, #3878:
URL: https://github.com/apache/datafusion-comet/pull/3878
## Which issue does this PR close?
Closes #2720
## Rationale for this change
`CometSparkToColumnarExec` and `CometLocalTableScanExec` used
`conf.sessionLocalTimeZone` when creating Arrow schemas for timestamp columns.
However, the native side always deserializes `Timestamp` as
`Timestamp(Microsecond, Some("UTC"))` in `serde.rs`. When the session timezone
is non-UTC, this causes a timezone mismatch between the Arrow data schema and
the native plan schema.
The native `ScanExec.make_record_batch` detects this mismatch and casts the
column to match. Because an Arrow timestamp with a timezone is stored as an
offset from the UTC epoch (microseconds here), the cast changes only the
timezone metadata and leaves the values untouched. It still adds unnecessary
overhead, and it is inconsistent with `NativeBatchReader` (the Parquet scan
path), which already hardcodes UTC.
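The claim that the cast does not touch the data can be illustrated with a small sketch. It uses only the Python standard library, not Arrow or Comet APIs. The helper `epoch_micros` is hypothetical, and a fixed UTC-8 offset is assumed as a stand-in for a non-UTC session timezone. The point is that the stored value depends only on the instant, while the timezone label affects how it is rendered.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_micros(ts: datetime) -> int:
    """Microseconds since the UTC epoch for a timezone-aware datetime (hypothetical helper)."""
    return (ts - EPOCH) // timedelta(microseconds=1)


# Fixed offset used as a stand-in for a non-UTC session timezone (assumption).
session_tz = timezone(timedelta(hours=-8))

instant_utc = datetime(2024, 1, 15, 20, 0, 0, tzinfo=timezone.utc)
instant_local = instant_utc.astimezone(session_tz)

# Same instant, different wall-clock rendering.
assert instant_local.hour == 12
# The stored value is identical, so relabeling the timezone changes no data.
assert epoch_micros(instant_utc) == epoch_micros(instant_local)
```

In this model, switching the label from the session timezone to UTC changes only metadata, which is why removing the mismatch at the source is safe.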
## What changes are included in this PR?
- Changed `CometSparkToColumnarExec` and `CometLocalTableScanExec` to use
`"UTC"` instead of `conf.sessionLocalTimeZone` for Arrow schema timezone,
matching `NativeBatchReader` and native `serde.rs`
- Removed outdated "known issues with non-UTC timezones" warnings from
config descriptions for `spark.comet.convert.parquet.enabled`,
`spark.comet.convert.json.enabled`, `spark.comet.convert.csv.enabled`, and
`spark.comet.sparkToColumnar.enabled`
- Moved these configs from the `testing` category to the `exec` category,
since the timezone concern is resolved
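The effect of the schema change can be sketched with a toy model. This is not Comet code: `TimestampType`, `needs_cast`, and `jvm_schema_type` are hypothetical names, and the comparison only approximates the mismatch check described for `ScanExec.make_record_batch`. It shows that a schema built from the session timezone differs from the type the native side expects, while a hardcoded `"UTC"` schema matches it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimestampType:
    """Toy stand-in for an Arrow timestamp type: a unit plus an optional timezone label."""
    unit: str
    tz: Optional[str]


# The type the PR description says the native side always deserializes to.
NATIVE_EXPECTED = TimestampType("microsecond", "UTC")


def needs_cast(batch_type: TimestampType,
               plan_type: TimestampType = NATIVE_EXPECTED) -> bool:
    """Hypothetical check: a cast is needed whenever the two types differ."""
    return batch_type != plan_type


def jvm_schema_type(session_tz: str, use_session_tz: bool) -> TimestampType:
    """Hypothetical model of the JVM-side choice before (True) and after (False) this PR."""
    tz = session_tz if use_session_tz else "UTC"
    return TimestampType("microsecond", tz)


before = jvm_schema_type("America/Los_Angeles", use_session_tz=True)
after = jvm_schema_type("America/Los_Angeles", use_session_tz=False)

print(needs_cast(before))  # True: session timezone differs from UTC
print(needs_cast(after))   # False: schema already matches the native plan
```

With a UTC session timezone both variants match, which is consistent with the mismatch described above appearing only for non-UTC sessions.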
## How are these changes tested?
Added two tests in `CometExecSuite`:
- `LocalTableScanExec with timestamps in non-UTC timezone` — verifies
correct results with `CometLocalTableScanExec` when session timezone is
`America/Los_Angeles`
- `sort on timestamps with non-UTC timezone via LocalTableScan` — verifies
correct results through a sort + repartition path with non-UTC timezone
All 90 tests in `CometExecSuite` pass.