andygrove opened a new issue, #3180:
URL: https://github.com/apache/datafusion-comet/issues/3180
## Summary
The `hour`, `minute`, and `second` expressions (and potentially other
timestamp extraction functions) incorrectly apply timezone conversion when the
input is a TimestampNTZ (timestamp without timezone) value.
## Root Cause
In `native/spark-expr/src/datetime_funcs/extract_date_part.rs`, the
implementation uses `array_with_timezone`:
```rust
let array = array_with_timezone(
array,
self.timezone.clone(),
Some(&DataType::Timestamp(
Microsecond,
Some(self.timezone.clone().into()),
)),
)?;
```
For a TimestampNTZ input (`Timestamp(Microsecond, None)`), this calls
`timestamp_ntz_to_timestamp` in `utils.rs`, which converts the local time value
to UTC by applying the session timezone offset.
## Expected Behavior
**For TimestampNTZ inputs:**
- Values are stored as local time without any timezone information
- `hour(timestamp_ntz_value)` should return the hour component directly from
the local time value
- No timezone conversion should be applied
**For Timestamp (with timezone) inputs:**
- Values are stored in UTC
- `hour(timestamp_value)` should convert from UTC to the session timezone
before extracting the hour
- This is the current behavior and is correct
## Example
```sql
-- With session timezone = America/Los_Angeles (UTC-8)
-- TimestampNTZ value representing "2024-01-15 10:30:00" local time
--
-- Expected: hour() = 10 (extract directly from local time)
-- Actual (bug): hour() = 18 (incorrectly converts assuming value is in UTC)
```
## Affected Functions
- `hour` (SparkHour)
- `minute` (SparkMinute)
- `second` (SparkSecond)
All three use the same `extract_date_part!` macro with `array_with_timezone`.
## Impact
This bug can cause incorrect results when using hour/minute/second functions
on TimestampNTZ columns in non-UTC timezones. Currently the Scala serde layer
does not validate input types, so TimestampNTZ inputs are not blocked.
## Suggested Fix
Either:
1. Add input type validation in the Scala serde layer (like
`CometUnixTimestamp`) to mark TimestampNTZ as unsupported and fall back to Spark
2. Fix the Rust implementation to handle TimestampNTZ correctly by not
applying timezone conversion
Option 1 is safer as a short-term fix.
## Related
- Issue #378: Implement Spark-compatible CAST to/from TimestampNTZType
- Issue #2733: [EPIC] Comet timezone handling
- Issue #3179: String to TimestampNTZ cast bug
---
> **Note:** This issue was generated with AI assistance.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]