andygrove opened a new issue, #3180:
URL: https://github.com/apache/datafusion-comet/issues/3180

   ## Summary
   
   The `hour`, `minute`, and `second` expressions (and potentially other 
timestamp extraction functions) incorrectly apply timezone conversion when the 
input is a TimestampNTZ (timestamp without timezone) value.
   
   ## Root Cause
   
   In `native/spark-expr/src/datetime_funcs/extract_date_part.rs`, the implementation unconditionally calls `array_with_timezone`:
   
   ```rust
   let array = array_with_timezone(
       array,
       self.timezone.clone(),
       Some(&DataType::Timestamp(
           Microsecond,
           Some(self.timezone.clone().into()),
       )),
   )?;
   ```
   
   For a TimestampNTZ input (`Timestamp(Microsecond, None)`), this calls 
`timestamp_ntz_to_timestamp` in `utils.rs`, which converts the local time value 
to UTC by applying the session timezone offset.
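
   For illustration only (this is not the Comet code), the conversion described above is roughly equivalent to the following `chrono`/`chrono-tz` sketch, using the value from the example further below:

   ```rust
    use chrono::{NaiveDate, TimeZone};
    use chrono_tz::America::Los_Angeles;

    fn main() {
        // The timezone-less value as stored in a TimestampNTZ column.
        let local = NaiveDate::from_ymd_opt(2024, 1, 15)
            .unwrap()
            .and_hms_opt(10, 30, 0)
            .unwrap();

        // The conversion interprets the local value in the session timezone and
        // re-expresses it as a UTC instant (+8 hours here), which is why a
        // subsequent hour() call sees 18 instead of 10.
        let shifted = Los_Angeles.from_local_datetime(&local).unwrap().naive_utc();
        assert_eq!(shifted.to_string(), "2024-01-15 18:30:00");
    }
   ```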
   
   ## Expected Behavior
   
   **For TimestampNTZ inputs:**
   - Values are stored as local time without any timezone information
   - `hour(timestamp_ntz_value)` should return the hour component directly from 
the local time value
   - No timezone conversion should be applied
   
   **For Timestamp (with timezone) inputs:**
   - Values are stored in UTC
   - `hour(timestamp_value)` should convert from UTC to the session timezone 
before extracting the hour
   - This is the current behavior and is correct (both cases are sketched below)
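
   A minimal, self-contained sketch of both cases, using `chrono`/`chrono-tz` rather than Comet's internals (`hour_ntz`, `hour_tz`, and the hard-coded America/Los_Angeles session timezone are illustrative assumptions, not existing code):

   ```rust
    use chrono::{DateTime, Timelike, Utc};
    use chrono_tz::America::Los_Angeles;

    /// TimestampNTZ: the microsecond value encodes a local wall-clock time,
    /// so the hour is read directly and no timezone conversion is applied.
    fn hour_ntz(micros: i64) -> u32 {
        DateTime::<Utc>::from_timestamp_micros(micros).unwrap().hour()
    }

    /// Timestamp (with timezone): the microsecond value encodes a UTC instant,
    /// so it is converted to the session timezone before extracting the hour.
    fn hour_tz(micros: i64) -> u32 {
        DateTime::<Utc>::from_timestamp_micros(micros)
            .unwrap()
            .with_timezone(&Los_Angeles)
            .hour()
    }
   ```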
   
   ## Example
   
   ```sql
   -- With session timezone = America/Los_Angeles (UTC-8)
   -- TimestampNTZ value representing "2024-01-15 10:30:00" local time
   -- 
   -- Expected: hour() = 10 (extract directly from local time)
    -- Actual (bug): hour() = 18 (the local value is shifted to UTC by the session timezone offset before the hour is extracted)
   ```
   
   ## Affected Functions
   
   - `hour` (SparkHour)
   - `minute` (SparkMinute)  
   - `second` (SparkSecond)
   
   All three use the same `extract_date_part!` macro with `array_with_timezone`.
   
   ## Impact
   
   This bug can cause incorrect results when using hour/minute/second functions 
on TimestampNTZ columns in non-UTC timezones. Currently the Scala serde layer 
does not validate input types, so TimestampNTZ inputs are not blocked.
   
   ## Suggested Fix
   
   Either:
   1. Add input type validation in the Scala serde layer (like 
`CometUnixTimestamp`) to mark TimestampNTZ as unsupported and fall back to Spark
   2. Fix the Rust implementation to handle TimestampNTZ correctly by not applying the timezone conversion when the input type has no timezone (see the sketch below)
   
   Option 1 is safer as a short-term fix.
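
   For option 2, a rough sketch of the conditional, based on the snippet quoted above (exact types and signatures in `extract_date_part.rs` may differ):

   ```rust
    // Only apply the session-timezone conversion when the input actually
    // carries a timezone; a TimestampNTZ value already represents local time.
    let array = if matches!(array.data_type(), DataType::Timestamp(_, None)) {
        array
    } else {
        array_with_timezone(
            array,
            self.timezone.clone(),
            Some(&DataType::Timestamp(
                Microsecond,
                Some(self.timezone.clone().into()),
            )),
        )?
    };
   ```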
   
   ## Related
   
   - Issue #378: Implement Spark-compatible CAST to/from TimestampNTZType
   - Issue #2733: [EPIC] Comet timezone handling
   - Issue #3179: String to TimestampNTZ cast bug
   
   ---
   
   > **Note:** This issue was generated with AI assistance.

