andygrove opened a new issue, #3112:
URL: https://github.com/apache/datafusion-comet/issues/3112

   ## What is the problem the feature request solves?
   
   > **Note:** This issue was generated with AI assistance. The specification 
details have been extracted from Spark documentation and may need verification.
   
   Comet does not currently support Spark's `SubtractTimestamps` expression (timestamp minus timestamp), causing queries that use it to fall back to Spark's JVM execution instead of running natively on DataFusion.
   
   The `SubtractTimestamps` expression computes the difference between two 
timestamp values. Depending on configuration, it returns either a legacy 
`CalendarInterval` (with microseconds field only) or a `DayTimeInterval` 
representing the time difference in microseconds.
   
   Supporting this expression would allow more Spark workloads to benefit from 
Comet's native acceleration.
   
   ## Describe the potential solution
   
   ### Spark Specification
   
   **Syntax:**
   ```sql
   timestamp_expr1 - timestamp_expr2
   ```
   
   ```scala
   // DataFrame API
   col("end_timestamp") - col("start_timestamp")
   ```
   
   **Arguments:**
   | Argument | Type | Description |
   |----------|------|-------------|
   | left | Expression | The end timestamp (minuend) |
   | right | Expression | The start timestamp (subtrahend) |
   | legacyInterval | Boolean | Whether to return CalendarInterval (true) or 
DayTimeInterval (false) |
   | timeZoneId | Option[String] | Optional timezone identifier for timestamp 
interpretation |
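
   For orientation, here is a sketch of how these arguments map onto direct construction of the Catalyst expression. This is illustrative only and assumes the case-class constructor mirrors the parameter list above; in practice the analyzer builds the expression when `-` is applied to two timestamp operands:

   ```scala
   import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
   import org.apache.spark.sql.catalyst.expressions.SubtractTimestamps

   // Illustrative construction; normally created by the analyzer when it
   // resolves `end_ts - start_ts` over two timestamp columns.
   val expr = SubtractTimestamps(
     left = UnresolvedAttribute("end_ts"),    // minuend
     right = UnresolvedAttribute("start_ts"), // subtrahend
     legacyInterval = false,                  // return a DayTimeIntervalType value
     timeZoneId = Some("UTC"))                // explicit timezone identifier
   ```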
   
   **Return Type:**
   - `CalendarIntervalType` when `legacyInterval` is true (controlled by `spark.sql.legacy.interval.enabled`)
   - `DayTimeIntervalType` when `legacyInterval` is false (see the sketch below)
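
   The effect of the config on the result type can be checked interactively. A minimal `spark-shell` sketch, assuming nothing beyond a plain session (column names and sample data are illustrative):

   ```scala
   import org.apache.spark.sql.functions._
   import spark.implicits._

   val df = Seq(("2023-12-25 10:30:00", "2023-12-25 09:15:30"))
     .toDF("end_s", "start_s")
     .select(
       col("end_s").cast("timestamp").as("end_ts"),
       col("start_s").cast("timestamp").as("start_ts"))

   // Default: the difference is a DayTimeIntervalType column
   spark.conf.set("spark.sql.legacy.interval.enabled", "false")
   df.select((col("end_ts") - col("start_ts")).as("duration")).printSchema()
   // duration: interval day to second (exact rendering depends on the Spark version)

   // Legacy mode: the difference is a CalendarIntervalType column
   spark.conf.set("spark.sql.legacy.interval.enabled", "true")
   df.select((col("end_ts") - col("start_ts")).as("duration")).printSchema()
   // duration: interval
   ```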
   
   **Supported Data Types:**
   - Input types: Any timestamp type (`TimestampType`, `TimestampNTZType`)
   - Both operands must be timestamp types
   
   **Edge Cases:**
   - **Null handling**: Returns null if either timestamp operand is null 
(null-intolerant behavior)
   
   - **Timezone handling**: Uses timezone from left operand's data type; 
respects explicit timeZoneId parameter
   
   - **Overflow**: Large timestamp differences may cause microsecond overflow 
in the resulting interval
   
   - **Negative intervals**: When the left timestamp is earlier than the right timestamp, the result is a negative interval (see the sketch after this list)
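
   The null and negative-interval cases can be illustrated directly (sample timestamps are illustrative; the exact interval display format depends on the Spark version):

   ```scala
   // Negative interval: the left operand is earlier than the right operand
   spark.sql(
     "SELECT TIMESTAMP '2023-12-25 09:15:30' - TIMESTAMP '2023-12-25 10:30:00' AS diff"
   ).show(truncate = false)
   // diff is a negative day-time interval of about -1 hour 14 minutes 30 seconds

   // Null handling: a null on either side yields a null result
   spark.sql(
     "SELECT CAST(NULL AS TIMESTAMP) - TIMESTAMP '2023-12-25 10:30:00' AS diff"
   ).show()
   // diff is NULL
   ```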
   
   **Examples:**
   ```sql
   -- Basic timestamp subtraction
   SELECT end_time - start_time AS duration 
   FROM events;
   
   -- With explicit timestamps
   SELECT TIMESTAMP '2023-12-25 10:30:00' - TIMESTAMP '2023-12-25 09:15:30' AS 
time_diff;
   ```
   
   ```scala
   // DataFrame API usage
   import org.apache.spark.sql.functions._
   
   df.select(col("end_timestamp") - col("start_timestamp") as "duration")
   
   // With literal timestamps
   df.select(
     (lit("2023-12-25 10:30:00").cast("timestamp") - 
      lit("2023-12-25 09:15:30").cast("timestamp")) as "duration"
   )
   ```
   
   ### Implementation Approach
   
   See the [Comet guide on adding new expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html) for detailed instructions; a test sketch for verifying the change follows the steps below.
   
   1. **Scala Serde**: Add expression handler in 
`spark/src/main/scala/org/apache/comet/serde/`
   2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
   3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if 
needed
   4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has 
built-in support first)
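
   As a starting point for verification, here is a sketch of a test that could be added to Comet's `CometExpressionSuite`. It assumes the existing `checkSparkAnswerAndOperator` helper from `CometTestBase`, which checks both result parity with Spark and that the plan runs natively in Comet; the table name and values are illustrative:

   ```scala
   test("SubtractTimestamps: timestamp - timestamp") {
     withTable("ts_tbl") {
       sql("CREATE TABLE ts_tbl (start_ts TIMESTAMP, end_ts TIMESTAMP) USING parquet")
       sql("""INSERT INTO ts_tbl VALUES
             (TIMESTAMP '2023-12-25 09:15:30', TIMESTAMP '2023-12-25 10:30:00'),
             (TIMESTAMP '2023-12-25 10:30:00', TIMESTAMP '2023-12-25 09:15:30'),
             (NULL, TIMESTAMP '2023-12-25 10:30:00')""")
       // Compares results against Spark and asserts the subtraction runs in the native plan
       checkSparkAnswerAndOperator("SELECT end_ts - start_ts FROM ts_tbl")
     }
   }
   ```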
   
   
   ## Additional context
   
   **Difficulty:** Medium
   **Spark Expression Class:** 
`org.apache.spark.sql.catalyst.expressions.SubtractTimestamps`
   
   **Related:**
   - `DateAdd` - Add intervals to dates
   - `TimestampAdd` - Add intervals to timestamps  
   - `DateDiff` - Calculate date differences
   - `CalendarInterval` - Legacy interval representation
   - `DayTimeIntervalType` - Modern interval type
   
   ---
   *This issue was auto-generated from Spark reference documentation.*
   

