andygrove opened a new issue, #3115:
URL: https://github.com/apache/datafusion-comet/issues/3115
## What is the problem the feature request solves?
> **Note:** This issue was generated with AI assistance. The specification
> details have been extracted from Spark documentation and may need verification.
Comet does not currently support Spark's `TimestampAddYMInterval` expression,
so queries that use it fall back to Spark's JVM execution instead of running
natively on DataFusion.
The `TimestampAddYMInterval` expression adds a year-month interval to a
timestamp value. This operation is timezone-aware and handles both
`TimestampType` and `TimestampNTZType` inputs while preserving the original
timestamp data type.
Supporting this expression would allow more Spark workloads to benefit from
Comet's native acceleration.
## Describe the potential solution
### Spark Specification
**Syntax:**
```sql
timestamp_column + INTERVAL 'value' YEAR TO MONTH
```
```scala
// DataFrame API usage
col("timestamp_column") + expr("INTERVAL '2-3' YEAR TO MONTH")
```
**Arguments:**
| Argument | Type | Description |
|----------|------|-------------|
| timestamp | Expression | The timestamp expression to add the interval to |
| interval | Expression | The year-month interval expression to add |
| timeZoneId | Option[String] | Optional timezone identifier for timezone-aware operations |
**Return Type:** Returns the same data type as the input timestamp
expression (`TimestampType` or `TimestampNTZType`).
**Supported Data Types:**
- Input timestamp: `AnyTimestampType` (`TimestampType` or `TimestampNTZType`)
- Input interval: `YearMonthIntervalType`
**Edge Cases:**
- **Null handling**: Returns null if either timestamp or interval input is
null (null intolerant)
- **Timezone handling**: Uses session timezone for `TimestampType` and UTC
for `TimestampNTZType`
- **Month overflow**: Handles month arithmetic that crosses year boundaries
correctly
- **Day adjustment**: May clamp the day-of-month when adding months, e.g.
January 31st + 1 month yields the last day of February (illustrated below)
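For illustration, a minimal spark-shell sketch of the day-adjustment case (the literal and column alias are invented for this example; exact output depends on the Spark version and session timezone):
```scala
// Minimal sketch of the day-adjustment edge case in the spark-shell
// (assumes a running SparkSession named `spark`).
import org.apache.spark.sql.functions._

val df = spark.sql("SELECT timestamp'2024-01-31 10:00:00' AS ts")

// January 31st + 1 month: the day is clamped to the last day of February.
df.select((col("ts") + expr("INTERVAL '0-1' YEAR TO MONTH")).as("ts_plus_1_month"))
  .show(truncate = false)
// Expected: 2024-02-29 10:00:00 (2024 is a leap year)
```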
**Examples:**
```sql
-- Add 2 years and 3 months to a timestamp
SELECT timestamp_col + INTERVAL '2-3' YEAR TO MONTH FROM events;
-- Add 1 year to current timestamp
SELECT current_timestamp() + INTERVAL '1' YEAR;
```
```scala
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(col("created_at") + expr("INTERVAL '1-6' YEAR TO MONTH"))
// Using interval column
df.select(col("timestamp_col") + col("interval_col"))
```
### Implementation Approach
See the [Comet guide on adding new
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
for detailed instructions.
1. **Scala Serde**: Add an expression handler in
`spark/src/main/scala/org/apache/comet/serde/` (a rough sketch follows this list)
2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if
needed
4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has
built-in support first)
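As a starting point for steps 1 and 2, here is a rough, unverified sketch of what a serde handler might look like. The trait name, `convert` signature, `exprToProtoInternal` helper, and the `TimestampAddYMInterval` protobuf message are assumptions modeled on how other Comet expression handlers are structured; mirror an existing handler and the contributor guide for the exact APIs before implementing.
```scala
// Rough sketch only -- the trait, convert() signature, exprToProtoInternal, and the
// protobuf message used below are assumptions; copy the wiring from an existing
// handler in spark/src/main/scala/org/apache/comet/serde/.
// (Placed in package org.apache.comet.serde.)
import org.apache.spark.sql.catalyst.expressions.{Attribute, TimestampAddYMInterval}

object CometTimestampAddYMInterval extends CometExpressionSerde[TimestampAddYMInterval] {
  override def convert(
      expr: TimestampAddYMInterval,
      inputs: Seq[Attribute],
      binding: Boolean): Option[ExprOuterClass.Expr] = {
    for {
      tsProto <- exprToProtoInternal(expr.timestamp, inputs, binding)
      ivProto <- exprToProtoInternal(expr.interval, inputs, binding)
    } yield {
      ExprOuterClass.Expr
        .newBuilder()
        // Hypothetical message; it would be added to expr.proto in step 3.
        .setTimestampAddYmInterval(
          ExprOuterClass.TimestampAddYMInterval
            .newBuilder()
            .setTimestamp(tsProto)
            .setInterval(ivProto)
            .setTimezone(expr.timeZoneId.getOrElse("UTC")))
        .build()
    }
  }
}
// Step 2 would then register classOf[TimestampAddYMInterval] -> CometTimestampAddYMInterval
// in the appropriate expression map in QueryPlanSerde.scala.
```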
## Additional context
**Difficulty:** Medium
**Spark Expression Class:**
`org.apache.spark.sql.catalyst.expressions.TimestampAddYMInterval`
**Related:**
- `DateAddYMInterval` - Adds year-month intervals to date values
- `TimestampAddDTInterval` - Adds day-time intervals to timestamps
- `DateTimeUtils.timestampAddMonths()` - Underlying implementation method
---
*This issue was auto-generated from Spark reference documentation.*