andygrove opened a new issue, #3136:
URL: https://github.com/apache/datafusion-comet/issues/3136
## What is the problem the feature request solves?
> **Note:** This issue was generated with AI assistance. The specification
details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark `months_between` function,
causing queries using this function to fall back to Spark's JVM execution
instead of running natively on DataFusion.
The `MonthsBetween` expression calculates the number of months between two
timestamp values. It returns a double precision number representing the
fractional months difference, with optional rounding behavior controlled by a
boolean parameter.
Supporting this expression would allow more Spark workloads to benefit from
Comet's native acceleration.
## Describe the potential solution
### Spark Specification
**Syntax:**
```sql
months_between(date1, date2[, roundOff])
```
```scala
months_between(date1, date2)
months_between(date1, date2, roundOff)
```
**Arguments:**
| Argument | Type | Description |
|----------|------|-------------|
| date1 | TimestampType | The first timestamp value (end date) |
| date2 | TimestampType | The second timestamp value (start date) |
| roundOff | BooleanType | Optional. Whether to round the result to 8 decimal places (defaults to true) |
**Return Type:** `DoubleType` - Returns a double precision floating point
number representing the number of months between the two dates.
**Supported Data Types:**
- Input: TimestampType for date arguments, BooleanType for roundOff parameter
- Implicit casting is supported through `ImplicitCastInputTypes` trait
- Output: Always returns DoubleType regardless of input precision
**Edge Cases:**
- **Null handling**: Returns null if any input argument is null
(nullIntolerant = true)
- **Timezone sensitivity**: Results depend on the configured timezone
through TimeZoneAwareExpression
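- **Fractional months**: If both timestamps fall on the same day of the month,
or both fall on the last day of their months, the time of day is ignored and
the result is a whole number. Otherwise the fractional part is computed
assuming 31 days per month.
- **Rounding behavior**: When roundOff is true, the result is rounded to 8
decimal places (see `DateTimeUtils.monthsBetween`)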
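
The month arithmetic behind these edge cases can be sketched in Rust. This is a minimal sketch based on the behavior described in the Spark documentation (whole months when the days of month match or both dates are month ends, otherwise a fraction based on 31 days per month, rounded to 8 digits by default), not Comet's implementation. `LocalTs` and the helper functions are hypothetical names, the inputs are assumed to be already converted to the session time zone, and the rounding line is only intended to approximate Java's `Math.round`. It should be checked against `DateTimeUtils.monthsBetween` before being relied on.

```rust
/// A timestamp already converted to local date/time fields.
/// (Hypothetical helper type for this sketch; not a Comet or DataFusion type.)
#[derive(Clone, Copy, Debug)]
pub struct LocalTs {
    pub year: i64,
    pub month: i64,          // 1..=12
    pub day: i64,            // 1..=31
    pub seconds_in_day: i64, // 0..86_400
}

fn is_leap_year(year: i64) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

fn days_in_month(year: i64, month: i64) -> i64 {
    match month {
        4 | 6 | 9 | 11 => 30,
        2 => {
            if is_leap_year(year) {
                29
            } else {
                28
            }
        }
        _ => 31,
    }
}

/// Months between `t1` (end) and `t2` (start); positive when `t1` is later.
pub fn months_between(t1: LocalTs, t2: LocalTs, round_off: bool) -> f64 {
    const SECONDS_PER_DAY: i64 = 86_400;
    let month_diff = ((t1.year * 12 + t1.month) - (t2.year * 12 + t2.month)) as f64;

    // Same day of month, or both on the last day of their month:
    // the time of day is ignored and the result is a whole number.
    let both_month_end = t1.day == days_in_month(t1.year, t1.month)
        && t2.day == days_in_month(t2.year, t2.month);
    if t1.day == t2.day || both_month_end {
        return month_diff;
    }

    // Otherwise the fraction is computed in seconds, assuming 31-day months.
    let seconds_diff =
        (t1.day - t2.day) * SECONDS_PER_DAY + t1.seconds_in_day - t2.seconds_in_day;
    let diff = month_diff + seconds_diff as f64 / (31 * SECONDS_PER_DAY) as f64;

    if round_off {
        // Assumption: floor(x + 0.5) stands in for Java's Math.round (ties round up).
        (diff * 1e8 + 0.5).floor() / 1e8
    } else {
        diff
    }
}
```

For `2023-06-20` and `2023-01-10` this gives 5 whole months plus 10/31 of a month, which rounds to 5.32258065 when `round_off` is true.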
**Examples:**
```sql
-- Basic usage
SELECT months_between('2023-06-15', '2023-01-15') as months_diff;
-- Result: 5.0
-- With rounding disabled
SELECT months_between('2023-06-20', '2023-01-10', false) as exact_months;
-- Result: approximately 5.32258064516129 (5 + 10/31)
-- Negative result (date1 < date2)
SELECT months_between('2023-01-15', '2023-06-15') as months_diff;
-- Result: -5.0
```
```scala
// DataFrame API usage (assumes `spark.implicits._` is in scope for the `$` syntax)
import org.apache.spark.sql.functions.months_between
df.select(months_between($"end_date", $"start_date").as("duration_months"))
// With rounding disabled; in the Scala API `roundOff` is a plain Boolean, not a Column
df.select(months_between($"end_date", $"start_date", false).as("exact_duration"))
```
### Implementation Approach
See the [Comet guide on adding new
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
for detailed instructions.
1. **Scala Serde**: Add expression handler in
`spark/src/main/scala/org/apache/comet/serde/`
2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if
needed
4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has
built-in support first)
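
For step 4, the native side would need to turn each timestamp (Spark stores `TimestampType` as microseconds since the Unix epoch) into local date fields before any month arithmetic, which is where the time zone dependence comes from. The sketch below shows that split using only the Rust standard library. The function names are hypothetical, and the fixed UTC offset is a deliberate simplification: a real implementation would need a time zone database to handle named zones and DST, and may be able to reuse existing DataFusion or Arrow utilities instead.

```rust
const MICROS_PER_SECOND: i64 = 1_000_000;
const SECONDS_PER_DAY: i64 = 86_400;

/// Convert days since 1970-01-01 into (year, month, day) in the proleptic
/// Gregorian calendar, using the well-known "civil from days" algorithm.
fn civil_from_days(days: i64) -> (i64, i64, i64) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year cycles
    let doe = z.rem_euclid(146_097); // day of era, 0..=146_096
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // day of year, March-based
    let mp = (5 * doy + 2) / 153; // month index, 0 = March
    let day = doy - (153 * mp + 2) / 5 + 1;
    let month: i64 = if mp < 10 { mp + 3 } else { mp - 9 };
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Split epoch microseconds into (year, month, day, seconds_in_day) for a
/// fixed UTC offset. Hypothetical helper: named zones and DST are not handled.
pub fn local_fields(micros: i64, utc_offset_seconds: i64) -> (i64, i64, i64, i64) {
    // Sub-second precision is dropped; the month fraction is computed in seconds.
    let local_seconds = micros.div_euclid(MICROS_PER_SECOND) + utc_offset_seconds;
    let days = local_seconds.div_euclid(SECONDS_PER_DAY);
    let seconds_in_day = local_seconds.rem_euclid(SECONDS_PER_DAY);
    let (year, month, day) = civil_from_days(days);
    (year, month, day, seconds_in_day)
}
```

The instant 2023-01-31 23:30:00 UTC (`1_675_207_800_000_000` microseconds) falls in January under UTC but on 2023-02-01 under a +01:00 offset, so the same pair of stored values can produce different `months_between` results depending on the session time zone.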
## Additional context
**Difficulty:** Medium
**Spark Expression Class:**
`org.apache.spark.sql.catalyst.expressions.MonthsBetween`
**Related:**
- `datediff` - Calculate difference in days between dates
- `add_months` - Add months to a timestamp
- `date_sub` / `date_add` - Add or subtract days from dates
- Other datetime functions in the `datetime_funcs` group
---
*This issue was auto-generated from Spark reference documentation.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]