andygrove opened a new issue, #3127:
URL: https://github.com/apache/datafusion-comet/issues/3127
## What is the problem the feature request solves?
> **Note:** This issue was generated with AI assistance. The specification
details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark `MinutesOfTime` expression (the
`minute` function applied to time values), causing queries that use it to fall
back to Spark's JVM execution instead of running natively on DataFusion.
The `MinutesOfTime` expression extracts the minute component from a
time-based value. This expression is implemented as a `RuntimeReplaceable` that
delegates to the `DateTimeUtils.getMinutesOfTime` method at runtime. It returns
an integer representing the minutes portion (0-59) of the input time value.
Supporting this expression would allow more Spark workloads to benefit from
Comet's native acceleration.
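As a rough illustration of the extraction described above, here is a minimal Rust sketch. It assumes the time value is available as nanoseconds since midnight. That representation, and the helper name `minutes_of_time`, are assumptions for illustration and are not taken from Spark or Comet source.

```rust
/// Nanoseconds in one minute.
const NANOS_PER_MINUTE: i64 = 60 * 1_000_000_000;
/// Minutes in one hour.
const MINUTES_PER_HOUR: i64 = 60;

/// Hypothetical helper (not a Comet or Spark API): returns the
/// minute-of-hour for a time of day given as nanoseconds since midnight.
/// Assumes `nanos_of_day` lies in [0, 86_400 * 1_000_000_000).
fn minutes_of_time(nanos_of_day: i64) -> i32 {
    // Whole minutes elapsed since midnight, wrapped to the current hour.
    ((nanos_of_day / NANOS_PER_MINUTE) % MINUTES_PER_HOUR) as i32
}
```

For example, 14:35:20 is 52,520 seconds after midnight, so `minutes_of_time(52_520 * 1_000_000_000)` yields 35.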
## Describe the potential solution
### Spark Specification
**Syntax:**
```sql
minute(time_expr)
```
**Arguments:**
| Argument | Type | Description |
|----------|------|-------------|
| child | AnyTimeType | The time expression from which to extract the minute component |
**Return Type:** `IntegerType` - Returns an integer value representing the
minute component (0-59).
**Supported Data Types:**
The expression accepts any time-based data type through the `AnyTimeType`
constraint:
- TimeType
- TimestampType
- TimestampNTZType
**Edge Cases:**
- **Null handling**: Returns null when the input time expression is null
- **Invalid time values**: Behavior depends on the underlying
`DateTimeUtils.getMinutesOfTime` implementation
- **Timezone considerations**: For timestamp types, the minute extraction
may be affected by timezone settings
- **Leap seconds**: Standard minute extraction logic applies; leap seconds
are handled by the underlying time utilities
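The null-handling rule above can be sketched at the column level in Rust. This is only an illustration under stated assumptions: `None` models SQL NULL, inputs are taken to be nanoseconds since midnight, and `minutes_of_time_column` is a hypothetical name rather than an existing Comet function. Out-of-range inputs are not modeled, since the issue leaves that behavior to `DateTimeUtils.getMinutesOfTime`.

```rust
/// Nanoseconds in one minute.
const NANOS_PER_MINUTE: i64 = 60 * 1_000_000_000;

/// Hypothetical column-level helper (not a Comet API).
/// Each element is a time of day in nanoseconds since midnight (assumed
/// representation); `None` stands in for SQL NULL and is passed through.
fn minutes_of_time_column(values: &[Option<i64>]) -> Vec<Option<i32>> {
    values
        .iter()
        // Null in, null out; otherwise take minutes since midnight modulo 60.
        .map(|v| v.map(|nanos| ((nanos / NANOS_PER_MINUTE) % 60) as i32))
        .collect()
}
```

A real native implementation would more likely operate on Arrow arrays with a validity bitmap, but the null propagation would follow the same shape.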
**Examples:**
```sql
-- Extract minute from current timestamp
SELECT minute(current_timestamp()) AS current_minute;
-- Extract minute from time literal
SELECT minute(TIME '14:35:20') AS time_minute;
-- Extract minute from timestamp column
SELECT minute(created_at) AS creation_minute FROM events;
```
```scala
// DataFrame API usage
// Assumes an existing DataFrame `df` with a timestamp column named "timestamp_col"
import org.apache.spark.sql.functions._
// Extract minute from timestamp column
df.select(minute(col("timestamp_col")).alias("minute_value"))
// Using with current timestamp
df.select(minute(current_timestamp()).alias("current_minute"))
```
### Implementation Approach
See the [Comet guide on adding new
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
for detailed instructions.
1. **Scala Serde**: Add expression handler in
`spark/src/main/scala/org/apache/comet/serde/`
2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if
needed
4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has
built-in support first)
## Additional context
**Difficulty:** Medium
**Spark Expression Class:**
`org.apache.spark.sql.catalyst.expressions.MinutesOfTime`
**Related:**
- `HourOfTime` - Extract hour component from time values
- `SecondsOfTime` - Extract seconds component from time values
- `DateTimeUtils` - Underlying utility class for time operations
- `TimeExpression` - Base trait for time-related expressions
---
*This issue was auto-generated from Spark reference documentation.*