andygrove opened a new issue, #3126:
URL: https://github.com/apache/datafusion-comet/issues/3126
## What is the problem the feature request solves?
> **Note:** This issue was generated with AI assistance. The specification
details have been extracted from Spark documentation and may need verification.
Comet does not currently support Spark's `HoursOfTime` expression (hour
extraction from time values), causing queries that use it to fall back to
Spark's JVM execution instead of running natively on DataFusion.
The `HoursOfTime` expression extracts the hour component from time-based
data types. It is implemented as a runtime-replaceable expression that
delegates to `DateTimeUtils.getHoursOfTime` for the actual computation.
Supporting this expression would allow more Spark workloads to benefit from
Comet's native acceleration.
## Describe the potential solution
### Spark Specification
**Syntax:**
```sql
HOUR(time_expression)
```
```scala
// DataFrame API
hour(col("time_column"))
// or
hour($"time_column")
```
**Arguments:**
| Argument | Type | Description |
|----------|------|-------------|
| child | Expression | The time expression from which to extract the hour component |
**Return Type:** `IntegerType` - Returns an integer representing the hour
component (0-23 in 24-hour format).
**Supported Data Types:**
Accepts `AnyTimeType` input data types, which include:
- TimestampType
- DateType
- TimeType (if supported by the implementation)
**Edge Cases:**
- **Null handling**: Returns null when the input expression evaluates to null
- **Invalid time values**: Behavior depends on the underlying
`DateTimeUtils.getHoursOfTime` implementation
- **Timezone considerations**: Hour extraction may be affected by session
timezone settings
- **Date-only inputs**: When applied to date types, typically returns 0 for
the hour component
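The timezone edge case above can be illustrated with plain `java.time`
(a standalone sketch, not Comet or Spark code): the same instant yields
different hour components depending on the zone it is rendered in, which is
the effect the session timezone setting has on hour extraction.

```scala
import java.time.{Instant, ZoneId}

// The same instant produces different hour components depending on the
// timezone it is interpreted in -- the "session timezone" effect.
val instant   = Instant.parse("2023-12-25T14:30:45Z")
val utcHour   = instant.atZone(ZoneId.of("UTC")).getHour        // 14
val tokyoHour = instant.atZone(ZoneId.of("Asia/Tokyo")).getHour // 23 (UTC+9)
```

Any Comet implementation should therefore produce results consistent with
Spark's handling of the session timezone for the input type in question.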
**Examples:**
```sql
-- Extract hour from timestamp
SELECT HOUR(TIMESTAMP '2023-12-25 14:30:45') AS hour_part;
-- Returns: 14
-- Extract hour from current timestamp
SELECT HOUR(NOW()) AS current_hour;
-- Use in WHERE clause
SELECT * FROM events WHERE HOUR(event_time) BETWEEN 9 AND 17;
```
```scala
// DataFrame API usage
import org.apache.spark.sql.functions._
// Extract hour from timestamp column
df.select(hour($"timestamp_col").as("hour_part"))
// Filter by hour range
df.filter(hour($"event_time").between(9, 17))
// Group by hour
df.groupBy(hour($"created_at").as("hour")).count()
```
### Implementation Approach
See the [Comet guide on adding new
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
for detailed instructions.
1. **Scala Serde**: Add expression handler in
`spark/src/main/scala/org/apache/comet/serde/`
2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if
needed
4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has
built-in support first)
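For step 4, the core semantics to replicate are simple integer arithmetic.
A minimal sketch, assuming TIME values are encoded as microseconds since
midnight (`MicrosPerHour` and `hoursOfTime` are illustrative names, and the
encoding is an assumption that should be verified against Spark's actual
`TimeType` representation before implementing):

```scala
// Hypothetical sketch of hour extraction. The microseconds-since-midnight
// encoding is an assumption to verify against Spark's TimeType.
val MicrosPerHour: Long = 3600L * 1000L * 1000L

// Hour component (0-23) of a time-of-day value in microseconds.
def hoursOfTime(micros: Long): Int = (micros / MicrosPerHour).toInt

// 14:30:45 expressed in microseconds since midnight -> hour 14
val exampleMicros = (14L * 3600 + 30 * 60 + 45) * 1000000L
```

As the checklist notes, DataFusion's built-in `date_part`/`EXTRACT` support
may already cover some input types, so check for an existing kernel before
writing a new one.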
## Additional context
**Difficulty:** Medium
**Spark Expression Class:**
`org.apache.spark.sql.catalyst.expressions.HoursOfTime`
**Related:**
- `MinutesOfTime` - Extract minutes from time expressions
- `SecondsOfTime` - Extract seconds from time expressions
- `DayOfMonth` - Extract day component from date/timestamp
- `Month` - Extract month component from date/timestamp
- `Year` - Extract year component from date/timestamp
---
*This issue was auto-generated from Spark reference documentation.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]