andygrove opened a new issue, #3126:
URL: https://github.com/apache/datafusion-comet/issues/3126
## What is the problem the feature request solves?
> **Note:** This issue was generated with AI assistance. The specification
details have been extracted from Spark documentation and may need verification.
Comet does not currently support Spark's `HoursOfTime` expression (hour
extraction from time values), causing queries that use it to fall back to
Spark's JVM execution instead of running natively on DataFusion.
The `HoursOfTime` expression extracts the hour component from time-based
data types. It is implemented as a runtime-replaceable expression that
delegates to `DateTimeUtils.getHoursOfTime` for the actual computation.
Supporting this expression would allow more Spark workloads to benefit from
Comet's native acceleration.
## Describe the potential solution
### Spark Specification
**Syntax:**
```sql
HOUR(time_expression)
```
```scala
// DataFrame API
hour(col("time_column"))
// or
hour($"time_column")
```
**Arguments:**
| Argument | Type | Description |
|----------|------|-------------|
| child | Expression | The time expression from which to extract the hour component |
**Return Type:** `IntegerType` - Returns an integer representing the hour
component (0-23 in 24-hour format).
**Supported Data Types:**
Accepts `AnyTimeType` input data types, which include:
- TimestampType
- DateType
- TimeType (if supported by the implementation)
**Edge Cases:**
- **Null handling**: Returns null when the input expression evaluates to null
- **Invalid time values**: Behavior depends on the underlying
`DateTimeUtils.getHoursOfTime` implementation
- **Timezone considerations**: Hour extraction may be affected by session
timezone settings
- **Date-only inputs**: When applied to date types, typically returns 0 for
the hour component
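The timezone edge case above can be illustrated with plain `java.time`
(a standalone sketch, not Comet or Spark code): the same instant yields
different hour components depending on the zone it is rendered in, which is
the effect the session timezone setting has on hour extraction.

```scala
import java.time.{Instant, ZoneId}

// The same instant produces different hour components depending on the
// timezone it is interpreted in -- the "session timezone" effect.
val instant   = Instant.parse("2023-12-25T14:30:45Z")
val utcHour   = instant.atZone(ZoneId.of("UTC")).getHour        // 14
val tokyoHour = instant.atZone(ZoneId.of("Asia/Tokyo")).getHour // 23 (UTC+9)
```

Any Comet implementation should therefore produce results consistent with
Spark's handling of the session timezone for the input type in question.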
**Examples:**
```sql
-- Extract hour from timestamp
SELECT HOUR(TIMESTAMP '2023-12-25 14:30:45') AS hour_part;
-- Returns: 14
-- Extract hour from current timestamp
SELECT HOUR(NOW()) AS current_hour;
-- Use in WHERE clause
SELECT * FROM events WHERE HOUR(event_time) BETWEEN 9 AND 17;
```
```scala
// DataFrame API usage
import org.apache.spark.sql.functions._
// Extract hour from timestamp column
df.select(hour($"timestamp_col").as("hour_part"))
// Filter by hour range
df.filter(hour($"event_time").between(9, 17))
// Group by hour
df.groupBy(hour($"created_at").as("hour")).count()
```
### Implementation Approach
See the [Comet guide on adding new
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
for detailed instructions.
1. **Scala Serde**: Add expression handler in
`spark/src/main/scala/org/apache/comet/serde/`
2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if
needed
4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has
built-in support first)
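For step 4, the core semantics to replicate are simple integer arithmetic.
A minimal sketch, assuming TIME values are encoded as microseconds since
midnight (`MicrosPerHour` and `hoursOfTime` are illustrative names, and the
encoding is an assumption that should be verified against Spark's actual
`TimeType` representation before implementing):

```scala
// Hypothetical sketch of hour extraction. The microseconds-since-midnight
// encoding is an assumption to verify against Spark's TimeType.
val MicrosPerHour: Long = 3600L * 1000L * 1000L

// Hour component (0-23) of a time-of-day value in microseconds.
def hoursOfTime(micros: Long): Int = (micros / MicrosPerHour).toInt

// 14:30:45 expressed in microseconds since midnight -> hour 14
val exampleMicros = (14L * 3600 + 30 * 60 + 45) * 1000000L
```

As the checklist notes, DataFusion's built-in `date_part`/`EXTRACT` support
may already cover some input types, so check for an existing kernel before
writing a new one.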
## Additional context
**Difficulty:** Medium
**Spark Expression Class:**
`org.apache.spark.sql.catalyst.expressions.HoursOfTime`
**Related:**
- `MinutesOfTime` - Extract minutes from time expressions
- `SecondsOfTime` - Extract seconds from time expressions
- `DayOfMonth` - Extract day component from date/timestamp
- `Month` - Extract month component from date/timestamp
- `Year` - Extract year component from date/timestamp
---
*This issue was auto-generated from Spark reference documentation.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]