andygrove opened a new issue, #3123:
URL: https://github.com/apache/datafusion-comet/issues/3123
## What is the problem the feature request solves?
> **Note:** This issue was generated with AI assistance. The specification
> details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark `time_trunc` function, causing
queries using this function to fall back to Spark's JVM execution instead of
running natively on DataFusion.
The `TimeTrunc` expression truncates time values to a specified unit of time
precision. It rounds down time components (hours, minutes, seconds,
microseconds) to the nearest specified unit boundary, effectively "flooring"
the time value to remove precision below the specified granularity.
Supporting this expression would allow more Spark workloads to benefit from
Comet's native acceleration.
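The flooring behavior described above can be sketched in a few lines, modeling a TIME value as microseconds since midnight. Note this storage layout is an assumption for illustration, not a confirmed Spark or Comet internal:

```rust
// Illustrative only: a TIME value is modeled as microseconds since midnight
// (an assumption about the physical representation).
const MICROS_PER_SECOND: i64 = 1_000_000;
const MICROS_PER_MINUTE: i64 = 60 * MICROS_PER_SECOND;
const MICROS_PER_HOUR: i64 = 60 * MICROS_PER_MINUTE;

/// Floor `micros` (microseconds since midnight) to a multiple of `unit_micros`,
/// discarding all precision below the unit boundary.
fn trunc_to(micros: i64, unit_micros: i64) -> i64 {
    micros - micros % unit_micros
}
```

For example, `14:32:05.123` truncated with `MICROS_PER_HOUR` yields the microsecond value of `14:00:00.000`.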
## Describe the potential solution
### Spark Specification
**Syntax:**
```sql
time_trunc(unit, time_expr)
```
```scala
// DataFrame API
expr("time_trunc('HOUR', time_column)")
```
**Arguments:**
| Argument | Type | Description |
|----------|------|-------------|
| unit | String | The time unit to truncate to (e.g., 'HOUR', 'MINUTE', 'SECOND') |
| time | Time type | The time expression to be truncated |
**Return Type:** The same data type as the input `time` expression, determined dynamically from the input's time data type.
**Supported Data Types:**
- **unit**: String type with collation support (must support trim collation)
- **time**: Any time-related data type (`AnyTimeType`) including TIME,
TIMESTAMP, etc.
**Edge Cases:**
- **Null handling**: Returns null if either the `unit` or the `time` argument is null
- **Invalid unit**: Throws a runtime exception for unrecognized time unit strings
- **Boundary values**: Minimum time values remain unchanged when truncated
- **Case sensitivity**: Unit parameter handling depends on the `DateTimeUtils.timeTrunc` implementation
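The edge cases above can be sketched as a single function, with SQL NULL modeled as `Option` and an unrecognized unit surfaced as an error. The unit list and error message are assumptions for illustration, not the actual `DateTimeUtils.timeTrunc` behavior:

```rust
// Hedged sketch of the edge-case contract: NULL in either argument yields
// NULL; an unknown unit string is a runtime error.
fn time_trunc(unit: Option<&str>, micros: Option<i64>) -> Result<Option<i64>, String> {
    let (unit, micros) = match (unit, micros) {
        (Some(u), Some(m)) => (u, m),
        _ => return Ok(None), // null handling: NULL in, NULL out
    };
    let divisor = match unit.to_ascii_uppercase().as_str() {
        "HOUR" => 3_600_000_000i64,
        "MINUTE" => 60_000_000,
        "SECOND" => 1_000_000,
        other => return Err(format!("Invalid unit: {other}")),
    };
    Ok(Some(micros - micros % divisor))
}
```

Uppercasing the unit before matching is one plausible way to handle case sensitivity; the real behavior is whatever `DateTimeUtils.timeTrunc` does.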
**Examples:**
```sql
-- Truncate to hour boundary
SELECT time_trunc('HOUR', TIME '14:32:05.123') AS truncated_time;
-- Result: 14:00:00.000
-- Truncate to minute boundary
SELECT time_trunc('MINUTE', TIME '09:32:05.123') AS truncated_time;
-- Result: 09:32:00.000
```
```scala
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(expr("time_trunc('HOUR', time_col)").as("hour_truncated"))
df.withColumn("minute_trunc", expr("time_trunc('MINUTE', timestamp_col)"))
```
### Implementation Approach
See the [Comet guide on adding new
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
for detailed instructions.
1. **Scala Serde**: Add expression handler in
`spark/src/main/scala/org/apache/comet/serde/`
2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if
needed
4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has
built-in support first)
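For step 4, the native side ultimately has to apply the truncation element-wise over a nullable column. A real implementation would operate on Arrow arrays through DataFusion's scalar-function machinery; the sketch below uses `Vec<Option<i64>>` as a stand-in for a nullable microsecond array, purely to illustrate the kernel shape:

```rust
// Simplified stand-in for the native kernel: apply truncation per element,
// propagating nulls. Vec<Option<i64>> models a nullable Arrow array of
// microseconds; this is an illustration, not DataFusion's actual API.
fn time_trunc_column(unit_micros: i64, col: &[Option<i64>]) -> Vec<Option<i64>> {
    col.iter()
        .map(|v| v.map(|m| m - m % unit_micros)) // NULL in, NULL out
        .collect()
}
```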
## Additional context
**Difficulty:** Medium
**Spark Expression Class:**
`org.apache.spark.sql.catalyst.expressions.TimeTrunc`
**Related:**
- `date_trunc` - For truncating date/timestamp values to date units
- `DateTimeUtils.timeTrunc` - The underlying implementation method
- Time and timestamp manipulation functions in the datetime_funcs group
---
*This issue was auto-generated from Spark reference documentation.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]