andygrove opened a new issue, #3110: URL: https://github.com/apache/datafusion-comet/issues/3110
## What is the problem the feature request solves?

> **Note:** This issue was generated with AI assistance. The specification details have been extracted from Spark documentation and may need verification.

Comet does not currently support Spark's `PreciseTimestampConversion` expression, so queries whose plans contain it fall back to Spark's JVM execution instead of running natively on DataFusion.

`PreciseTimestampConversion` is an internal Spark Catalyst expression used to convert `TimestampType` to `LongType` and back without losing precision during time-windowing operations. It preserves microsecond-level precision by operating directly on the internal representation used by Spark's timestamp handling (a small round-trip sketch is included at the end of this issue).

Supporting this expression would allow more Spark workloads to benefit from Comet's native acceleration.

## Describe the potential solution

### Spark Specification

**Syntax:** This is an internal expression that is not directly exposed in SQL or the DataFrame API. It is generated automatically during time-windowing operations.

**Arguments:**

| Argument | Type | Description |
|----------|------|-------------|
| child | Expression | The input expression to be converted |
| fromType | DataType | The source data type for the conversion |
| toType | DataType | The target data type for the conversion |

**Return Type:** Returns the data type specified by the `toType` parameter, typically either `TimestampType` or `LongType` depending on the conversion direction.

**Supported Data Types:** Supports conversion between `TimestampType` and `LongType` while preserving microsecond precision for time-windowing operations.

**Edge Cases:**

- **Null handling**: The expression is null-intolerant (`nullIntolerant = true`), meaning null inputs produce null outputs
- **Type safety**: Input types are validated against the specified `fromType` through the `ExpectsInputTypes` trait
- **Precision preservation**: Maintains full microsecond precision during timestamp conversions
- **Code generation**: Always uses the code-generation path, emitting a direct value assignment

**Examples:**

```sql
-- This expression is not directly accessible in SQL.
-- It is used internally during time window operations.
SELECT window(timestamp_col, '1 hour') FROM events;
```

```scala
// Not directly accessible in the DataFrame API.
// Used internally during time-windowing operations.
df.groupBy(window($"timestamp", "1 hour")).count()
```

### Implementation Approach

See the [Comet guide on adding new expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html) for detailed instructions. A hedged sketch of a possible Scala serde handler appears at the end of this issue.

1. **Scala Serde**: Add an expression handler in `spark/src/main/scala/org/apache/comet/serde/`
2. **Register**: Add it to the appropriate map in `QueryPlanSerde.scala`
3. **Protobuf**: Add a message type in `native/proto/src/proto/expr.proto` if needed
4. **Rust**: Implement in `native/spark-expr/src/` (check whether DataFusion has built-in support first)

## Additional context

**Difficulty:** Medium

**Spark Expression Class:** `org.apache.spark.sql.catalyst.expressions.PreciseTimestampConversion`

**Related:**

- TimeWindow expressions for windowing operations
- UnaryExpression base class for single-input expressions
- ExpectsInputTypes trait for type validation
- TimestampType and LongType data types

---

*This issue was auto-generated from Spark reference documentation.*
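To make the semantics concrete, here is a small, standalone Scala sketch (plain `java.time`, not Comet or Spark code) of the lossless round trip the expression performs. The object name and example timestamp are illustrative only; the underlying fact is that Spark stores `TimestampType` values internally as `Long` microseconds since the Unix epoch.

```scala
// Illustration of the precision-preserving round trip that
// PreciseTimestampConversion performs: TimestampType -> LongType -> TimestampType.
import java.time.Instant
import java.time.temporal.ChronoUnit

object PreciseRoundTripDemo extends App {
  val ts = Instant.parse("2024-01-01T00:00:00.123456Z")

  // TimestampType -> LongType: expose the internal microsecond value.
  val micros: Long = ChronoUnit.MICROS.between(Instant.EPOCH, ts)

  // LongType -> TimestampType: reinterpret the microseconds as a timestamp.
  val back: Instant = Instant.EPOCH.plus(micros, ChronoUnit.MICROS)

  assert(back == ts) // no precision lost
  println(s"micros = $micros, round-tripped = $back")
}
```

Because the conversion reinterprets the same 64-bit value rather than doing any arithmetic, no rounding can occur in either direction.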
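For whoever picks this up: a hedged sketch of what the Scala serde handler (step 1 above) might look like, following the pattern described in the contributor guide. The trait name `CometExpressionSerde`, the `convert` signature, and the helpers `exprToProtoInternal`/`castToProto` are assumptions that should be checked against the current Comet codebase; this is a starting point, not a working implementation.

```scala
// Hedged sketch only: trait and helper names follow the pattern described in
// the Comet contributor guide and must be verified against the current codebase.
import org.apache.spark.sql.catalyst.expressions.{
  Attribute, Expression, PreciseTimestampConversion
}

object CometPreciseTimestampConversion extends CometExpressionSerde {
  override def convert(
      expr: Expression,
      inputs: Seq[Attribute],
      binding: Boolean): Option[ExprOuterClass.Expr] = {
    val conv = expr.asInstanceOf[PreciseTimestampConversion]
    // TimestampType and LongType share the same physical 64-bit value in
    // Spark, and Arrow's Timestamp(Microsecond) matches it, so one plausible
    // mapping is a cast from fromType to toType over the serialized child.
    exprToProtoInternal(conv.child, inputs, binding).map { childProto =>
      // castToProto is a hypothetical helper standing in for whatever
      // cast-building utility QueryPlanSerde actually exposes.
      castToProto(childProto, conv.toType)
    }
  }
}
```

Since the conversion is physically a reinterpretation, DataFusion's built-in cast between `Timestamp(Microsecond)` and `Int64` may already cover both directions, which is worth checking before adding a dedicated native kernel (step 4).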
