andygrove opened a new issue, #3106:
URL: https://github.com/apache/datafusion-comet/issues/3106
## What is the problem the feature request solves?
> **Note:** This issue was generated with AI assistance. The specification
> details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark `make_timestamp` function,
causing queries using this function to fall back to Spark's JVM execution
instead of running natively on DataFusion.
The `MakeTimestamp` expression constructs a timestamp value from separate
year, month, day, hour, minute, and second components, with optional timezone
specification. It supports microsecond precision through decimal seconds and
can operate in both fail-on-error (ANSI) and null-on-error modes depending on
configuration.
Supporting this expression would allow more Spark workloads to benefit from
Comet's native acceleration.
## Describe the potential solution
### Spark Specification
**Syntax:**
```sql
make_timestamp(year, month, day, hour, min, sec [, timezone])
```
**Arguments:**
| Argument | Type | Description |
|----------|------|-------------|
| year | IntegerType | The year component (e.g., 2023) |
| month | IntegerType | The month component (1-12) |
| day | IntegerType | The day component (1-31) |
| hour | IntegerType | The hour component (0-23) |
| min | IntegerType | The minute component (0-59) |
| sec | DecimalType(16,6) | The second component with microsecond precision (0-59.999999; exactly 60 is also accepted, see edge cases) |
| timezone | StringType (optional) | The timezone identifier (e.g., "UTC", "America/New_York") |
**Return Type:** Returns the configured timestamp type
(`SQLConf.get.timestampType`), which can be either `TimestampType` (timestamp
with timezone) or `TimestampNTZType` (timestamp without timezone).
**Supported Data Types:**
- Year, month, day, hour, minute: Integer types that can be cast to
`IntegerType`
- Seconds: Numeric types that can be cast to `DecimalType(16,6)` to preserve
microsecond precision
- Timezone: String types with collation support
**Edge Cases:**
- Null inputs: Returns null if any input is null (null intolerant)
- Invalid dates: Returns null in non-ANSI mode, throws exception in ANSI
mode (e.g., February 30th)
- Seconds = 60: Supported only when nanoseconds = 0, adds one minute for
PostgreSQL compatibility
- Fractional seconds > 60: Throws `invalidFractionOfSecondError`
- Invalid timezone strings: Throws exception during timezone parsing
- Overflow conditions: Handled by underlying Java time libraries with
appropriate exceptions
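To make these edge cases concrete, here is a self-contained sketch in Rust (the language of the native kernel in `native/spark-expr/`) of the component-to-microseconds conversion, using Hinnant's days-from-civil algorithm. The function name and the `Option`-for-null convention are illustrative only, not Comet's actual code; a real kernel would also need the ANSI error path and timezone handling.

```rust
fn is_leap(y: i32) -> bool {
    (y % 4 == 0 && y % 100 != 0) || y % 400 == 0
}

fn days_in_month(y: i32, m: u32) -> u32 {
    match m {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 => if is_leap(y) { 29 } else { 28 },
        _ => 0,
    }
}

/// Days since 1970-01-01 for a valid proleptic-Gregorian date
/// (Howard Hinnant's "days from civil" algorithm).
fn days_from_civil(y: i32, m: u32, d: u32) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = i64::from(y - era * 400); // [0, 399]
    let mp = i64::from(if m > 2 { m - 3 } else { m + 9 }); // [0, 11], March-based
    let doy = (153 * mp + 2) / 5 + i64::from(d) - 1; // [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // [0, 146096]
    i64::from(era) * 146_097 + doe - 719_468
}

/// Microseconds since the Unix epoch, or None where Spark's non-ANSI mode
/// would return null (invalid date, or fractional seconds with sec == 60).
fn make_timestamp_micros(
    year: i32, month: u32, day: u32,
    hour: u32, min: u32, sec: u32, micros: u32,
) -> Option<i64> {
    // Seconds = 60 rolls over to the next minute (PostgreSQL compatibility),
    // but only when the fractional part is zero; otherwise ANSI mode raises
    // invalidFractionOfSecondError and non-ANSI mode returns null.
    let (sec, carry_min) = match sec {
        60 if micros == 0 => (0, 1),
        60 => return None,
        s if s < 60 => (s, 0),
        _ => return None,
    };
    if !(1..=12).contains(&month)
        || day < 1 || day > days_in_month(year, month)
        || hour > 23 || min > 59 || micros > 999_999 {
        return None; // e.g. February 30th: null in non-ANSI, exception in ANSI
    }
    let secs = days_from_civil(year, month, day) * 86_400
        + i64::from(hour) * 3_600
        + i64::from(min + carry_min) * 60
        + i64::from(sec);
    Some(secs * 1_000_000 + i64::from(micros))
}
```

Note that the minute carry works through plain seconds arithmetic, so `23:59:60` correctly lands on midnight of the next day without any extra date rollover logic.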
**Examples:**
```sql
-- Create timestamp with explicit timezone
SELECT make_timestamp(2023, 12, 25, 14, 30, 45.123456, 'UTC');
-- Create timestamp using session timezone
SELECT make_timestamp(2023, 1, 1, 0, 0, 0.0);
-- Handle leap seconds (PostgreSQL compatibility)
SELECT make_timestamp(2023, 6, 30, 23, 59, 60.0, 'UTC');
```
```scala
// DataFrame API usage
import org.apache.spark.sql.functions._

df.withColumn("timestamp",
  expr("make_timestamp(year_col, month_col, day_col, hour_col, min_col, sec_col, 'America/New_York')"))

// Using literals
df.select(expr("make_timestamp(2023, 12, 25, 14, 30, 45.123456)"))
```
### Implementation Approach
See the [Comet guide on adding new
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
for detailed instructions.
1. **Scala Serde**: Add expression handler in
`spark/src/main/scala/org/apache/comet/serde/`
2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if
needed
4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has
built-in support first)
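One detail the Rust step will have to deal with: Spark casts `sec` to `DecimalType(16,6)`, so the native side effectively receives an unscaled integer (the value times 10^6). A hedged sketch of splitting that unscaled value into whole seconds and microseconds, with `None` standing in for Spark's null-on-error mode — the function name is hypothetical, not an existing Comet API:

```rust
const MICROS_PER_SEC: i64 = 1_000_000;

/// Split a scale-6 unscaled decimal into (whole seconds, microseconds).
/// Returns None where Spark's non-ANSI mode would produce null; an ANSI
/// implementation would raise an error on the same inputs.
fn split_decimal_seconds(unscaled: i64) -> Option<(u32, u32)> {
    // Anything below 0 or at/above 61.000000 is an invalid second component.
    if unscaled < 0 || unscaled > 60 * MICROS_PER_SEC + 999_999 {
        return None;
    }
    let secs = (unscaled / MICROS_PER_SEC) as u32;
    let micros = (unscaled % MICROS_PER_SEC) as u32;
    // 60.000001..=60.999999 is invalidFractionOfSecondError territory:
    // seconds = 60 is only accepted with a zero fractional part.
    if secs == 60 && micros != 0 {
        return None;
    }
    Some((secs, micros))
}
```

For example, the literal `45.123456` arrives as unscaled `45_123_456` and splits into `(45, 123_456)`, which then feeds the component-to-timestamp conversion.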
## Additional context
**Difficulty:** Large
**Spark Expression Class:**
`org.apache.spark.sql.catalyst.expressions.MakeTimestamp`
**Related:**
- `MakeDate` - Creates date values from year, month, day components
- `ToTimestamp` - Parses timestamp from string with format
- `DateAdd` / `DateSub` - Arithmetic operations on dates
- `FromUnixTime` - Converts Unix timestamp to formatted date string
---
*This issue was auto-generated from Spark reference documentation.*