andygrove opened a new issue, #3093:
URL: https://github.com/apache/datafusion-comet/issues/3093
## What is the problem the feature request solves?
> **Note:** This issue was generated with AI assistance. The specification
details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark `to_date` function (the
`ParseToDate` expression), so queries that use it fall back to Spark's JVM
execution instead of running natively on DataFusion.
ParseToDate is a Spark Catalyst expression that converts string values to
date values using an optional format pattern. It is a runtime-replaceable
expression: Spark rewrites it to a Cast over GetTimestamp when a format is
given, and to a plain Cast to DateType when no format is given.
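The rewrite described above can be sketched with a toy expression tree. This is only an illustration: the `Expr` enum and `replace_parse_to_date` are hypothetical names, not Spark or Comet APIs, and the real Spark replacement also carries the time zone and ANSI settings.

```rust
// Toy expression tree. These types are illustrative only and do not
// correspond to Spark or Comet classes.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(String),
    // to_date(child[, format])
    ParseToDate { child: Box<Expr>, format: Option<Box<Expr>> },
    // Parses `child` as a timestamp using `format`.
    GetTimestamp { child: Box<Expr>, format: Box<Expr> },
    // Casts `child` to a date.
    CastToDate(Box<Expr>),
}

// Rewrites ParseToDate into the expressions it delegates to and
// returns every other expression unchanged.
fn replace_parse_to_date(expr: Expr) -> Expr {
    match expr {
        // With a format: parse to a timestamp first, then cast to a date.
        Expr::ParseToDate { child, format: Some(format) } => {
            Expr::CastToDate(Box::new(Expr::GetTimestamp { child, format }))
        }
        // Without a format: cast the input directly.
        Expr::ParseToDate { child, format: None } => Expr::CastToDate(child),
        other => other,
    }
}
```

If Comet's planner sees the expression only after this replacement, supporting `to_date` may largely reduce to supporting GetTimestamp and Cast to DateType natively; that is an assumption worth checking against the plans Comet actually receives.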
Supporting this expression would allow more Spark workloads to benefit from
Comet's native acceleration.
## Describe the potential solution
### Spark Specification
**Syntax:**
```sql
to_date(date_str[, format])
```
```scala
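// DataFrame API usage (assumes a DataFrame `df` and `import spark.implicits._` for `$`)
import org.apache.spark.sql.functions.to_date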
df.select(to_date($"date_column"))
df.select(to_date($"date_column", "yyyy-MM-dd"))
```
**Arguments:**
| Argument | Type | Description |
|----------|------|-------------|
| left | Expression | The input expression containing the date string or date-like value to convert |
| format | Option[Expression] | Optional format pattern string specifying how to parse the input |
| timeZoneId | Option[String] | Optional timezone identifier for date parsing (default: None) |
| ansiEnabled | Boolean | Whether ANSI SQL compliance mode is enabled, affecting error handling behavior |
**Return Type:** DateType - Returns a date value representing the parsed
input.
**Supported Data Types:**
- StringTypeWithCollation (with trim collation support)
- DateType
- TimestampType
- TimestampNTZType
**Edge Cases:**
- Null input values are handled according to the underlying Cast operation
behavior
- Invalid date strings either return null or throw an exception, depending
on the ansiEnabled setting
- When ansiEnabled is true, input that cannot be parsed causes a runtime
exception
- When ansiEnabled is false, input that cannot be parsed typically returns
null
- Empty strings are processed according to the Cast operation's null
handling rules
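As a rough illustration of the ANSI versus non-ANSI behavior listed above, the sketch below parses only the `yyyy-MM-dd` shape using the Rust standard library. The function names, the error text, and the treatment of an empty string as ordinary unparseable input are assumptions made for illustration; Spark's Cast accepts more input shapes than this. Dates are returned as days since 1970-01-01, which is how Spark's DateType and Arrow's Date32 represent them.

```rust
// Days since 1970-01-01 for a proleptic Gregorian date
// ("days from civil" algorithm by Howard Hinnant).
fn days_from_civil(year: i64, month: u32, day: u32) -> i32 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400); // year of era, 0..=399
    let mp = (month as i64 + 9) % 12; // month index with March = 0
    let doy = (153 * mp + 2) / 5 + day as i64 - 1; // day of year, 0..=365
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    (era * 146_097 + doe - 719_468) as i32
}

// Number of days in the given month, or 0 for an invalid month.
fn days_in_month(year: i64, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if (year % 4 == 0 && year % 100 != 0) || year % 400 == 0 => 29,
        2 => 28,
        _ => 0,
    }
}

// Hypothetical helper: parses `yyyy-MM-dd` into days since the epoch.
// Null input yields null. Unparseable input yields null when
// `ansi_enabled` is false and an error when it is true.
fn parse_iso_date(input: Option<&str>, ansi_enabled: bool) -> Result<Option<i32>, String> {
    let s = match input {
        Some(s) => s,
        None => return Ok(None),
    };
    let parts: Vec<&str> = s.trim().split('-').collect();
    let parsed = match parts.as_slice() {
        [y, m, d] => match (y.parse::<i64>(), m.parse::<u32>(), d.parse::<u32>()) {
            (Ok(y), Ok(m), Ok(d)) if d >= 1 && d <= days_in_month(y, m) => {
                Some(days_from_civil(y, m, d))
            }
            _ => None,
        },
        _ => None,
    };
    match parsed {
        Some(days) => Ok(Some(days)),
        None if ansi_enabled => Err(format!("cannot parse '{}' as a date", s)),
        None => Ok(None),
    }
}
```

Returning `Result<Option<i32>, String>` keeps the two outcomes separate: `Ok(None)` models a SQL null, while `Err` models the runtime exception raised in ANSI mode.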
**Examples:**
```sql
-- Basic date parsing with default format
SELECT to_date('2016-12-31') AS parsed_date;
-- Date parsing with custom format
SELECT to_date('12/31/2016', 'MM/dd/yyyy') AS parsed_date;
-- Parsing timestamp to date
SELECT to_date(current_timestamp()) AS today;
```
```scala
// DataFrame API usage (assumes a DataFrame `df` and `import spark.implicits._` for `$`)
import org.apache.spark.sql.functions._
// Basic date conversion
df.select(to_date($"date_string"))
// With custom format
df.select(to_date($"date_string", "MM/dd/yyyy"))
// Converting timestamp column to date
df.select(to_date($"timestamp_column"))
```
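To make the custom-format example concrete, here is a minimal sketch of pattern-driven parsing in Rust. It is a simplification under stated assumptions: `parse_with_pattern` is a hypothetical helper, it understands only fixed-width runs of `y`, `M`, and `d` plus literal separators, and it does not validate month or day ranges. Spark's datetime patterns cover far more than this.

```rust
// Hypothetical helper: matches `input` against `pattern` and returns
// (year, month, day). Only runs of 'y', 'M', and 'd' are treated as
// fields; every other pattern character must match the input literally.
fn parse_with_pattern(input: &str, pattern: &str) -> Option<(i32, u32, u32)> {
    let pat: Vec<char> = pattern.chars().collect();
    let inp: Vec<char> = input.chars().collect();
    let (mut year, mut month, mut day) = (None, None, None);
    let (mut p, mut i) = (0, 0);

    while p < pat.len() {
        let c = pat[p];
        // Length of the run of identical pattern letters starting at p.
        let mut width = 1;
        while p + width < pat.len() && pat[p + width] == c {
            width += 1;
        }

        if matches!(c, 'y' | 'M' | 'd') {
            // Field: consume exactly `width` digits from the input.
            if i + width > inp.len() {
                return None;
            }
            let digits: String = inp[i..i + width].iter().collect();
            if !digits.chars().all(|ch| ch.is_ascii_digit()) {
                return None;
            }
            let value: u32 = digits.parse().ok()?;
            match c {
                'y' => year = Some(value as i32),
                'M' => month = Some(value),
                _ => day = Some(value),
            }
            i += width;
            p += width;
        } else {
            // Literal: the next input character must be identical.
            if inp.get(i) != Some(&c) {
                return None;
            }
            i += 1;
            p += 1;
        }
    }

    // Reject trailing input and patterns that omit a field.
    if i != inp.len() {
        return None;
    }
    Some((year?, month?, day?))
}
```

With this helper, `parse_with_pattern("12/31/2016", "MM/dd/yyyy")` yields the fields of the second SQL example, while an input in a different shape, such as `2016-12-31`, does not match the pattern and yields `None`.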
### Implementation Approach
See the [Comet guide on adding new
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
for detailed instructions.
1. **Scala Serde**: Add expression handler in
`spark/src/main/scala/org/apache/comet/serde/`
2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if
needed
4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has
built-in support first)
## Additional context
**Difficulty:** Medium
**Spark Expression Class:**
`org.apache.spark.sql.catalyst.expressions.ParseToDate`
**Related:**
- GetTimestamp - underlying expression used for formatted date parsing
- Cast - used for direct type conversion when no format is specified
- ParseToTimestamp - similar expression for parsing to timestamp type
- DateFormatClass - related date formatting expressions
---
*This issue was auto-generated from Spark reference documentation.*