andygrove opened a new issue, #3093:
URL: https://github.com/apache/datafusion-comet/issues/3093

   ## What is the problem the feature request solves?
   
   > **Note:** This issue was generated with AI assistance. The specification 
details have been extracted from Spark documentation and may need verification.
   
   Comet does not currently support the Spark `to_date` function (the `ParseToDate` 
expression), so queries that use it fall back to Spark's JVM execution instead of 
running natively on DataFusion.
   
   ParseToDate is a Spark Catalyst expression that converts string values to 
date values using an optional format pattern. It is a runtime-replaceable 
expression: internally it delegates to GetTimestamp and Cast, which preserves 
backward compatibility for existing date-parsing behavior.
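   The delegation described above can be sketched in plain Scala, using `java.time` 
as a stand-in for Spark's internal parse/cast machinery. This is an illustrative 
model only; `parseToDate` and its signature are hypothetical, not a Spark or Comet API:
   
   ```scala
   import java.time.LocalDate
   import java.time.format.DateTimeFormatter
   
   // Model of ParseToDate's runtime replacement: with a format, delegate to a
   // formatted parse (GetTimestamp + Cast); without one, fall back to a direct
   // ISO-8601 parse (Cast to DateType).
   def parseToDate(input: String, format: Option[String]): LocalDate =
     format match {
       case Some(f) => LocalDate.parse(input, DateTimeFormatter.ofPattern(f))
       case None    => LocalDate.parse(input) // ISO-8601, like Cast to DateType
     }
   
   println(parseToDate("2016-12-31", None))               // 2016-12-31
   println(parseToDate("12/31/2016", Some("MM/dd/yyyy"))) // 2016-12-31
   ```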
   
   Supporting this expression would allow more Spark workloads to benefit from 
Comet's native acceleration.
   
   ## Describe the potential solution
   
   ### Spark Specification
   
   **Syntax:**
   ```sql
   to_date(date_str[, format])
   ```
   
   ```scala
   // DataFrame API usage
   df.select(to_date($"date_column"))
   df.select(to_date($"date_column", "yyyy-MM-dd"))
   ```
   
   **Arguments:**
   | Argument | Type | Description |
   |----------|------|-------------|
   | left | Expression | The input expression containing the date string or 
date-like value to convert |
   | format | Option[Expression] | Optional format pattern string specifying 
how to parse the input |
   | timeZoneId | Option[String] | Optional timezone identifier for date 
parsing (default: None) |
   | ansiEnabled | Boolean | Whether ANSI SQL compliance mode is enabled, 
affecting error handling behavior |
   
   **Return Type:** DateType - Returns a date value representing the parsed 
input.
   
   **Supported Data Types:**
   - StringTypeWithCollation (with trim collation support)
   - DateType  
   - TimestampType
   - TimestampNTZType
   
   **Edge Cases:**
   - Null input values are handled according to the underlying Cast operation 
behavior
   - Invalid date strings may return null or throw exceptions based on 
ansiEnabled setting
   - When ansiEnabled is true, invalid formats cause runtime exceptions
   - When ansiEnabled is false, invalid formats typically return null values
   - Empty strings are processed according to the Cast operation's null 
handling rules
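   The ANSI-dependent error handling above can be sketched in plain Scala; the 
`toDate` helper below is hypothetical (using `Option` to stand in for SQL NULL), 
not the actual Spark implementation:
   
   ```scala
   import java.time.LocalDate
   import java.time.format.DateTimeFormatter
   import scala.util.Try
   
   // ANSI mode propagates the parse error; non-ANSI mode returns null
   // (modeled here as None).
   def toDate(input: String, pattern: String, ansiEnabled: Boolean): Option[LocalDate] = {
     val attempt = Try(LocalDate.parse(input, DateTimeFormatter.ofPattern(pattern)))
     if (ansiEnabled) Some(attempt.get) // rethrows on invalid input
     else attempt.toOption              // None stands in for SQL NULL
   }
   
   println(toDate("2016-12-31", "yyyy-MM-dd", ansiEnabled = false)) // Some(2016-12-31)
   println(toDate("not-a-date", "yyyy-MM-dd", ansiEnabled = false)) // None
   ```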
   
   **Examples:**
   ```sql
   -- Basic date parsing with default format
   SELECT to_date('2016-12-31') AS parsed_date;
   
   -- Date parsing with custom format
   SELECT to_date('12/31/2016', 'MM/dd/yyyy') AS parsed_date;
   
   -- Parsing timestamp to date
   SELECT to_date(current_timestamp()) AS today;
   ```
   
   ```scala
   // DataFrame API usage
   import org.apache.spark.sql.functions._
   
   // Basic date conversion
   df.select(to_date($"date_string"))
   
   // With custom format
   df.select(to_date($"date_string", "MM/dd/yyyy"))
   
   // Converting timestamp column to date
   df.select(to_date($"timestamp_column"))
   ```
   
   ### Implementation Approach
   
   See the [Comet guide on adding new 
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
 for detailed instructions.
   
   1. **Scala Serde**: Add expression handler in 
`spark/src/main/scala/org/apache/comet/serde/`
   2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
   3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if 
needed
   4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has 
built-in support first)
   
   
   ## Additional context
   
   **Difficulty:** Medium
   **Spark Expression Class:** 
`org.apache.spark.sql.catalyst.expressions.ParseToDate`
   
   **Related:**
   - GetTimestamp - underlying expression used for formatted date parsing
   - Cast - used for direct type conversion when no format is specified  
   - ParseToTimestamp - similar expression for parsing to timestamp type
   - DateFormatClass - related date formatting expressions
   
   ---
   *This issue was auto-generated from Spark reference documentation.*
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
