andygrove opened a new issue, #3186:
URL: https://github.com/apache/datafusion-comet/issues/3186
## What is the problem the feature request solves?
> **Note:** This issue was generated with AI assistance. The specification
details have been extracted from Spark documentation and may need verification.
Comet does not currently support the Spark `url_decode` function, causing
queries using this function to fall back to Spark's JVM execution instead of
running natively on DataFusion.
The `UrlDecode` expression decodes strings in `application/x-www-form-urlencoded`
format, converting percent-encoded sequences (and `+`, which represents a space)
back to their original characters. It is implemented as a runtime-replaceable
expression that delegates to the `UrlCodec.decode` method, with configurable
error handling via the `failOnError` flag.
Supporting this expression would allow more Spark workloads to benefit from
Comet's native acceleration.
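Because `UrlCodec.decode` runs on the JVM, a native implementation has to match its observable behavior. The Scala sketch below is a hypothetical reference model of those semantics, not Spark's source. It assumes that `UrlCodec.decode` wraps the JDK's `java.net.URLDecoder` with UTF-8, and that malformed input yields null when `failOnError` is false. `UrlDecodeModel` is an illustrative name.

```scala
import java.net.URLDecoder
import java.nio.charset.StandardCharsets

// Hypothetical reference model of url_decode semantics (not Spark's source).
// Assumption: UrlCodec.decode wraps java.net.URLDecoder with UTF-8.
object UrlDecodeModel {
  def decode(src: String, failOnError: Boolean): String =
    if (src == null) null // null in, null out
    else
      try URLDecoder.decode(src, StandardCharsets.UTF_8)
      catch {
        // Assumption: malformed input yields null when failOnError is false.
        case _: IllegalArgumentException if !failOnError => null
        // Otherwise surface the error (Spark is expected to raise its own error type here).
        case e: IllegalArgumentException =>
          throw new IllegalArgumentException(s"Cannot decode URL: $src", e)
      }

  def main(args: Array[String]): Unit = {
    println(decode("Hello%20World", failOnError = true)) // Hello World
    println(decode("%zz", failOnError = false))          // null
  }
}
```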
## Describe the potential solution
### Spark Specification
**Syntax:**
```sql
url_decode(url_string)
url_decode(url_string, fail_on_error)
```
```scala
// DataFrame API
import org.apache.spark.sql.functions.expr
df.select(expr("url_decode(url_column)"))
```
**Arguments:**
| Argument | Type | Description |
|----------|------|-------------|
| child | StringType | The URL-encoded string to decode |
| failOnError | Boolean | Whether to fail on malformed input (default: true) |
**Return Type:** Returns `StringType` - the decoded URL string.
**Supported Data Types:**
- StringType with collation support (supports trim collation)
- Input must be a valid string expression
**Edge Cases:**
- Null input returns null output (standard null propagation)
- Empty string input returns empty string
- Malformed percent-encoding behavior depends on `failOnError` flag
- When `failOnError` is true, invalid encoding throws an exception
- When `failOnError` is false, invalid encoding returns null instead of
throwing
- Supports trim collation for string comparison operations
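The edge cases above can be pinned down as a small table of expected values for a Comet compatibility test. This sketch computes them with `java.net.URLDecoder`, under the assumption that Spark delegates to it with UTF-8. `UrlDecodeEdgeCases` is a hypothetical name, and `None` marks input the JDK decoder rejects. Two behaviors a native port could easily miss: `+` decodes to a space, and an invalid UTF-8 byte such as `%FF` becomes U+FFFD rather than an error.

```scala
import java.net.URLDecoder
import java.nio.charset.StandardCharsets
import scala.util.Try

// Hypothetical edge-case table, computed with the JDK decoder that Spark's
// UrlCodec is assumed to wrap. None means the decoder rejects the input.
object UrlDecodeEdgeCases {
  def jdkDecode(s: String): Option[String] =
    Try(URLDecoder.decode(s, StandardCharsets.UTF_8)).toOption

  val cases: Seq[(String, Option[String])] = Seq(
    ""          -> Some(""),       // empty string stays empty
    "a+b"       -> Some("a b"),    // '+' decodes to a space (form encoding)
    "%E2%82%AC" -> Some("\u20AC"), // multi-byte UTF-8 escape -> euro sign
    "%FF"       -> Some("\uFFFD"), // invalid UTF-8 byte -> U+FFFD, not an error
    "100%"      -> None,           // incomplete trailing escape
    "%2"        -> None,           // escape with a single hex digit
    "%zz"       -> None            // non-hex characters in an escape
  )

  def main(args: Array[String]): Unit =
    cases.foreach { case (in, expected) =>
      val actual = jdkDecode(in)
      assert(actual == expected, s"unexpected result for '$in': $actual")
      println(s"'$in' -> $actual")
    }
}
```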
**Examples:**
```sql
-- Basic URL decoding
SELECT url_decode('Hello%20World') AS decoded;
-- Result: "Hello World"
-- Decode with error handling
SELECT url_decode('user%40domain.com', true) AS email;
-- Result: "user@domain.com"
-- Decode complex URL parameters
SELECT url_decode('param%3Dvalue%26other%3D123') AS params;
-- Result: "param=value&other=123"
```
```scala
// DataFrame API usage
import org.apache.spark.sql.functions._
df.select(expr("url_decode(encoded_url)").as("decoded"))
// With explicit error handling
df.select(expr("url_decode(encoded_url, false)").as("decoded"))
```
### Implementation Approach
See the [Comet guide on adding new
expressions](https://datafusion.apache.org/comet/contributor-guide/adding_a_new_expression.html)
for detailed instructions.
1. **Scala Serde**: Add expression handler in
`spark/src/main/scala/org/apache/comet/serde/`
2. **Register**: Add to appropriate map in `QueryPlanSerde.scala`
3. **Protobuf**: Add message type in `native/proto/src/proto/expr.proto` if
needed
4. **Rust**: Implement in `native/spark-expr/src/` (check if DataFusion has
built-in support first)
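Before porting the logic to Rust (or reusing a DataFusion function, if one exists), it may help to see the decoding loop written out explicitly. The Scala sketch below is a hypothetical, simplified re-implementation; `ManualUrlDecode` is an illustrative name. It maps `+` to a space, decodes each run of `%XX` escapes as UTF-8 with replacement characters, and reports malformed escapes as `Left`. It is intended to agree with `java.net.URLDecoder` on ordinary inputs but is not guaranteed to reproduce every JDK quirk on unusual escapes.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets

// Hypothetical, simplified form-URL decoder showing the steps a native
// kernel would need to reproduce. Not Spark's or Comet's implementation.
object ManualUrlDecode {
  private def hex(c: Char): Int = Character.digit(c, 16) // -1 if not a hex digit

  /** Left(error) for malformed escapes, Right(decoded) otherwise. */
  def decode(s: String): Either[String, String] = {
    val out = new StringBuilder
    val buf = new ByteArrayOutputStream()
    var i = 0
    while (i < s.length) {
      s.charAt(i) match {
        case '+' => out.append(' '); i += 1 // form encoding: '+' is a space
        case '%' =>
          // Collect a run of consecutive %XX escapes as raw bytes.
          buf.reset()
          while (i < s.length && s.charAt(i) == '%') {
            if (i + 2 >= s.length) return Left(s"incomplete escape at index $i")
            val hi = hex(s.charAt(i + 1))
            val lo = hex(s.charAt(i + 2))
            if (hi < 0 || lo < 0) return Left(s"illegal hex digits at index $i")
            buf.write(hi * 16 + lo)
            i += 3
          }
          // Decode the run as UTF-8; invalid bytes become U+FFFD, not errors.
          out.append(new String(buf.toByteArray, StandardCharsets.UTF_8))
        case c => out.append(c); i += 1 // any other character is copied as-is
      }
    }
    Right(out.toString)
  }
}
```

Comparing this sketch with `URLDecoder` over randomly generated strings is one way to build a differential test that the native implementation can later be checked against.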
## Additional context
**Difficulty:** Large
**Spark Expression Class:**
`org.apache.spark.sql.catalyst.expressions.UrlDecode`
**Related:**
- `UrlEncode` - Companion expression for URL encoding
- String manipulation functions in `url_funcs` group
- `StaticInvoke` expression for method delegation
- Collation-aware string expressions
---
*This issue was auto-generated from Spark reference documentation.*