andygrove opened a new issue, #21516:
URL: https://github.com/apache/datafusion/issues/21516
### Describe the bug
DataFusion-spark treats `\t` and `\n` in SQL string literals as two literal
characters (a backslash followed by `t` or `n`), while Apache Spark interprets
them as escape sequences (tab and newline). This affects any function that
receives string arguments containing these sequences.
### To Reproduce
**PySpark (Spark behavior):**
```sql
SELECT soundex('\thello'); -- returns tab + "hello" (soundex passes through non-alpha input)
SELECT soundex('\nhello'); -- returns newline + "hello"
SELECT length('\thello'); -- 6 (tab is one character)
SELECT length('\nhello'); -- 6 (newline is one character)
```
**DataFusion-spark (current behavior):**
```sql
SELECT soundex('\thello'); -- returns literal "\thello" (backslash-t-hello)
SELECT soundex('\nhello'); -- returns literal "\nhello" (backslash-n-hello)
SELECT length('\thello'); -- 7 (\t is two characters: backslash and t)
SELECT length('\nhello'); -- 7 (\n is two characters: backslash and n)
```
### Expected behavior
DataFusion-spark should interpret `\t`, `\n`, and other escape sequences in
string literals the same way Spark does, for Spark compatibility.
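
To make the two interpretations concrete, here is a minimal Python sketch of the unescaping step. It is not DataFusion's or Spark's parser code: `unescape_spark_like` and the `ESCAPES` table are hypothetical names, and the table covers only a small assumed subset of sequences (the full set Spark accepts is not listed in this issue). Unrecognized sequences are left untouched here purely to keep the sketch simple.

```python
# Hypothetical helper for illustration only; not taken from Spark or DataFusion.
# Assumed subset of escape sequences: tab, newline, carriage return, backslash.
ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "\\": "\\"}


def unescape_spark_like(raw: str) -> str:
    """Replace recognized backslash sequences in `raw` with the character they denote."""
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        # A backslash followed by a known escape letter collapses to one character.
        if ch == "\\" and i + 1 < len(raw) and raw[i + 1] in ESCAPES:
            out.append(ESCAPES[raw[i + 1]])
            i += 2
        else:
            # Everything else (including unknown sequences) is copied as-is.
            out.append(ch)
            i += 1
    return "".join(out)


# The raw literal text as written in the SQL query: backslash, 't', then "hello".
raw = r"\thello"
print(len(raw))                       # 7: the literal interpretation
print(len(unescape_spark_like(raw)))  # 6: the escape interpretation
```

The two printed lengths correspond to the `length('\thello')` results shown above for DataFusion-spark and Spark respectively.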
### Additional context
This is a string literal parsing issue, not specific to `soundex`. It
affects all string functions. The `.slt` tests at `string/soundex.slt` lines 83
and 193 have expected values that match DataFusion's literal interpretation
rather than Spark's escape interpretation.
This was discovered by running a PySpark validation script against the
`.slt` test files (see #17045, #21508).
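
As a rough aid for finding other affected tests, the following Python sketch scans `.slt` text for single-quoted SQL literals that contain a backslash sequence. It is not the validation script mentioned above; `find_backslash_literals` and the sample input are hypothetical, and the regular expression assumes simple single-line literals with `''` as the only quote escape.

```python
import re

# Matches a single-quoted SQL literal, allowing '' as an escaped quote.
LITERAL = re.compile(r"'(?:[^']|'')*'")
# Matches a backslash followed by any one character.
BACKSLASH_SEQ = re.compile(r"\\.")


def find_backslash_literals(slt_text: str) -> list[tuple[int, str]]:
    """Return (line number, literal) pairs for literals containing a backslash sequence."""
    hits = []
    for lineno, line in enumerate(slt_text.splitlines(), start=1):
        for match in LITERAL.finditer(line):
            if BACKSLASH_SEQ.search(match.group()):
                hits.append((lineno, match.group()))
    return hits


# Hypothetical sample in sqllogictest layout; not copied from soundex.slt.
sample = r"""query T
SELECT soundex('\thello');
----
\thello

query T
SELECT soundex('hello');
----
H400
"""

for lineno, literal in find_backslash_literals(sample):
    print(f"line {lineno}: {literal}")
```

Only quoted literals are reported, so expected-output lines such as the bare `\thello` in the sample are not flagged. Each hit still needs a manual check against Spark's actual result.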
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.