andygrove opened a new issue, #21516:
URL: https://github.com/apache/datafusion/issues/21516

   ### Describe the bug
   
   DataFusion-spark treats `\t` and `\n` in SQL string literals as two literal 
characters (a backslash followed by `t` or `n`), while Apache Spark interprets 
them as escape sequences (tab and newline). This affects any function that 
receives a string literal containing these sequences.
   
   ### To Reproduce
   
   **PySpark (Spark behavior):**
   ```sql
   SELECT soundex('\thello');  -- returns tab + "hello" (soundex passes through non-alpha input)
   SELECT soundex('\nhello');  -- returns newline + "hello"
   SELECT length('\thello');   -- 6 (tab is one character)
   SELECT length('\nhello');   -- 6 (newline is one character)
   ```
   
   **DataFusion-spark (current behavior):**
   ```sql
   SELECT soundex('\thello');  -- returns literal "\thello" (backslash-t-hello)
   SELECT soundex('\nhello');  -- returns literal "\nhello" (backslash-n-hello)
   SELECT length('\thello');   -- 7 (\t is two characters: backslash and t)
   SELECT length('\nhello');   -- 7 (\n is two characters: backslash and n)
   ```
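
The length difference above can be illustrated outside either engine. The following is a minimal Python sketch, not DataFusion or Spark code: a normal Python string literal stands in for Spark's escape-processing behavior, and a raw string stands in for DataFusion-spark's current behavior. The variable names are made up for this illustration.

```python
# Illustration only: Python string semantics standing in for the two SQL behaviors.
spark_style = "\thello"        # escape processed: a tab followed by "hello"
datafusion_style = r"\thello"  # raw: a backslash, then 't', then "hello"

print(len(spark_style))       # one tab + five letters
print(len(datafusion_style))  # backslash + 't' + five letters
```

The first string has 6 characters and the second has 7, matching the `length` results reported for each engine.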
   
   ### Expected behavior
   
   DataFusion-spark should interpret `\t`, `\n`, and other escape sequences in 
string literals the same way Spark does, for Spark compatibility.
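
As a rough model of the expected behavior, the sketch below shows one way a literal body could be unescaped. It is a simplified Python illustration under stated assumptions, not Spark's or DataFusion's actual implementation: `unescape_literal` and `SIMPLE_ESCAPES` are hypothetical names, only a handful of single-character escapes are covered, and Spark's handling of Unicode, octal, and LIKE-pattern escapes is deliberately left out.

```python
# Hypothetical helper, not DataFusion or Spark code: a simplified model of
# backslash-escape handling in the body of a SQL string literal.
SIMPLE_ESCAPES = {
    "t": "\t",
    "n": "\n",
    "r": "\r",
    "\\": "\\",
    "'": "'",
}


def unescape_literal(body: str) -> str:
    """Replace recognized backslash escapes in `body` with the characters they denote."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # Known escapes map to their control character. For any other
            # character this sketch just drops the backslash (an assumption,
            # not a statement about Spark's full rules).
            out.append(SIMPLE_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            # Ordinary character, or a lone trailing backslash: keep as-is.
            out.append(ch)
            i += 1
    return "".join(out)


print(len(unescape_literal(r"\thello")))  # length after escape processing
print(len(r"\thello"))                    # length without escape processing
```

Under this model, `unescape_literal(r"\thello")` yields a tab followed by `hello`, so its length matches the Spark result in the reproduction, while the unprocessed input has the length DataFusion-spark currently reports.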
   
   ### Additional context
   
   This is a string literal parsing issue, not specific to `soundex`. It 
affects every function that is passed such a literal. The `.slt` tests at 
`string/soundex.slt` lines 83 and 193 have expected values that match 
DataFusion's literal interpretation rather than Spark's escape interpretation.
   
   This was discovered by running a PySpark validation script against the 
`.slt` test files (see #17045, #21508).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

