SamAya21 opened a new pull request, #21599:
URL: https://github.com/apache/datafusion/pull/21599

   Summary
   This PR fixes Spark-compatible handling of escape sequences in SQL string 
literals (#21516).
   
   The issue showed up in datafusion-spark string function behavior, but the 
root cause was not in soundex itself. The actual problem was that quoted SQL 
string literals were being converted into DataFusion literal expressions 
without unescaping sequences such as \t, \n, \\, \', and octal escapes.
   
   As a result, literals like '\t hello' were treated as containing the two 
characters \ and t rather than a tab character followed by ' hello'.
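
   To make the difference concrete, here is a small illustrative sketch in plain Rust. It does not use any DataFusion API; the names RAW, UNESCAPED, and starts_with_tab are hypothetical and exist only for this example.

```rust
/// The literal text as written between the quotes:
/// a backslash, the letter 't', a space, then "hello" (8 characters).
const RAW: &str = r"\t hello";

/// The value that unescaping is expected to produce:
/// a tab, a space, then "hello" (7 characters).
const UNESCAPED: &str = "\t hello";

/// Returns true when the first character of `s` is an actual tab character.
fn starts_with_tab(s: &str) -> bool {
    s.starts_with('\t')
}
```

   Before the fix, string functions effectively received the RAW form, so starts_with_tab would report false for a literal the user wrote as '\t hello'.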
   
   What changed
   This change updates SQL value handling in datafusion/sql/src/expr/value.rs 
so that:
   
   regular quoted string literals are unescaped before being converted to 
Expr::Literal
   escaped string literals follow the same unescape path
   common escape sequences are supported:
   \0
   \b
   \n
   \r
   \t
   \Z
   \\
   \'
   \"
   \%
   \_
   octal escapes of up to 3 digits are supported, such as \101
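
   The escape set listed above can be sketched as a standalone function. This is a minimal sketch, not the code from this PR: the function name unescape_sql_string is hypothetical, and several behaviors are assumptions that the description does not spell out. In particular it assumes that \Z maps to U+001A and that \% and \_ keep their backslash (the MySQL-style convention, as I understand Spark to follow it), that unknown escapes drop the backslash and keep the character, and that a trailing backslash is kept verbatim.

```rust
/// Hypothetical helper (not the PR's actual implementation): unescapes the
/// body of a quoted SQL string literal using the escape set listed above.
fn unescape_sql_string(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();

    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            // Assumption: a trailing backslash is kept as-is.
            None => out.push('\\'),
            // Octal escape: this digit plus up to two more octal digits.
            // A lone `\0` therefore yields NUL.
            Some(d @ '0'..='7') => {
                let mut value = d.to_digit(8).unwrap();
                for _ in 0..2 {
                    match chars.peek().and_then(|n| n.to_digit(8)) {
                        Some(v) => {
                            value = value * 8 + v;
                            chars.next();
                        }
                        None => break,
                    }
                }
                // The maximum is 0o777, which is always a valid scalar value;
                // the fallback is purely defensive.
                out.push(char::from_u32(value).unwrap_or('\u{FFFD}'));
            }
            Some('b') => out.push('\u{0008}'),
            Some('n') => out.push('\n'),
            Some('r') => out.push('\r'),
            Some('t') => out.push('\t'),
            // Assumption: `\Z` is Ctrl-Z (U+001A).
            Some('Z') => out.push('\u{001A}'),
            // Assumption: LIKE wildcards keep their backslash.
            Some(w @ ('%' | '_')) => {
                out.push('\\');
                out.push(w);
            }
            // `\\`, `\'`, `\"`, and (by assumption) unknown escapes:
            // drop the backslash and keep the character.
            Some(other) => out.push(other),
        }
    }
    out
}
```

   Under these assumptions, unescape_sql_string(r"\t hello") returns a tab followed by " hello", and unescape_sql_string(r"\101") returns "A".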
   
   Why this belongs here
   Although the failing behavior was observed in Spark string functions, the 
underlying bug was earlier in the SQL literal pipeline. parse_value(...) in 
value.rs was converting normal quoted strings directly with lit(s), preserving 
backslash escape text instead of producing the intended string value.
   
   Fixing the issue at the value-conversion layer ensures all string functions 
receive the correct literal content.
   
   Tests
   Added unit tests covering:
   
   tab, newline, and carriage return escapes
   escaped quotes and backslashes
   octal escapes
   unknown escapes
   trailing backslash behavior
   
   Notes
   While working on validation, I also ran into projection-name conflicts when 
selecting multiple literals that now resolve to the same final value, like / and 
//. For SQL-level tests, this is avoided by aliasing the projected literals, and 
I updated the test case with cargo insta review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

