SamAya21 opened a new pull request, #21599: URL: https://github.com/apache/datafusion/pull/21599
Summary

This PR fixes Spark-compatible handling of escape sequences in SQL string literals (#21516). The issue showed up in datafusion-spark string function behavior, but the root cause was not in soundex itself. Quoted SQL string literals were being converted into DataFusion literal expressions without unescaping sequences such as \t, \n, \\, \', and octal escapes. As a result, a literal like '\t hello' was treated as the two characters \ and t followed by " hello", instead of a tab character followed by " hello".

What changed

This change updates SQL value handling in datafusion/sql/src/expr/value.rs so that:

- regular quoted string literals are unescaped before being converted to Expr::Literal
- escaped string literals follow the same unescape path
- common escape sequences are supported: \0 \b \n \r \t \Z \\ \' \" \% \_
- octal escapes of up to 3 digits are supported, such as \101

Why this belongs here

Although the failing behavior was observed in Spark string functions, the underlying bug was earlier in the SQL literal pipeline. parse_value(...) in value.rs was converting normal quoted strings directly with lit(s), which preserved the backslash escape text instead of producing the intended string value. Fixing the issue at the value-conversion layer ensures that all string functions receive the correct literal content.

Tests

Added unit tests covering:

- tab, newline, and carriage return escapes
- escaped quotes and backslashes
- octal escapes
- unknown escapes
- trailing backslash behavior

Notes

While working on validation, I also ran into projection-name conflicts when selecting multiple literals that now resolve to the same final value, like / and //. For SQL-level tests, this is avoided by aliasing the projected literals; the affected test case was updated with cargo insta review.
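To illustrate the unescaping described under "What changed", here is a minimal standalone sketch in Rust. It is not the code from value.rs: the function name unescape_sql_literal is hypothetical, and the handling of unknown escapes, a trailing backslash, and \% / \_ is assumed here because the description does not spell it out.

```rust
/// Hypothetical helper illustrating backslash unescaping for a quoted SQL
/// string literal. This is a sketch, not the implementation in value.rs.
fn unescape_sql_literal(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();

    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            // Assumption: a trailing backslash is kept as a literal backslash.
            None => out.push('\\'),
            // Octal escape of one to three digits, e.g. \101 -> 'A'.
            // A lone \0 falls out of this rule as the NUL character.
            Some(d @ '0'..='7') => {
                let mut value = d.to_digit(8).unwrap();
                let mut digits = 1;
                while digits < 3 {
                    match chars.peek().and_then(|n| n.to_digit(8)) {
                        Some(v) => {
                            value = value * 8 + v;
                            chars.next();
                            digits += 1;
                        }
                        None => break,
                    }
                }
                // The largest possible value is 0o777, always a valid char.
                out.push(char::from_u32(value).unwrap_or('\u{FFFD}'));
            }
            Some('b') => out.push('\u{0008}'), // backspace
            Some('n') => out.push('\n'),
            Some('r') => out.push('\r'),
            Some('t') => out.push('\t'),
            Some('Z') => out.push('\u{001A}'), // ASCII SUB (Ctrl+Z)
            // Assumption: \% and \_ keep the backslash so that LIKE-style
            // pattern matching can still see them as escaped wildcards.
            Some(p @ ('%' | '_')) => {
                out.push('\\');
                out.push(p);
            }
            // \\, \', \" and (by assumption) unknown escapes: drop the
            // backslash and keep the following character.
            Some(other) => out.push(other),
        }
    }
    out
}
```

Under these assumptions, unescape_sql_literal(r"\t hello") yields a tab followed by " hello", and unescape_sql_literal(r"\101") yields "A". The unit tests listed in the PR are the authority on the actual behavior for unknown escapes and trailing backslashes.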
