mvzink commented on code in PR #1735: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1735#discussion_r1966627492
########## tests/sqlparser_mysql.rs: ########## @@ -2530,6 +2530,16 @@ fn parse_rlike_and_regexp() { } } +#[test] +fn parse_like_with_escape() { + mysql().verified_only_select(r#"SELECT * FROM customer WHERE name LIKE 'a\%c'"#); Review Comment: Yeah, it took me a bit to realize what was going on when I encountered this. I'll try to explain it better in comments and commit messages. These tests demonstrate that the backslash is not being stripped. Here is this same test without the fix: ``` thread 'parse_like_with_escape' panicked at src/test_utils.rs:164:13: assertion `left == right` failed left: "SELECT * FROM customer WHERE name LIKE 'a\\%c'" right: "SELECT * FROM customer WHERE name LIKE 'a%c'" ``` The final 2 asserts on the big "Testing:" string in the `parse_escaped_string_with_unescape` test in `tests/sqlparser_common.rs` may make it bit clearer than this mysql-only test, because you can see the 2 different escaping behaviors side by side. In the current behavior (which is correct for all escaping dialects except MySQL afaik), `'\%'` and `'\_'` are turned into`'%'` and `'_'`. This is the same as the 'default' escape rule: if there's no escaping rule for a given character, the backslash is simply stripped and the escaped character left in place. This 'default' is visible in `parse_escaped_string_with_unescape` with `'\h \ '` becoming `'h '`: there is no specific escape rule for `h` or ` `. The specific (and weird) escape rule that MySQL has for `%` and `_` is that they are left alone but the backslash is *not* stripped. The reason for this is so that they can be escaped in `LIKE` patterns without double-escaping them like you have to do in, e.g. Snowflake (as described in the Snowflake docs and the comment on `Dialect::ignores_like_wildcard_escapes`). But MySQL chose to make it a special case not just for LIKE patterns, but for all string literals. MySQL: ``` mysql> select '\_', hex('\\'), hex('_'), hex('\_'), hex('h'), hex('\h'); +----+-----------+----------+-----------+----------+-----------+ | \_ | hex('\\') | hex('_') | hex('\_') | hex('h') | hex('\h') | +----+-----------+----------+-----------+----------+-----------+ | \_ | 5C | 5F | 5C5F | 68 | 68 | +----+-----------+----------+-----------+----------+-----------+ 1 row in set (0.00 sec) mysql> select '_' LIKE '\%', '_' LIKE '\_', '\_' LIKE '\_'; +---------------+---------------+----------------+ | '_' LIKE '\%' | '_' LIKE '\_' | '\_' LIKE '\_' | +---------------+---------------+----------------+ | 0 | 1 | 0 | +---------------+---------------+----------------+ 1 row in set (0.00 sec) ``` (Note in the last LIKE expression, there is nothing to match the unstripped backslash character in the scrutinee, so it doesn't match.) As far as I can tell from the Snowflake docs (BigQuery and Clickhouse are less clear), to express the equivalent LIKE expressions you would have to double-escape the backslashes, because this special case for `_` and `%` don't exist: ``` select '_' LIKE '\\%', '_' LIKE '\\_', '\\_' LIKE '\\_'; ``` (And, of course, to make things more fun, double-escaping also works in MySQL...) Like I said, I'll see what I can do to explain this better in comments/tests, but let me know if this explanation makes sense. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org