mvzink commented on code in PR #1735:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/pull/1735#discussion_r1966627492


##########
tests/sqlparser_mysql.rs:
##########
@@ -2530,6 +2530,16 @@ fn parse_rlike_and_regexp() {
     }
 }
 
+#[test]
+fn parse_like_with_escape() {
+    mysql().verified_only_select(r#"SELECT * FROM customer WHERE name LIKE 
'a\%c'"#);

Review Comment:
   Yeah, it took me a bit to realize what was going on when I encountered this. 
I'll try to explain it better in comments and commit messages.
   
   These tests demonstrate that the backslash is not being stripped. Here is 
this same test without the fix:
   
   ```
   thread 'parse_like_with_escape' panicked at src/test_utils.rs:164:13:
   assertion `left == right` failed
     left: "SELECT * FROM customer WHERE name LIKE 'a\\%c'"
    right: "SELECT * FROM customer WHERE name LIKE 'a%c'"
   ```
    
   The final 2 asserts on the big "Testing:" string in the 
`parse_escaped_string_with_unescape` test in `tests/sqlparser_common.rs` may 
make it bit clearer than this mysql-only test, because you can see the 2 
different escaping behaviors side by side.
   
   In the current behavior (which is correct for all escaping dialects except 
MySQL afaik), `'\%'` and `'\_'` are turned into`'%'` and `'_'`. This is the 
same as the 'default' escape rule: if there's no escaping rule for a given 
character, the backslash is simply stripped and the escaped character left in 
place. This 'default' is visible in `parse_escaped_string_with_unescape` with 
`'\h \ '` becoming `'h  '`: there is no specific escape rule for `h` or ` `.
   
   The specific (and weird) escape rule that MySQL has for `%` and `_` is that 
they are left alone but the backslash is *not* stripped. The reason for this is 
so that they can be escaped in `LIKE` patterns without double-escaping them 
like you have to do in, e.g. Snowflake (as described in the Snowflake docs and 
the comment on `Dialect::ignores_like_wildcard_escapes`). But MySQL chose to 
make it a special case not just for LIKE patterns, but for all string literals.
   
   MySQL:
   
   ```
   mysql> select '\_', hex('\\'), hex('_'), hex('\_'), hex('h'), hex('\h');
   +----+-----------+----------+-----------+----------+-----------+
   | \_ | hex('\\') | hex('_') | hex('\_') | hex('h') | hex('\h') |
   +----+-----------+----------+-----------+----------+-----------+
   | \_ | 5C        | 5F       | 5C5F      | 68       | 68        |
   +----+-----------+----------+-----------+----------+-----------+
   1 row in set (0.00 sec)
   
   mysql> select '_' LIKE '\%', '_' LIKE '\_', '\_' LIKE '\_';
   +---------------+---------------+----------------+
   | '_' LIKE '\%' | '_' LIKE '\_' | '\_' LIKE '\_' |
   +---------------+---------------+----------------+
   |             0 |             1 |              0 |
   +---------------+---------------+----------------+
   1 row in set (0.00 sec)
   ```
   
   (Note in the last LIKE expression, there is nothing to match the unstripped 
backslash character in the scrutinee, so it doesn't match.)
   
   As far as I can tell from the Snowflake docs (BigQuery and Clickhouse are 
less clear), to express the equivalent LIKE expressions you would have to 
double-escape the backslashes, because this special case for `_` and `%` don't 
exist:
   
   ```
   select '_' LIKE '\\%', '_' LIKE '\\_', '\\_' LIKE '\\_';
   ```
   
   (And, of course, to make things more fun, double-escaping also works in 
MySQL...)
   
   Like I said, I'll see what I can do to explain this better in 
comments/tests, but let me know if this explanation makes sense.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to