iffyio commented on code in PR #1735:
URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1735#discussion_r1974849855
##########
src/dialect/mod.rs:
##########
@@ -201,6 +201,33 @@ pub trait Dialect: Debug + Any {
         false
     }

+    /// Determine whether the dialect strips the backslash when escaping LIKE wildcards (%, _).
+    ///
+    /// [MySQL] has a special case when escaping single quoted strings which leaves these unescaped
+    /// so they can be used in LIKE patterns without double-escaping (as is necessary in other
+    /// escaping dialects, such as [Snowflake]). Generally, special characters have escaping rules
+    /// causing them to be replaced with a different byte sequences (e.g. `'\0'` becoming the zero
+    /// byte), and the default if an escaped character does not have a specific escaping rule is to
+    /// strip the backslash (e.g. there is no rule for `h`, so `'\h' = 'h'`). MySQL's special case
+    /// for ignoring LIKE wildcard escapes is to *not* strip the backslash, so that `'\%' = '\\%'`.
+    /// This applies to all string literals though, not just those used in LIKE patterns.
+    ///
+    /// ```text
+    /// mysql> select '\_', hex('\\'), hex('_'), hex('\_');
+    /// +----+-----------+----------+-----------+
+    /// | \_ | hex('\\') | hex('_') | hex('\_') |
+    /// +----+-----------+----------+-----------+
+    /// | \_ | 5C        | 5F       | 5C5F      |
+    /// +----+-----------+----------+-----------+
+    /// 1 row in set (0.00 sec)
+    /// ```
+    ///
+    /// [MySQL]: https://dev.mysql.com/doc/refman/8.4/en/string-literals.html
+    /// [Snowflake]: https://docs.snowflake.com/en/sql-reference/functions/like#usage-notes
+    fn ignores_like_wildcard_escapes(&self) -> bool {

Review Comment:
   ```suggestion
       fn ignores_wildcard_escapes(&self) -> bool {
   ```
   Maybe we drop the `like` part? As the doc comment itself notes, this isn't specific to the `LIKE` syntax; it's general string-literal escape behavior.

##########
src/tokenizer.rs:
##########
@@ -807,6 +807,9 @@ pub struct Tokenizer<'a> {
     /// If true (the default), the tokenizer will un-escape literal
     /// SQL strings See [`Tokenizer::with_unescape`] for more details.
     unescape: bool,
+    /// If true, the tokenizer will not escape % and _, for use in in LIKE patterns. See
+    /// [`Dialect::ignores_like_wildcard_escapes`] for more details.
+    ignore_like_wildcard_escapes: bool,

Review Comment:
   Was there a reason to store this value here rather than relying solely on the dialect via `self.dialect.ignores_like_wildcard_escapes()` when needed?
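
For reference, here is a minimal sketch of the escaping rule the doc comment in `src/dialect/mod.rs` describes. It is not the tokenizer's actual code: `unescape_literal` and its boolean parameter are hypothetical names used only for illustration, and only a few of the specific escapes (`\0`, `\n`, `\t`) are modeled. The point is just to show the three cases: an escape with its own rule, the default "strip the backslash" rule, and the wildcard special case that keeps the backslash.

```rust
/// Hypothetical helper (not part of sqlparser-rs): unescape the body of a
/// single-quoted string literal whose surrounding quotes are already removed.
///
/// `ignores_wildcard_escapes` stands in for the dialect flag under review.
fn unescape_literal(body: &str, ignores_wildcard_escapes: bool) -> String {
    let mut out = String::with_capacity(body.len());
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            // Ordinary character: copy through unchanged.
            out.push(c);
            continue;
        }
        match chars.next() {
            // Escapes with a specific rule map to a different character.
            Some('0') => out.push('\0'),
            Some('n') => out.push('\n'),
            Some('t') => out.push('\t'),
            // Special case: keep the backslash in front of `%` and `_`.
            Some(w @ ('%' | '_')) if ignores_wildcard_escapes => {
                out.push('\\');
                out.push(w);
            }
            // Default rule: no specific escape, so strip the backslash.
            Some(other) => out.push(other),
            // A trailing lone backslash is kept as-is in this sketch.
            None => out.push('\\'),
        }
    }
    out
}
```

With the flag set, `unescape_literal(r"\_", true)` keeps both characters, which lines up with the `5C5F` in the MySQL output quoted in the doc comment. With the flag unset, the same input collapses to `_`. In both modes `\h` becomes `h`, which is why the behavior reads as a general string-literal rule rather than something tied to `LIKE`.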