iffyio commented on code in PR #1705: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1705#discussion_r1942311180
########## src/dialect/mod.rs: ########## @@ -880,6 +880,15 @@ pub trait Dialect: Debug + Any { fn supports_table_hints(&self) -> bool { false } + + /// Returns whether it's the start of a single line comment + /// e.g. MySQL requires a space after `--` to be a single line comment + /// Otherwise it's interpreted as a double minus operator + /// + /// MySQL: <https://dev.mysql.com/doc/refman/8.4/en/ansi-diff-comments.html> + fn requires_whitespace_to_start_comment(&self) -> bool { Review Comment: ```suggestion fn requires_single_line_comment_whitespace(&self) -> bool { ``` thinking something like this to flag that this is only for the `--` comment style ########## src/dialect/mysql.rs: ########## @@ -125,6 +125,15 @@ impl Dialect for MySqlDialect { fn supports_table_hints(&self) -> bool { true } + + /// Returns whether it's the start of a single line comment + /// e.g. MySQL requires a space after `--` to be a single line comment + /// Otherwise it's interpreted as a double minus operator + /// Review Comment: Since this is already duplicated at the definition of the dialect method, we could consider not including it (The link to the doc on the other hand is useful in any case) ########## src/dialect/mod.rs: ########## @@ -880,6 +880,15 @@ pub trait Dialect: Debug + Any { fn supports_table_hints(&self) -> bool { false } + + /// Returns whether it's the start of a single line comment Review Comment: ```suggestion /// Returns true if this dialect requires a whitespace after `--` for single line comments. ``` ########## src/tokenizer.rs: ########## @@ -1229,15 +1229,31 @@ impl<'a> Tokenizer<'a> { // operators '-' => { chars.next(); // consume the '-' - match chars.peek() { - Some('-') => { - chars.next(); // consume the second '-', starting a single-line comment + + if let Some('-') = chars.peek() { + let mut is_comment = true; + if self.dialect.requires_whitespace_to_start_comment() { + // MySQL requires a space after the -- for a single-line comment + // Otherwise it's interpreted as two minus signs + // e.g. UPDATE account SET balance=balance--1 + // WHERE account_id=5752; Review Comment: Can we move the example to the doc string in the dialect method? That would help clarify the behavior to folks without having to dig deeper in the docs or the parser code. Then for the rest of the comment here we can probably skip since we've already documented as much in the dialect method ########## src/tokenizer.rs: ########## @@ -1229,15 +1229,31 @@ impl<'a> Tokenizer<'a> { // operators '-' => { chars.next(); // consume the '-' - match chars.peek() { - Some('-') => { - chars.next(); // consume the second '-', starting a single-line comment + + if let Some('-') = chars.peek() { Review Comment: would it be possible to have this remain as a match statement, calling `chars.peek()` only once? i.e. ```rust match chars.peek() { Some('-') => { ... } Some('>') => { ... } } ``` ########## tests/sqlparser_mysql.rs: ########## @@ -3244,3 +3244,62 @@ fn parse_double_precision() { "CREATE TABLE foo (bar DOUBLE(11,0))", ); } + +#[test] +fn parse_looks_like_single_line_comment() { + // See https://dev.mysql.com/doc/refman/8.4/en/ansi-diff-comments.html + match mysql().parse_sql_statements( + r#" + UPDATE account SET balance=balance--1 + WHERE account_id=5752 + "#, Review Comment: can we use `mysql().verified_stmt()` for the tests? ########## src/tokenizer.rs: ########## @@ -1229,15 +1229,31 @@ impl<'a> Tokenizer<'a> { // operators '-' => { chars.next(); // consume the '-' - match chars.peek() { - Some('-') => { - chars.next(); // consume the second '-', starting a single-line comment + + if let Some('-') = chars.peek() { + let mut is_comment = true; + if self.dialect.requires_whitespace_to_start_comment() { + // MySQL requires a space after the -- for a single-line comment + // Otherwise it's interpreted as two minus signs + // e.g. UPDATE account SET balance=balance--1 + // WHERE account_id=5752; + match chars.peekable.clone().nth(1) { + Some(' ') => (), + _ => is_comment = false, + } Review Comment: ```suggestion is_comment = Some(' ') == chars.peekable.clone().nth(1); ``` looks like this can be simplified? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org