xitep commented on issue #2065:
URL:
https://github.com/apache/datafusion-sqlparser-rs/issues/2065#issuecomment-3536296072
maybe a simpler and more general-purpose idea for the parser to deal with
comments could be the following (feedback is highly appreciated):
since comments can appear almost everywhere within a statement (or parsed
script), either 1) the AST would have to provide a "before-/after-comment" for
all tokens/nodes in the AST or 2) the parser would make up specific rules for how
to associate a single comment with a nearby token (this, as i understand it, is
what is suggested as part of this issue.)
the former approach would mean very convenient access to the comments right
away from the tokens themselves, but it would grow the AST (assuming that most
tokens would not be associated with comments, it could be considered wasteful.)
on the other hand, having the parser make up its own mind about how to interpret
and associate comments could be restrictive for downstream clients.
the other day, i wrote a project-specific (Java) code generator to process
SQL like this ...
```sql
select id /* long */ ,
name /* String */ ,
...
```
... essentially associating metadata with particular tokens in the SQL query
in a format of my choosing (for later code generation.)
it would be fully sufficient for my use-case(s) to have `sqlparser-rs`
provide a `Parser::parse_sql_with_comments` method returning the AST as is,
plus a separate, independent structure representing the extracted comments, e.g.
```rust
Parser::parse_sql_with_comments(..) -> Result<(Vec<Statement>, Comments), ..>
```
with `Comments` (and the associated types) having the following API:
```rust
... Comments {
    /// Possibly locates a SQL comment directly preceding the specified token
    fn find_before(&self, token: &TokenWithSpan) -> Option<CommentWithSpan>;
    /// Possibly locates a SQL comment directly following the specified token
    fn find_after(&self, token: &TokenWithSpan) -> Option<CommentWithSpan>;
    /// Retrieves all comments encountered.
    fn as_slice(&self) -> &[CommentWithSpan];
}

struct CommentWithSpan {
    span: Span,
    text: Comment,
}

enum Comment {
    SingleLine { content: String, prefix: String },
    MultiLine(String),
}
```
I see the following advantages with this concept:
1. It does not require extending existing AST nodes; since most of them will
not be associated with comments at all, this would save space.
2. It would also mean that parsing _with_ and _without_ comments would not
differ much; when parsing without comments we'd merely discard the
comments (status quo) instead of accumulating them.
3. Since comments are encountered from beginning to end of the parsed SQL
script (and comments do not overlap), accumulating them in a plain `Vec` as
they are encountered would in fact yield a vec that is already sorted by
position and can therefore be searched efficiently with binary search by a
given span's start/end.
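
To make point 3 concrete, here is a minimal sketch of how such a lookup could work. It is only an illustration under assumptions, not an existing `sqlparser-rs` API: `Location` and `Span` are local stand-ins (with an exclusive `end`) so the snippet compiles on its own, the lookups take a `&Span` and return a reference rather than the `&TokenWithSpan` / owned value from the outline above, and `push`, `span` and `example_comments` are hypothetical helpers. It also returns the *nearest* comment before/after the token; a real implementation would still have to decide what "directly" means (e.g. only whitespace in between).

```rust
// Stand-ins for the parser's location/span types (assumption: `end` is exclusive).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Location {
    line: u64,
    column: u64,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: Location,
    end: Location,
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum Comment {
    SingleLine { content: String, prefix: String },
    MultiLine(String),
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct CommentWithSpan {
    span: Span,
    text: Comment,
}

/// Comments in source order. Because they never overlap, the vec is sorted
/// by both `span.start` and `span.end`, which is what makes binary search valid.
#[derive(Debug, Default)]
struct Comments(Vec<CommentWithSpan>);

impl Comments {
    /// Hypothetical hook the parser would call whenever it skips a comment.
    fn push(&mut self, comment: CommentWithSpan) {
        debug_assert!(self.0.last().map_or(true, |c| c.span.end <= comment.span.start));
        self.0.push(comment);
    }

    /// Nearest comment that ends at or before the start of `token`.
    fn find_before(&self, token: &Span) -> Option<&CommentWithSpan> {
        let idx = self.0.partition_point(|c| c.span.end <= token.start);
        idx.checked_sub(1).map(|i| &self.0[i])
    }

    /// Nearest comment that starts at or after the end of `token`.
    fn find_after(&self, token: &Span) -> Option<&CommentWithSpan> {
        let idx = self.0.partition_point(|c| c.span.start < token.end);
        self.0.get(idx)
    }

    /// All comments encountered, in source order.
    fn as_slice(&self) -> &[CommentWithSpan] {
        &self.0
    }
}

/// Helper to build a span from (line, column) pairs.
fn span(start_line: u64, start_col: u64, end_line: u64, end_col: u64) -> Span {
    Span {
        start: Location { line: start_line, column: start_col },
        end: Location { line: end_line, column: end_col },
    }
}

/// The two comments from the SQL snippet above, with hand-computed positions:
///   line 1: `select id /* long */ ,`
///   line 2: `name /* String */ ,`
fn example_comments() -> Comments {
    let mut comments = Comments::default();
    comments.push(CommentWithSpan {
        span: span(1, 11, 1, 21),
        text: Comment::MultiLine(" long ".to_string()),
    });
    comments.push(CommentWithSpan {
        span: span(2, 6, 2, 18),
        text: Comment::MultiLine(" String ".to_string()),
    });
    comments
}
```

With this, looking up the comment after the `id` token (span `1:8..1:10` in the stand-in coordinates) finds `/* long */`, and both lookups cost `O(log n)` in the number of comments without touching the AST.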
what do you think?