xitep commented on issue #2065:
URL: https://github.com/apache/datafusion-sqlparser-rs/issues/2065#issuecomment-3536296072

   Maybe a simpler and more general-purpose idea for the parser to deal with comments could be the following one (feedback is highly appreciated):
   
   Since comments can appear almost anywhere within a statement (or parsed script), either 1) the AST would have to provide a "before-/after-comment" slot for all tokens/nodes, or 2) the parser would have to define specific rules for how to associate a single comment with a nearby token (this, as I understand it, is what this issue suggests).
   
   The former approach would give very convenient access to the comments directly from the tokens themselves, but it would grow the AST (and, assuming that most tokens are not associated with any comment, it could be considered wasteful). With the latter, the parser makes up its own mind about how to interpret and associate comments, which could be restrictive for downstream clients.
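To make the "grow the AST" cost of approach 1) concrete, here is a rough illustration using hypothetical types (they are not sqlparser's real `Ident` or any other real AST node). Even when no comment is present, the two optional comment slots still occupy memory in every node.

```rust
use std::mem::size_of;

/// Hypothetical AST leaf as it might look without comment support.
pub struct Ident {
    pub value: String,
}

/// The same hypothetical leaf under approach 1): every node carries
/// optional leading and trailing comments, even if both are `None`.
pub struct IdentWithComments {
    pub value: String,
    pub comment_before: Option<String>,
    pub comment_after: Option<String>,
}

/// Extra bytes paid per node, whether or not a comment is attached.
pub fn overhead_per_node() -> usize {
    size_of::<IdentWithComments>() - size_of::<Ident>()
}
```

Boxing the slots (`Option<Box<...>>`) would shrink each one to a single pointer, but not to zero, and the cost is still multiplied by every node of every parsed statement.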
   
   The other day, I wrote a project-specific (Java) code generator to process SQL like this ...
   ```sql
   select id        /* long */    ,
              name /* String */  ,
              ...
   ```
   ... essentially associating metadata with particular tokens in the SQL query, in a format of my choosing (for later code generation).
   
   It would be fully sufficient for my use case(s) to have `sqlparser-rs` provide a `Parser::parse_sql_with_comments` method returning the AST as is, plus a separate, independent structure representing the extracted comments, e.g.
   
   ```rust
    Parser::parse_sql_with_comments(..) -> Result<(Vec<Statement>, Comments), ..>
   ```
   
   with `Comments` (and its associated types) having the following API:
   
   ```rust
   ... Comments {
     /// Possibly locates a SQL comment directly preceding the specified token
     fn find_before(&self, token: &TokenWithSpan) -> Option<CommentWithSpan>;
   
     /// Possibly locates a SQL comment directly following the specified token
     fn find_after(&self, token: &TokenWithSpan) -> Option<CommentWithSpan>;
   
     /// Retrieves all comments encountered.
     fn as_slice(&self) -> &[CommentWithSpan];
   }
   
   struct CommentWithSpan {
     span: Span,
     text: Comment,
   }
   
   enum Comment {
       SingleLine { content: String, prefix: String },
       MultiLine(String),
   }
   ```
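As a plausibility check that this API can be served by binary search over a plain `Vec`, here is a minimal, self-contained sketch. It is not part of `sqlparser-rs`, and several details are assumptions: `Location`/`Span` are simplified stand-ins (1-based line/column, exclusive end), the lookups take a `&Span` rather than a `&TokenWithSpan` and return a reference rather than an owned `CommentWithSpan`, and `push` and `Span::on_line` are hypothetical helpers.

```rust
/// Simplified stand-in for a source position (assumption: 1-based line/column).
/// Deriving `Ord` compares `line` first, then `column`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Location {
    pub line: u64,
    pub column: u64,
}

/// Simplified stand-in for a span; `end` is exclusive (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: Location,
    pub end: Location,
}

impl Span {
    /// Hypothetical helper: a span that lies on a single line.
    pub fn on_line(line: u64, start: u64, end: u64) -> Self {
        Span {
            start: Location { line, column: start },
            end: Location { line, column: end },
        }
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Comment {
    SingleLine { content: String, prefix: String },
    MultiLine(String),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CommentWithSpan {
    pub span: Span,
    pub text: Comment,
}

/// Comments in source order. Because they never overlap, the vector is
/// sorted by both start and end position, which enables binary search.
#[derive(Debug, Default)]
pub struct Comments(Vec<CommentWithSpan>);

impl Comments {
    /// Hypothetical hook the tokenizer would call for each comment, in source order.
    pub fn push(&mut self, c: CommentWithSpan) {
        debug_assert!(self.0.last().map_or(true, |p| p.span.end <= c.span.start));
        self.0.push(c);
    }

    /// Nearest comment ending at or before the token's start.
    /// Caveat: does not check for other tokens in between.
    pub fn find_before(&self, token: &Span) -> Option<&CommentWithSpan> {
        let idx = self.0.partition_point(|c| c.span.end <= token.start);
        idx.checked_sub(1).map(|i| &self.0[i])
    }

    /// Nearest comment starting at or after the token's end (same caveat).
    pub fn find_after(&self, token: &Span) -> Option<&CommentWithSpan> {
        let idx = self.0.partition_point(|c| c.span.start < token.end);
        self.0.get(idx)
    }

    /// All comments encountered, in source order.
    pub fn as_slice(&self) -> &[CommentWithSpan] {
        &self.0
    }
}
```

One design caveat: "directly preceding/following" is approximated here as "nearest". For `select id /* long */, name`, `find_before` on `name` still returns `/* long */` even though a `,` sits in between. Ruling that out would need the token stream (or the parser's position) as well.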
   
   I see the following advantages to this concept:
   
   1. It does not require extending existing AST nodes; since most nodes will not have any comments attached, this saves space.
   2. Parsing _with_ and _without_ comments would hardly differ: when parsing without comments, we'd merely discard them (the status quo) instead of accumulating them.
   3. Since the parser encounters comments in order from the beginning to the end of the parsed SQL script, and comments never overlap, accumulating them in a plain `Vec` as they are encountered yields a vector that is already sorted by position and can therefore be searched efficiently with binary search on a given span's start/end.
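Points 2 and 3 can be illustrated with a single pass over a token stream. This is a sketch under stated assumptions: `Tok`, `split_comments` and `first_comment_at_or_after` are hypothetical names, a plain byte offset stands in for a real `Span`, and sqlparser's actual tokenizer represents comments as whitespace variants rather than a dedicated token. The only difference between the two modes is whether a comment is pushed or dropped, and the collected vector comes out sorted for free.

```rust
/// Hypothetical, simplified token type.
#[derive(Debug, Clone, PartialEq)]
pub enum Tok {
    Word(String),
    Comma,
    Comment(String),
}

/// A token or comment paired with its byte offset in the source (assumption).
pub type Located<T> = (usize, T);

/// Single pass: non-comment tokens go on to the parser; comments are either
/// collected (`keep_comments == true`) or discarded as they are today.
pub fn split_comments(
    tokens: Vec<Located<Tok>>,
    keep_comments: bool,
) -> (Vec<Located<Tok>>, Vec<Located<String>>) {
    let mut rest = Vec::with_capacity(tokens.len());
    let mut comments = Vec::new();
    for (offset, tok) in tokens {
        match tok {
            Tok::Comment(text) if keep_comments => comments.push((offset, text)),
            Tok::Comment(_) => {} // status quo: drop the comment
            other => rest.push((offset, other)),
        }
    }
    (rest, comments)
}

/// `comments` was filled in source order, so it is sorted by offset and can
/// be binary-searched: returns the first comment starting at or after `offset`.
pub fn first_comment_at_or_after(
    comments: &[Located<String>],
    offset: usize,
) -> Option<&Located<String>> {
    let idx = comments.partition_point(|(o, _)| *o < offset);
    comments.get(idx)
}
```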
   
   What do you think?

