aharpervc commented on PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#issuecomment-2874016995
> But actually I think if we can come up with something that doesn't touch that core parser loop we might have something mergeable. I'll reopen this PR, feel free to iterate if you have ideas to try out and I'm happy to take a look when you do! Sounds good... I will brainstorm on this 👍. --- It may also be useful to clarify what is my overall objective... basically, I think it'd be useful read a whole file into a string, then pass it to this library for parsing, then process the AST's. Since SQL code I work with (and in the industry) often includes `GO` batch delimiters in the file source code, and this library doesn't support batch delimiters outside of this PR, that presented a difficulty. Initially I was thinking that the smoothest approach would be to add batch delimiter support directly into the parser. However, this discussion is making me consider the pro/con more closely. Something else I was thinking about is how the current "parse statements" API _implies_ that you're only giving the parser sql of a _single batch_. Eg, `single batch string` -> `Vec<Statement>` makes complete sense in this model. However this PR introduced a much more odd model, where a single string could actually be multiple batches. Eg, `Vec<Batch>` -> `Vec<Statements>` is a strange API. So I think there could be a couple possible approaches going forward: _Option 1: Embrace Batches_ We could introduce a Batch struct & new top level `String` -> `Vec<Batch>`, then `Batch` -> `Vec<Statements>`. `Batch` could be like ```rust struct Batch { statements: Vec<Statement>, count: u16 // or whatever } ``` This seems useful because it makes explicit what was formerly implicit. Probably all the existing top level API's could be left alone, so all existing users uninterested in Batches would be unaffected. Then we'd want something like `parse_batches` to return a `Vec<Batch>` and then do all the normal parsing. The downside is, it would introduce a new concept & ongoing maintenance when perhaps there isn't a lot of interest. Also, not every dialect even has the concept of a batch, I think, so that's a difficulty here too. _Option 2: Reject Batches_ Close this PR permanently & open a new PR, that would basically be a docs/readme update, to clarify what is the expected string users should provide to parse_sql & similar API's. The clarification is it needs to be the contents of a SQL batch, since that is the thing that can clearly be turned into a single `Vec<Statement>`. Dialects such as SQL Server which include batch delimiters in the SQL source code should be processed _outside_ of this library to split the batches, then the batch string can be given to sqlparser-rs. The downside is back to that original objective, which is reading a whole SQL file to a string then passing it to this library for parsing. If library users have to do our own detection of the batch delimiter keyword, now we all have to implement a mini parser before using the actual parser. That seems really unfortunate. _Option 3: Hold Batches out at arm's length_ (thesis, antithesis, ...synthesis) I don't have a clear vision on this option yet, but it seems like this library is well positioned to do _some_ work on this topic, but perhaps not as much as full on Batch support. What if we had something like "parse_batches", but it returned a simple `Vec<String>`? I'm imagining something like: ```rust fn split_batches -> Vec<String> { let mut result = vec![]; while let Some(token) = self.tokens.next() { if peek_keyword(Keyword::GO) { self.prev_token(); result.push(parse_go()); } } } ``` The concept here is the library has the ability to scan through the tokens & parse the GO keyword, and split a single string into Vec<String> but would otherwise remain unchanged. Dialects that support batches could use this API to turn a whole file string into something suitable for parse_statements. Other dialects would remain unaffected. --- I'd be interested to hear anyone else's opinion on this too, for/against any option or to brainstorm other options. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org