aharpervc commented on PR #1809:
URL:
https://github.com/apache/datafusion-sqlparser-rs/pull/1809#issuecomment-2874016995
> But actually I think if we can come up with something that doesn't touch
that core parser loop we might have something mergeable. I'll reopen this PR,
feel free to iterate if you have ideas to try out and I'm happy to take a look
when you do!
Sounds good... I will brainstorm on this 👍.
---
It may also be useful to clarify what is my overall objective... basically,
I think it'd be useful read a whole file into a string, then pass it to this
library for parsing, then process the AST's. Since SQL code I work with (and in
the industry) often includes `GO` batch delimiters in the file source code, and
this library doesn't support batch delimiters outside of this PR, that
presented a difficulty. Initially I was thinking that the smoothest approach
would be to add batch delimiter support directly into the parser. However, this
discussion is making me consider the pro/con more closely.
Something else I was thinking about is how the current "parse statements"
API _implies_ that you're only giving the parser sql of a _single batch_. Eg,
`single batch string` -> `Vec<Statement>` makes complete sense in this model.
However this PR introduced a much more odd model, where a single string could
actually be multiple batches. Eg, `Vec<Batch>` -> `Vec<Statements>` is a
strange API.
So I think there could be a couple possible approaches going forward:
_Option 1: Embrace Batches_
We could introduce a Batch struct & new top level `String` -> `Vec<Batch>`,
then `Batch` -> `Vec<Statements>`. `Batch` could be like
```rust
struct Batch {
statements: Vec<Statement>,
count: u16 // or whatever
}
```
This seems useful because it makes explicit what was formerly implicit.
Probably all the existing top level API's could be left alone, so all existing
users uninterested in Batches would be unaffected. Then we'd want something
like `parse_batches` to return a `Vec<Batch>` and then do all the normal
parsing.
The downside is, it would introduce a new concept & ongoing maintenance when
perhaps there isn't a lot of interest. Also, not every dialect even has the
concept of a batch, I think, so that's a difficulty here too.
_Option 2: Reject Batches_
Close this PR permanently & open a new PR, that would basically be a
docs/readme update, to clarify what is the expected string users should provide
to parse_sql & similar API's. The clarification is it needs to be the contents
of a SQL batch, since that is the thing that can clearly be turned into a
single `Vec<Statement>`. Dialects such as SQL Server which include batch
delimiters in the SQL source code should be processed _outside_ of this library
to split the batches, then the batch string can be given to sqlparser-rs.
The downside is back to that original objective, which is reading a whole
SQL file to a string then passing it to this library for parsing. If library
users have to do our own detection of the batch delimiter keyword, now we all
have to implement a mini parser before using the actual parser. That seems
really unfortunate.
_Option 3: Hold Batches out at arm's length_ (thesis, antithesis,
...synthesis)
I don't have a clear vision on this option yet, but it seems like this
library is well positioned to do _some_ work on this topic, but perhaps not as
much as full on Batch support. What if we had something like "parse_batches",
but it returned a simple `Vec<String>`? I'm imagining something like:
```rust
fn split_batches -> Vec<String> {
let mut result = vec![];
while let Some(token) = self.tokens.next() {
if peek_keyword(Keyword::GO) {
self.prev_token();
result.push(parse_go());
}
}
}
```
The concept here is the library has the ability to scan through the tokens &
parse the GO keyword, and split a single string into Vec<String> but would
otherwise remain unchanged. Dialects that support batches could use this API to
turn a whole file string into something suitable for parse_statements. Other
dialects would remain unaffected.
---
I'd be interested to hear anyone else's opinion on this too, for/against any
option or to brainstorm other options.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]