graup opened a new issue, #1676: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1676
First off, I'm relatively new to both this codebase and Rust in general. But I thought it might be useful to start this topic anyway. There are a bunch of places where **parentheses** can appear in SQL. Currently these are not handled consistently in the AST. This makes it difficult or impossible to attach Spans to all of them. @Nyrox who did most of the work introducing Spanned [mentioned](https://github.com/apache/datafusion-sqlparser-rs/issues/1563#issuecomment-2586942739) using AttachedToken to encode opening and closing parenthesis tokens in the AST. I think it would be good to come up with a standard way that can be applied to all of those nodes. Also it seems in many instances this might be a breaking change. Attaching parentheses tokens to the AST is important for accurate spans: The span for the Function node `CALL(something)` should go all the way from `C` to `)`, not just `CALL` or even worse, `CALL(something`. Currently there's no way to encode this information. On the other hand, parentheses are not semantically meaningful for the AST, so I don't want to needlessly complicate the structure. One thing that's a bit vague to me is if the parentheses should be attached to the 'thing' itself or its container? E.g. there is Function -> FunctionArguments -> FunctionArgumentsList, to which of those do the parens belong? Similarly, Expr::InSubquery has a subquery field; do the parens belong to the subquery or to the InSubquery? (Reading Fmt::Display for Expr::InSubquery would imply the parens belong to it since it renders them and not the subquery which is just an Expr – but the subquery field could be a new Subquery type, which then would be able to render its own parens... 🤔) To make matters even more complicated, some of these parentheses are optional (see `FunctionArguments` and also the somewhat related `OneOrManyWithParens`). Here are some nodes that need to know about their parentheses (I might have missed some; to be systematic I'd probably go through every mention of LParen/RParen in the parser). in ast/mod.rs - Expr::Subquery - Expr::InSubquery - Expr::InUnnest - Expr::Exists - Expr::AllOp.right - FunctionArguments - FunctionArgumentList - Statement::Execute.has_parentheses in ast/query.rs - Cte.closing_paren_token Here's an initial idea: instead of adding `opening_paren_token` and `closing_paren_token` everywhere, how about creating a new struct `ParensPair { opening: AttachedToken, closing: AttachedToken }`? Or might we augment `OneOrManyWithParens` for this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
