goldmedal commented on code in PR #1541: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1541#discussion_r1861002348
########## src/parser/mod.rs: ########## @@ -2935,12 +2935,23 @@ impl<'a> Parser<'a> { }) } else if Token::LBracket == tok { if dialect_of!(self is PostgreSqlDialect | DuckDbDialect | GenericDialect) { - self.parse_subscript(expr) + let expr = self.parse_multi_dim_subscript(expr)?; + if self.dialect.support_period_map_access_key() { + self.parse_map_access(expr, vec![]) + } else { + Ok(expr) + } Review Comment: > 1. Not sure I fully followed what we mean by preserving syntax, do we gain from it from a AST-consuming pov you mean? It's just my assumption. If I'm wrong, please correct me. I mean if we accept the nested presentation, we don't need to change `Subscript` and won't make breaking changes for the array or map syntax. I'm not sure about other downstream projects. At least, DataFusion won't be broken. If it would cause big breaking change, I think reconsidering about it 🤔 > 2. Do you mean that those won't be possible if the representation is linear? > When working on this issue, I found we implemented many different `Expr` for similar purposes (access chain). For example - `MapAccess` for `a[1][2][3]` or `a[1].b.c[3]` - `Subscript` for `a[1]` or `a[1][2]`, ... - `JsonAccess` for `a.b[0].c`, ... - `CompoundIdentifier` for `a.b.c` - `CompositeAccess` for `( .. ).a` - `Method` for `func1().func2().func3()` They are the same thing, with different combinations and orderings, and maybe for various dialects. I think it also means we have various code paths for the same purposes. I hope to use some basic components to present all of them. I think it's possible to use a linear representation to do it but it could make a huge breaking change. > I was primarily thinking it would be easier and more efficient to traverse the ast if the above variants are expressed as a chain without nesting. Currently, both the parser and downstream crates tend to struggle with the recursive nature when there's a lot of nesting going on in large/complex sql. > > Offhand, not sure I have a full picture of either approach though, its not super clear to me what the disadvantage would be with linear, or if there are advantages to using a nested representation It makes sense to me. Indeed, the complex nested representation is a potential issue for performance or usage 🤔 I tried to draft a new linear representation like: ```rust pub enum Expr { ... // The haed will be concant with all the access fields using the dot. // // a[1] is present to CompoundExpr { head: Identifer(a), chain: [Subscript(Script(1))] } // a.b[1].c is present to CompoundExpr { head: Identifer(a), chain: [Expr(Ident(b)), Subscript(SubScirpt(1)), Expr(Ident(b))] // func1().a is present to CompoundExpr { head: FuntionCall(func1, ..), chain: [Expr(Ident(b))] } CompoundExpr { head: Box<Expr>, chain: Vec<AccessField>, }, ... } #[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)] #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))] #[cfg_attr(feature = "visitor", derive(Visit, VisitMut))] pub enum AccessField { Expr(Expr), SubScript(Subscript) } ``` I think it can cover many cases of the access chain. I'm not sure about naming but I don't prefer to keep using `Expr::SubScript` or `Expr::MapAcces` because it has turned to a different meaning. I prefer to remove both of them. What do you think? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org