Re: [PR] Enhance the nested type access for Generic and DuckDB dialect [datafusion-sqlparser-rs]

via GitHub Wed, 27 Nov 2024 09:01:22 -0800


goldmedal commented on code in PR #1541:
URL: 
https://github.com/apache/datafusion-sqlparser-rs/pull/1541#discussion_r1861002348



##########
src/parser/mod.rs:
##########
@@ -2935,12 +2935,23 @@ impl<'a> Parser<'a> {
             })
         } else if Token::LBracket == tok {
             if dialect_of!(self is PostgreSqlDialect | DuckDbDialect | 
GenericDialect) {
-                self.parse_subscript(expr)
+                let expr = self.parse_multi_dim_subscript(expr)?;
+                if self.dialect.support_period_map_access_key() {
+                    self.parse_map_access(expr, vec![])
+                } else {
+                    Ok(expr)
+                }

Review Comment:
   > 1. Not sure I fully followed what we mean by preserving syntax, do we gain 
from it from a AST-consuming pov you mean?
   
   It's just my assumption. If I'm wrong, please correct me.
   I mean if we accept the nested presentation, we don't need to change 
`Subscript` and won't make breaking changes for the array or map syntax. I'm 
not sure about other downstream projects. At least, DataFusion won't be broken. 
If it would cause big breaking change, I think reconsidering about it 🤔 
   
   > 2. Do you mean that those won't be possible if the representation is 
linear?
   > 
   When working on this issue, I found we implemented many different `Expr` for 
similar purposes (access chain). For example
   - `MapAccess` for `a[1][2][3]` or `a[1].b.c[3]`
   - `Subscript` for `a[1]` or `a[1][2]`, ...
   - `JsonAccess` for `a.b[0].c`, ...
   - `CompoundIdentifier` for `a.b.c`
   - `CompositeAccess` for `( .. ).a`
   - `Method` for `func1().func2().func3()`
   
   They are the same thing, with different combinations and orderings, and 
maybe for various dialects. I think it also means we have various code paths 
for the same purposes.
   I hope to use some basic components to present all of them. 
   
   I think it's possible to use a linear representation to do it but it could 
make a huge breaking change.
   
   > I was primarily thinking it would be easier and more efficient to traverse 
the ast if the above variants are expressed as a chain without nesting. 
Currently, both the parser and downstream crates tend to struggle with the 
recursive nature when there's a lot of nesting going on in large/complex sql.
   > 
   > Offhand, not sure I have a full picture of either approach though, its not 
super clear to me what the disadvantage would be with linear, or if there are 
advantages to using a nested representation
   
   It makes sense to me. Indeed, the complex nested representation is a 
potential issue for performance or usage 🤔 
   I tried to draft a new linear representation like:
   ```rust
   pub enum Expr {
       ...
       // The haed will be concant with all the access fields using the dot.
       //
       // a[1] is present to CompoundExpr { head: Identifer(a), chain: 
[Subscript(Script(1))] }
       // a.b[1].c is present to CompoundExpr { head: Identifer(a), chain: 
[Expr(Ident(b)), Subscript(SubScirpt(1)), Expr(Ident(b))]
       // func1().a is present to CompoundExpr { head: FuntionCall(func1, ..), 
chain: [Expr(Ident(b))] }
       CompoundExpr {
           head: Box<Expr>,
           chain: Vec<AccessField>,
       },
      ...
   }
   
   #[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
   #[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
   #[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
   pub enum AccessField {
       Expr(Expr),
       SubScript(Subscript)
   }
   ```
   I think it can cover many cases of the access chain. I'm not sure about 
naming but I don't prefer to keep using `Expr::SubScript` or `Expr::MapAcces` 
because it has turned to a different meaning. I prefer to remove both of them.
   
   What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Re: [PR] Enhance the nested type access for Generic and DuckDB dialect [datafusion-sqlparser-rs]

Reply via email to