iffyio commented on code in PR #1551: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1551#discussion_r1880610494
##########
src/parser/mod.rs:
##########
@@ -1427,6 +1426,112 @@ impl<'a> Parser<'a> {
         }
     }

+    /// Try to parse an [Expr::CompoundExpr] like `a.b.c` or `a.b[1].c`.
+    /// If all the fields are `Expr::Identifier`s, return an [Expr::CompoundIdentifier] instead.
+    /// If only the root exists, return the root.
+    /// If self supports [Dialect::supports_partiql], it will fall back when occurs [Token::LBracket] for JsonAccess parsing.
+    pub fn parse_compound_expr(
+        &mut self,
+        root: Expr,
+        mut chain: Vec<AccessField>,
+    ) -> Result<Expr, ParserError> {
+        let mut ending_wildcard: Option<TokenWithSpan> = None;
+        let mut ending_lbracket = false;
+        while self.consume_token(&Token::Period) {
+            let next_token = self.next_token();
+            match next_token.token {
+                Token::Word(w) => {
+                    let expr = Expr::Identifier(w.to_ident(next_token.span));
+                    chain.push(AccessField::Expr(expr));
+                    if self.consume_token(&Token::LBracket) {
+                        if self.dialect.supports_partiql() {
+                            ending_lbracket = true;
+                            break;
+                        } else {
+                            self.parse_multi_dim_subscript(&mut chain)?
+                        }
+                    }
+                }
+                Token::Mul => {
+                    // Postgres explicitly allows funcnm(tablenm.*) and the
+                    // function array_agg traverses this control flow
+                    if dialect_of!(self is PostgreSqlDialect) {
+                        ending_wildcard = Some(next_token);
+                        break;
+                    } else {
+                        return self.expected("an identifier after '.'", next_token);
+                    }
+                }
+                Token::SingleQuotedString(s) => {
+                    let expr = Expr::Identifier(Ident::with_quote('\'', s));
+                    chain.push(AccessField::Expr(expr));
+                }
+                _ => {
+                    return self.expected("an identifier or a '*' after '.'", next_token);
+                }
+            }
+        }
+
+        // if dialect supports partiql, we need to go back one Token::LBracket for the JsonAccess parsing
+        if self.dialect.supports_partiql() && ending_lbracket {
+            self.prev_token();
+        }
+
+        if let Some(wildcard_token) = ending_wildcard {

Review Comment:
   Ah right, indeed `CompoundExpr` is a superset of `CompoundIdentifier`, so there likely wouldn't be a way to avoid that `exprs_to_idents` flattening, e.g. in the scenario where a `foo.bar.baz` expr shows up. That case seems reasonable to flatten since we'd want to keep the representations distinct!

   The `QualifiedWildcard` case I think could be fine: I imagine the same loop as above can handle it if we peek the next token after the period? i.e. if we consume a `.*`, then we build up the wildcard expr as we do today, with the root+chain consumed so far. So essentially the loop parses the chain delimited by periods, and `CompoundIdentifier` vs `QualifiedWildcard` are the same scenario, differing only in the terminating token.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
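To make the suggestion in the review comment above concrete, here is a rough, self-contained sketch of a single loop that peeks at the token after each period and treats `.*` as just another way of terminating the chain. Everything in it (`Tok`, `Parsed`, `parse_chain`) is a hypothetical stand-in invented for illustration, not the sqlparser-rs API, and it ignores subscripts, quoted strings and dialect checks.

```rust
// Hypothetical, simplified stand-ins for illustration only; these are NOT
// the sqlparser-rs token or AST types.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Word(String),
    Period,
    Star,
}

#[derive(Debug, PartialEq)]
enum Parsed {
    /// `a.b.c`: the chain ended because no further period followed.
    /// (The real parser returns the bare root when the chain is empty;
    /// this sketch does not model that case separately.)
    CompoundIdentifier(Vec<String>),
    /// `a.b.*`: the chain ended on a wildcard.
    QualifiedWildcard(Vec<String>),
}

/// Parse a period-delimited chain that starts at `root`.
/// Returns the parsed value and the number of tokens consumed.
fn parse_chain(root: &str, toks: &[Tok]) -> Result<(Parsed, usize), String> {
    let mut chain = vec![root.to_string()];
    let mut pos = 0;
    // Continue only while the next token is a period.
    while toks.get(pos) == Some(&Tok::Period) {
        // Peek at the token after the period to decide how to proceed.
        match toks.get(pos + 1) {
            Some(Tok::Word(w)) => {
                chain.push(w.clone());
                pos += 2;
            }
            Some(Tok::Star) => {
                // `.*` terminates the chain: build the wildcard from the
                // root and chain consumed so far.
                return Ok((Parsed::QualifiedWildcard(chain), pos + 2));
            }
            other => {
                return Err(format!(
                    "expected an identifier or '*' after '.', found {:?}",
                    other
                ));
            }
        }
    }
    Ok((Parsed::CompoundIdentifier(chain), pos))
}
```

In this shape the two results share the same loop and accumulated chain, and differ only in which token ended the chain, so no separate `ending_wildcard` flag is needed after the loop.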
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org