yonatan-sevenai opened a new pull request, #21593: URL: https://github.com/apache/datafusion/pull/21593
## Which issue does this PR close?

- Closes #21592.

## Rationale for this change

The SQL unparser needs a Snowflake dialect. Basic dialect settings (identifier quoting, `NULLS FIRST`/`NULLS LAST`, timestamp types) are straightforward, but `UNNEST` support required more than configuration.

Snowflake has no `UNNEST` keyword. Its equivalent, `LATERAL FLATTEN(INPUT => expr)`, is a table function in the `FROM` clause with output accessed via `alias."VALUE"`. This differs structurally from standard SQL: the unparser must emit a `FROM`-clause table factor with a `CROSS JOIN` instead of a `SELECT`-clause expression. It also must rewrite column references to point at the FLATTEN output, and handle several optimizer-produced plan shapes (intermediate `Limit`/`Sort` nodes, `SubqueryAlias` wrappers, composed expressions wrapping the unnest output, multi-expression projections). None of this can be expressed through `CustomDialectBuilder`.

## What changes are included in this PR?

**`dialect.rs`** - New `SnowflakeDialect` with double-quote identifiers, `NULLS FIRST`/`NULLS LAST`, no empty select lists, no column aliases in table aliases, Snowflake timestamp types, and `unnest_as_lateral_flatten()`. Also wired into `CustomDialect`/`CustomDialectBuilder`.

**`ast.rs`** - New `FlattenRelationBuilder` that produces `LATERAL FLATTEN(INPUT => expr, OUTER => bool)` table factors, parallel to the existing `UnnestRelationBuilder`.

**`utils.rs`** - New `unproject_unnest_expr_as_flatten_value` transform that rewrites unnest placeholder columns to `_unnest.VALUE` references.

**`plan.rs`** - Changes to `select_to_sql_recursively`:

- The `Projection` handler scans all expressions for unnest placeholders (not just single-expression projections), then branches into the FLATTEN path or the existing table-factor path.
- `peel_to_unnest_with_modifiers` walks through `Limit`/`Sort` nodes between `Projection` and `Unnest`, applying their SQL modifiers to the query builder. This handles an optimizer behavior where these nodes are inserted between the two.
- `peel_to_inner_projection` walks through `SubqueryAlias` to find the inner `Projection` that feeds an `Unnest`.
- `reconstruct_select_statement` gained FLATTEN-aware expression rewriting and a `has_internal_unnest_alias` predicate to strip internal `UNNEST(...)` display names.
- The `Unnest` handler rejects struct columns for the FLATTEN dialect with a clear error.

## Are these changes tested?

Yes. 18 new tests covering:

- Simple inline arrays, string arrays, cross joins
- Implicit `FROM` (UNNEST in SELECT clause)
- User aliases, table aliases, literal + unnest
- Subselect source with filters and limit
- UDF result as FLATTEN input
- `Limit` between `Projection` and `Unnest`
- `Sort` between `Projection` and `Unnest`
- `Limit` + `SubqueryAlias` combined
- Composed expressions wrapping unnest output (e.g. `CAST`)
- Composed expressions with `Limit`
- Multi-expression projections
- Multi-expression projections with `Limit`
- `SubqueryAlias` between `Unnest` and inner `Projection`

## Are there any user-facing changes?

Yes. New public API surface:

- `SnowflakeDialect` struct and its constructor
- `Dialect::unnest_as_lateral_flatten()` method (default `false`)
- `CustomDialectBuilder::with_unnest_as_lateral_flatten()`
- `FlattenRelationBuilder` and `FLATTEN_DEFAULT_ALIAS` in the AST module

None of these are breaking changes, and all previous APIs should work. New traits have default implementations to ease migrations.
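
To make the structural difference from the rationale concrete, here is a minimal, standalone sketch of the target SQL shape: a `LATERAL FLATTEN(INPUT => expr, OUTER => bool)` table factor joined with `CROSS JOIN`, and a column reference rewritten to `alias."VALUE"`. It does not use DataFusion or the PR's `FlattenRelationBuilder`. The `FlattenFactor` struct, its fields, and the helper functions are hypothetical names invented for illustration, and the alias quoting shown is an assumption; the SQL the PR actually emits may differ in detail.

```rust
/// Hypothetical, simplified stand-in for a FLATTEN table factor.
/// This is NOT the PR's `FlattenRelationBuilder`; all names are assumptions.
struct FlattenFactor {
    /// Already-rendered SQL for the array expression passed as INPUT.
    input_sql: String,
    /// Whether to keep rows whose input is empty (rendered as OUTER => true).
    outer: bool,
    /// Alias for the FLATTEN output; the PR mentions `_unnest`.
    alias: String,
}

/// Quote an identifier with double quotes, doubling any embedded quote.
fn quote_ident(name: &str) -> String {
    format!("\"{}\"", name.replace('"', "\"\""))
}

impl FlattenFactor {
    /// Render the FROM-clause table factor.
    fn to_sql(&self) -> String {
        format!(
            "LATERAL FLATTEN(INPUT => {}, OUTER => {}) AS {}",
            self.input_sql,
            self.outer,
            quote_ident(&self.alias)
        )
    }

    /// Render the reference that replaces the unnest placeholder column.
    fn value_ref(&self) -> String {
        format!("{}.\"VALUE\"", quote_ident(&self.alias))
    }
}

/// Assemble a full statement: the flattened value is selected from the
/// source table cross-joined with the FLATTEN factor.
fn select_with_flatten(source_table: &str, flatten: &FlattenFactor) -> String {
    format!(
        "SELECT {} FROM {} CROSS JOIN {}",
        flatten.value_ref(),
        quote_ident(source_table),
        flatten.to_sql()
    )
}
```

The point of the sketch is that the unnest moves out of the `SELECT` list entirely: the projection only ever sees the rewritten `"VALUE"` reference, while the array expression lives in the `FROM` clause.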

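
The "peel through `Limit`/`Sort`" idea described for `peel_to_unnest_with_modifiers` can be sketched on a toy plan type. This is an illustration only, under stated assumptions: the `Plan` enum, the `Modifiers` struct, and `peel_to_unnest` are hypothetical and much simpler than DataFusion's `LogicalPlan` and the PR's actual helper. In particular, the sketch ignores `OFFSET` and the relative order of `Limit` and `Sort`.

```rust
/// Toy logical plan (hypothetical; not DataFusion's `LogicalPlan`).
#[derive(Debug)]
enum Plan {
    Limit { fetch: usize, input: Box<Plan> },
    Sort { order_by: Vec<String>, input: Box<Plan> },
    Unnest { column: String },
    Other,
}

/// SQL modifiers collected while walking down to the `Unnest` node.
#[derive(Debug, Default, PartialEq)]
struct Modifiers {
    limit: Option<usize>,
    order_by: Vec<String>,
}

/// Walk through any `Limit`/`Sort` nodes. If an `Unnest` is found beneath
/// them, return it together with the modifiers gathered on the way down;
/// return `None` if some other node is reached first.
fn peel_to_unnest(plan: &Plan) -> Option<(&Plan, Modifiers)> {
    let mut mods = Modifiers::default();
    let mut current = plan;
    loop {
        match current {
            Plan::Limit { fetch, input } => {
                // Stacked limits: keep the smallest fetch seen so far.
                mods.limit = Some(mods.limit.map_or(*fetch, |l| l.min(*fetch)));
                current = &**input;
            }
            Plan::Sort { order_by, input } => {
                // Simplification: the outermost sort decides the ordering.
                if mods.order_by.is_empty() {
                    mods.order_by = order_by.clone();
                }
                current = &**input;
            }
            Plan::Unnest { .. } => return Some((current, mods)),
            Plan::Other => return None,
        }
    }
}
```

Returning the modifiers alongside the node, rather than mutating a shared builder during the walk, means a failed peel leaves no partial state behind; whether the PR takes the same approach is not stated in the description.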