kosiew opened a new pull request, #19221:
URL: https://github.com/apache/datafusion/pull/19221

   ## Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and 
enhancements and this helps us generate change logs for our releases. You can 
link an issue to this PR using the GitHub syntax. For example `Closes #123` 
indicates that this PR will close issue #123.
   -->
   
   * Closes #18991.
   
   ## Rationale for this change
   
   The current unparser behavior materializes an explicit `1` literal for empty 
projection lists, generating SQL of the form `SELECT 1 FROM ...` even for 
dialects (such as PostgreSQL and DataFusion) that support `SELECT FROM ...` 
with an empty select list.
   
   For external or federated sources, this can lead to:
   
   * Mismatches between the logical plan schema (empty projection) and the 
physical schema produced by the generated SQL (single `1` column), which then 
becomes confusing when converting to Arrow.
   * Misleading semantics in downstream consumers (e.g. plans that logically 
represent "no columns" suddenly gain a synthetic column).
   * Unnecessary data movement / computation when the intent is to operate only 
on row counts or existence checks.
   
   This PR updates the unparser to:
   
   * Preserve the empty projection semantics for dialects that support `SELECT 
FROM ...`, and
   * Provide a dialect hook so that other backends can continue to use a 
compatible fallback such as `SELECT 1 FROM ...`.
   
   This aligns the generated SQL more closely with the logical plan, improves 
compatibility with PostgreSQL, and reduces surprises around schema shape for 
aggregate-style queries over external data sources.
   
   ## What changes are included in this PR?
   
   This PR makes the following changes:
   
   1. **SelectBuilder semantics for projections**
   
      * Change `SelectBuilder.projection` from `Vec<ast::SelectItem>` to 
`Option<Vec<ast::SelectItem>>` to distinguish:
   
        * `None`: projection has not yet been set,
        * `Some(vec![])`: explicitly empty projection, and
        * `Some(vec![...])`: non-empty projection.
      * Update `projection()` to set `Some(value)` and `pop_projections()` to 
`take()` the projection (returning an empty vec by default).
      * Redefine `already_projected()` to return `true` whenever the projection 
has been explicitly set (including the empty case), by checking 
`projection.is_some()`.
      * Adjust `build()` and `Default` to work with the new `Option`-typed 
projection (defaulting to `None` and using `unwrap_or_default()` when building 
the AST).
   
   2. **Dialect capability: empty select list support**
   
      * Extend the `Dialect` trait with a new method:
   
        * `fn supports_empty_select_list(&self) -> bool { false }`
      * Document the intended semantics and behavior across common SQL engines, 
with the default returning `false` for maximum compatibility.
      * Override this method in `PostgreSqlDialect` to return `true`, allowing 
`SELECT FROM ...` to be generated.
   
   3. **Unparser handling of empty projections**
   
      * Add a helper on `Unparser`:
   
        * `fn empty_projection_fallback(&self) -> Vec<Expr>`
   
          * Returns an empty vec if `supports_empty_select_list()` is `true`.
          * Returns `vec![Expr::Literal(ScalarValue::Int64(Some(1)), None)]` 
otherwise.
      * Update `unparse_table_scan_pushdown` to:
   
        * Take `&self` instead of being a purely static helper, so it can 
consult the dialect.
        * When encountering a `TableScan` with `Some(vec![])` as projection and 
`already_projected == false`, use `self.empty_projection_fallback()` instead of 
hard-coding a `1` literal.
      * Update the few call sites of `unparse_table_scan_pushdown` to call the 
instance method (`self.unparse_table_scan_pushdown(...)`).
   
   4. **Tests**
   
      * Add snapshot tests covering both PostgreSQL and the default dialect for 
empty projection table scans:
   
        * `test_table_scan_with_empty_projection_in_plan_to_sql_postgres`
   
          * Asserts `SELECT FROM "table"` for `UnparserPostgreSqlDialect`.
        * `test_table_scan_with_empty_projection_in_plan_to_sql_default_dialect`
   
          * Asserts `SELECT 1 FROM "table"` for `UnparserDefaultDialect`.
      * Add tests for empty projection with filters:
   
        * `test_table_scan_with_empty_projection_and_filter_postgres`
   
          * Asserts `SELECT FROM "table" WHERE ("table"."id" > 10)`.
        * `test_table_scan_with_empty_projection_and_filter_default_dialect`
   
          * Asserts `SELECT 1 FROM "table" WHERE ("table".id > 10)`.
      * These tests complement the existing 
`table_scan_with_empty_projection_in_plan_to_sql_*` coverage to exercise both 
dialect-specific behavior and interaction with filters.
   
   ## Are these changes tested?
   
   Yes.
   
   * New snapshot tests have been added in `plan_to_sql.rs` to cover:
   
     * Empty projections for both the PostgreSQL and default dialects.
     * Empty projections combined with a filter predicate.
   * Existing `plan_to_sql` tests continue to pass, ensuring that behavior for 
non-empty projections and other dialect features is unchanged.
   
   ## Are there any user-facing changes?
   
   Yes, for users of the SQL unparser:
   
   * For dialects that support empty select lists (currently PostgreSQL via 
`PostgreSqlDialect`):
   
     * Logical plans with an explicitly empty projection will now unparse to 
`SELECT FROM ...` instead of `SELECT 1 FROM ...`.
     * This more accurately reflects the logical schema (no columns) and avoids 
introducing a synthetic literal column.
   * For dialects that do **not** support empty select lists:
   
     * The behavior remains effectively the same: the unparser still emits a 
non-empty projection (currently `SELECT 1 FROM ...`).
     * The behavior is now routed through the new `supports_empty_select_list` 
hook, so dialects can opt into different fallbacks in the future if needed.
   
   The new `supports_empty_select_list` method on `Dialect` has a default 
implementation, so existing dialect implementations remain source-compatible 
and do not require changes.
   
   ## LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content 
has been manually reviewed and tested.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to