kosiew opened a new pull request, #22037:
URL: https://github.com/apache/datafusion/pull/22037

   ## Which issue does this PR close?
   
   * Closes #22034.
   
   ## Rationale for this change
   
   `RecursiveQueryExec` widened recursive CTE output nullability by reconciling 
the static and recursive term schemas. This caused the physical schema to 
diverge from the logical/static CTE schema and forced valid SQL such as 
`0 AS level` to be rewritten as nullable expressions like `SUM(0) AS level`.
   
   This change preserves the declared recursive CTE schema by treating the 
static/anchor term schema as authoritative and aligning the recursive term to 
that schema during plan construction.
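
As a rough illustration of the difference, the sketch below contrasts the two policies on a single `level` column. The `Field` struct and both function names are hypothetical stand-ins (not DataFusion or Arrow types), and it assumes the recursive term reports the column as nullable while the static term does not.

```rust
/// Hypothetical, simplified stand-in for an Arrow field:
/// only the name and the nullability flag are modelled.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub nullable: bool,
}

/// Convenience constructor for the sketch.
pub fn field(name: &str, nullable: bool) -> Field {
    Field { name: name.to_string(), nullable }
}

/// Previous behaviour as described above: each output column becomes
/// nullable if either the static or the recursive term is nullable.
pub fn widen(static_term: &[Field], recursive_term: &[Field]) -> Vec<Field> {
    static_term
        .iter()
        .zip(recursive_term)
        .map(|(s, r)| Field {
            name: s.name.clone(),
            nullable: s.nullable || r.nullable,
        })
        .collect()
}

/// Behaviour after this change: the static/anchor term schema is used
/// as-is, and the recursive term has to be aligned to it.
pub fn static_authoritative(static_term: &[Field], _recursive_term: &[Field]) -> Vec<Field> {
    static_term.to_vec()
}
```

With a non-null static `level` and a nullable recursive `level`, `widen` yields a nullable column (diverging from the static schema), while `static_authoritative` keeps it non-null.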
   
   ## What changes are included in this PR?
   
   * Added `align_plan_to_schema`, a higher-level plan-time schema alignment 
helper that guarantees the resulting plan advertises the expected schema 
exactly.
   * Kept `project_plan_to_schema` as the narrower projection-based helper and 
refactored shared validation into `validate_schema_alignment`.
   * Added `SchemaAlignExec`, an execution-plan adapter that:
   
     * advertises the expected schema from plan properties
     * preserves positional column values
     * rebinds emitted `RecordBatch` schemas inside the adapter
     * validates column count, data types, field metadata, and schema metadata
   * Updated `RecursiveQueryExec::try_new` to:
   
     * use the static term schema as the recursive CTE output schema
     * align the recursive term with `align_plan_to_schema`
     * remove recursive output schema widening logic
   * Restored the recursive CTE SLT coverage from `SUM(0) AS level` back to 
`0 AS level`.
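
The adapter behaviour listed above (advertise the expected schema, keep column values in position, rebind each emitted batch) can be sketched roughly as follows. This is a toy model, not the actual `SchemaAlignExec`: `Batch`, `Field`, `DataType`, and `rebind` are hypothetical names, columns are plain `Vec<Option<i64>>`, and the runtime null check is an assumption of this sketch rather than something the PR description states.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
}

/// Hypothetical, simplified field: name, declared type, nullability.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
}

/// Hypothetical stand-in for a record batch: a schema plus positional columns.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub schema: Vec<Field>,
    pub columns: Vec<Vec<Option<i64>>>,
}

/// Rebind `batch` to `expected` without touching the column values.
/// Columns are matched purely by position; names may differ.
pub fn rebind(batch: Batch, expected: &[Field]) -> Result<Batch, String> {
    if batch.schema.len() != expected.len() {
        return Err(format!(
            "column count mismatch: got {}, expected {}",
            batch.schema.len(),
            expected.len()
        ));
    }
    for (i, ((got, want), col)) in batch
        .schema
        .iter()
        .zip(expected)
        .zip(&batch.columns)
        .enumerate()
    {
        if got.data_type != want.data_type {
            return Err(format!("type mismatch at column {}", i));
        }
        // Assumption of this sketch: a non-null target column must not
        // actually contain nulls, so reject such batches at runtime.
        if !want.nullable && col.iter().any(|v| v.is_none()) {
            return Err(format!("unexpected null in non-null column {}", i));
        }
    }
    // Same column data, new schema.
    Ok(Batch { schema: expected.to_vec(), columns: batch.columns })
}
```

The point of the design is that no values are copied or cast: only the schema attached to the batch changes, which is why the type and count checks have to pass first.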
   
   ## Are these changes tested?
   
   Yes.
   
   Added and updated tests covering:
   
   * `align_plan_to_schema`:
   
     * exact schema returns unchanged plan
     * rename-only alignment uses `ProjectionExec`
     * nullable input to non-null expected schema uses `SchemaAlignExec`
     * column count mismatch errors
     * type mismatch errors
     * field metadata mismatch errors
     * schema metadata mismatch errors
   * `project_plan_to_schema`:
   
     * schema match passthrough
     * nullability widening
     * nullability narrowing rejection
     * metadata mismatch validation
   * `RecursiveQueryExec`:
   
     * recursive term projection alignment
     * preservation of the static nullability contract
     * recursive term schema matches the static schema after construction
   * Restored SQL logic test coverage in `cte.slt` using `0 AS level`.
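
The `align_plan_to_schema` cases listed above suggest a three-way decision, which the following sketch models on schemas alone. All names here (`Schema`, `Field`, `Strategy`, `choose_strategy`) are hypothetical and the types are simplified stand-ins; in particular, sending only nullable-to-non-null narrowing to the adapter, and everything else that validates to a projection, is an assumption inferred from the test list rather than a description of the real implementation.

```rust
use std::collections::BTreeMap;

type Metadata = BTreeMap<String, String>;

#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub name: String,
    pub data_type: DataType,
    pub nullable: bool,
    pub metadata: Metadata,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<Field>,
    pub metadata: Metadata,
}

/// Which mechanism the sketch would pick to reach the expected schema.
#[derive(Debug, PartialEq)]
pub enum Strategy {
    Unchanged,   // schemas already identical
    Projection,  // e.g. rename-only differences
    SchemaAlign, // nullable input, non-null expected
}

pub fn int_field(name: &str, nullable: bool) -> Field {
    Field { name: name.to_string(), data_type: DataType::Int64, nullable, metadata: Metadata::new() }
}

pub fn schema(fields: Vec<Field>) -> Schema {
    Schema { fields, metadata: Metadata::new() }
}

pub fn choose_strategy(input: &Schema, expected: &Schema) -> Result<Strategy, String> {
    if input == expected {
        return Ok(Strategy::Unchanged);
    }
    if input.fields.len() != expected.fields.len() {
        return Err("column count mismatch".to_string());
    }
    if input.metadata != expected.metadata {
        return Err("schema metadata mismatch".to_string());
    }
    let mut narrows = false;
    for (i, (got, want)) in input.fields.iter().zip(&expected.fields).enumerate() {
        if got.data_type != want.data_type {
            return Err(format!("type mismatch at column {}", i));
        }
        if got.metadata != want.metadata {
            return Err(format!("field metadata mismatch at column {}", i));
        }
        // Narrowing: the input may produce nulls but the target forbids them.
        narrows |= got.nullable && !want.nullable;
    }
    Ok(if narrows { Strategy::SchemaAlign } else { Strategy::Projection })
}
```

Under these assumptions, an input `lvl` column aligned to an expected non-null `level` column picks `Projection`, while a nullable `level` aligned to a non-null `level` picks `SchemaAlign`.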
   
   Validated with:
   
   ```bash
   cargo test -p datafusion-physical-plan recursive_query_exec
   cargo test -p datafusion-physical-plan project_plan_to_schema
   cargo test -p datafusion-sqllogictest --test sqllogictests -- cte
   ```
   
   ## Are there any user-facing changes?
   
   Yes.
   
   Recursive CTEs now preserve the declared/static schema instead of widening 
nullability based on recursive expressions. Existing valid SQL such as:
   
   ```sql
   0 AS level
   ```
   
   continues to work without requiring nullable rewrites like:
   
   ```sql
   SUM(0) AS level
   ```
   
   ## LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content 
has been manually reviewed and tested.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

