kosiew opened a new issue, #23159:
URL: https://github.com/apache/datafusion/issues/23159

   ## Summary
   
   Join unparsing 
[currently](https://github.com/apache/datafusion/tree/994b926168033e06c4b379b7bac7667c958cfc1b)
 handles left and right inputs inline in `datafusion/sql/src/unparser/plan.rs`. 
Both sides perform table-scan filter extraction, but already-projected 
qualified passthrough `Projection(Join)` inputs intentionally diverge:
   
   - the left input can be unwrapped before recursive unparsing;
   - the right input may need to be converted to a nested join relation so SQL 
preserves required join grouping.
   
   Refactor the duplicated common setup into a small helper, while keeping the 
side-specific passthrough-projection behavior explicit.
   
   ## Motivation
   
   The SQL unparser must preserve alias scope when optimized logical plans 
contain join inputs that have been rewritten by optimizer passes. In 
particular, optimized plans can contain a qualified passthrough `Projection` 
over a `Join`, while the outer join condition still references aliases from 
inside that join input.
   
   The invariant is side-aware:
   
   > Before unparsing a join input in an already-projected context, avoid 
creating a derived-table boundary that hides aliases needed by an outer join 
condition. For the left input this can mean unwrapping a transparent 
passthrough `Projection(Join)`. For the right input this can mean preserving 
the nested join grouping with 
`qualified_passthrough_join_projection_to_nested_relation`.
   
   ## Current code shape
   
   Observed in `datafusion/sql/src/unparser/plan.rs`, inside 
`LogicalPlan::Join` handling:
   
   - Left input is transformed with 
`try_transform_to_simple_table_scan_with_filters`.
   - Right input repeats the same table-scan filter extraction logic.
   - Additional passthrough projection unwrapping is applied only to the left 
input when `already_projected` is true.
   - The right input has a different, intentional path: 
`qualified_passthrough_join_projection_to_nested_relation`, which can emit a 
nested join relation instead of recursively unparsing through a derived-table 
boundary.
   
   This is a maintainability risk because the common table-scan extraction is 
duplicated, while the side-specific alias-scope behavior is subtle and easy to 
accidentally “simplify” into an incorrect symmetric rewrite.
   
   ## Proposed refactor
   
   Add a local helper near the join unparsing code for the shared 
table-scan-filter extraction, for example:
   
   ```rust
   fn extract_join_input_table_scan_filters(
       plan: &Arc<LogicalPlan>,
       table_scan_filters: &mut Vec<Expr>,
   ) -> Result<Arc<LogicalPlan>> {
       match try_transform_to_simple_table_scan_with_filters(plan)? {
           Some((plan, filters)) => {
               table_scan_filters.extend(filters);
               Ok(Arc::new(plan))
           }
           None => Ok(Arc::clone(plan)),
       }
   }
   ```
   
   Then keep the side-specific handling visible:
   
   ```rust
   let left_plan = Self::extract_join_input_table_scan_filters(
       left_plan,
       &mut table_scan_filters,
   )?;
   let left_plan = if already_projected {
       Self::unwrap_qualified_passthrough_join_projection(left_plan)
   } else {
       left_plan
   };
   
   let right_plan = Self::extract_join_input_table_scan_filters(
       right_plan,
       &mut table_scan_filters,
   )?;
   
   let mut right_relation = RelationBuilder::default();
   if already_projected
       && let Some(nested_relation) = self
           .qualified_passthrough_join_projection_to_nested_relation(
               right_plan.as_ref(),
               query,
           )?
   {
       right_relation = nested_relation;
   } else {
       // existing recursive right-input unparsing
   }
   ```
   
   A side-aware helper is also acceptable if it preserves the distinction 
between left unwrapping and right nested-relation construction. Avoid a 
symmetric helper that blindly unwraps both sides.
   
   ## Expected benefits
   
   - Removes duplicated table-scan filter extraction.
   - Documents the side-specific alias-scope invariant near the code that 
enforces it.
   - Makes future join-unparse fixes easier to review.
   - Reduces inline control-flow in the already-large `LogicalPlan::Join` arm 
without hiding important left/right differences.
   - Reduces risk of an incorrect symmetric passthrough-projection rewrite.
   
   ## Scope
   
   In scope:
   
   - Local refactor under `datafusion/sql/src/unparser/plan.rs`.
   - Preserve existing behavior.
   - Apply shared table-scan-filter extraction helper to both left and right 
join inputs.
   - Preserve left-only passthrough projection unwrapping.
   - Preserve right-side nested-relation handling for qualified passthrough 
`Projection(Join)` inputs.
   - Keep helper private unless broader reuse is found.
   - Add or retain tests covering nested join inputs on both sides.
   
   Out of scope:
   
   - Redesigning the SQL unparser.
   - Changing logical optimizer output.
   - Changing SQL alias naming policy.
   - Adding public APIs.
   
   ## Suggested tests
   
   Add or retain targeted regression coverage for both sides of the invariant:
   
   1. Optimized plan where the left join input is a qualified passthrough 
`Projection(Join)` and the outer join condition references aliases from inside 
it.
   2. Right-input qualified passthrough `Projection(Join)` coverage; current 
coverage includes 
`test_unparse_projected_join_unwraps_right_nested_passthrough_projection` in 
`datafusion/sql/tests/cases/plan_to_sql.rs`.
   3. Verify generated DuckDB SQL does not create a derived-table boundary that 
hides aliases used by the outer join condition.
   4. Existing unparser roundtrip or DuckDB dialect tests should continue 
passing.
   
   A narrow command during development:
   
   ```bash
   cargo test -p datafusion --test core_integration 
optimized_duckdb_unparse_preserves_derived_table_scope -- --nocapture
   ```
   
   Consider adding a left-input counterpart if missing, rather than widening 
existing assertion-only tests too much.
   
   
   ## Acceptance criteria
   
   - Shared table-scan-filter extraction is implemented through one private 
helper.
   - Both left and right inputs use that helper.
   - Left/right passthrough-projection behavior remains intentionally distinct.
   - Existing behavior is preserved for non-join projections and 
non-passthrough projections.
   - Regression coverage includes both left and right nested join-input 
alias-scope cases.
   - Targeted unparser tests pass.
   
   ## Related PR
   #23002
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to