adriangb opened a new pull request, #20092:
URL: https://github.com/apache/datafusion/pull/20092

   ## Which issue does this PR close?
   
   Related to:
   - https://github.com/apache/datafusion/issues/19894 - Unified 
`TableScan.filters` representation
   - https://github.com/apache/datafusion/issues/19950 - `UPDATE ... FROM` bug 
(filter extraction improvements)
   - https://github.com/apache/datafusion/pull/20091 - Complementary work on 
consolidated TableScan representation (projections as expressions)
   
   ## Rationale
   
   Currently, the optimizer calls `supports_filters_pushdown()` to classify 
filters during logical optimization. This results in a **split representation** 
where:
   - Exact and Inexact filters go to `TableScan.filters`
   - Unsupported, Inexact, and volatile filters stay as `Filter` nodes above 
the scan (so an Inexact filter appears in both places)
   
   This creates several problems (as described in #19894):
   - **Filter duplication risk**: The same predicate may exist in both a 
`Filter` node and `TableScan.filters`
   - **Semantic confusion**: Unclear which filters are "pushed down" vs. 
"logical"
   - **Implementation burden**: DML operations must collect filters from 
multiple locations
   - **Multi-table safety hazards**: `UPDATE ... FROM` scenarios become fragile
   
   ## What changes are included in this PR?
   
   This PR moves all filter expressions (except scalar subqueries) to 
`TableScan.filters` during logical optimization and defers their 
classification (Exact/Inexact/Unsupported) to the physical planner.
   
   ### Changes to `push_down_filter.rs`:
   - Simplified TableScan case to push ALL filters (except scalar subqueries) 
to `TableScan.filters`
   - Removed filter classification logic (now handled by physical planner)
   
   ### Changes to `physical_planner.rs`:
   - Enhanced TableScan handler to:
     - Classify filters using `supports_filters_pushdown()`
     - Create `FilterExec` for Unsupported/Inexact/Volatile filters
     - Handle projection expansion when filters need columns not in user's 
projection
     - Apply limits correctly when post-filtering is needed
   - Added `compute_scan_projection_with_filters()` helper
   - Added `create_filter_exec()` helper with async UDF support
   - Updated `extract_dml_filters()` to also extract from `TableScan.filters`
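
The classification step described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `PushDown`, `ScanFilter`, `SplitFilters`, `split_filters`, and `scan_fetch` are hypothetical stand-ins (the enum is modeled on DataFusion's `TableProviderFilterPushDown`), and two behaviors are assumptions on my part: that volatile filters are never handed to the provider, and that a limit is only given to the scan when no post-scan filter can drop rows.

```rust
/// Local stand-in for the three pushdown classes named in the PR
/// (modeled on DataFusion's `TableProviderFilterPushDown`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PushDown {
    Unsupported,
    Inexact,
    Exact,
}

/// Hypothetical simplified filter: expression text plus a volatility flag.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ScanFilter {
    pub expr: String,
    pub volatile: bool,
}

/// Result of classifying everything found in `TableScan.filters`.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct SplitFilters {
    /// Filters handed to the scan (Exact and Inexact).
    pub pushed: Vec<ScanFilter>,
    /// Filters re-checked after the scan, e.g. by a `FilterExec`.
    pub post_scan: Vec<ScanFilter>,
}

/// Split filters using one classification per filter, in the same order.
pub fn split_filters(filters: &[ScanFilter], support: &[PushDown]) -> SplitFilters {
    assert_eq!(filters.len(), support.len(), "one classification per filter");
    let mut out = SplitFilters::default();
    for (filter, class) in filters.iter().zip(support) {
        // Assumption: volatile filters are never handed to the provider.
        if filter.volatile {
            out.post_scan.push(filter.clone());
            continue;
        }
        match class {
            // The provider guarantees the filter; no re-check is needed.
            PushDown::Exact => out.pushed.push(filter.clone()),
            // The provider may return extra rows, so push AND re-check.
            PushDown::Inexact => {
                out.pushed.push(filter.clone());
                out.post_scan.push(filter.clone());
            }
            // The provider cannot use the filter at all.
            PushDown::Unsupported => out.post_scan.push(filter.clone()),
        }
    }
    out
}

/// Assumption: a row limit may be given to the scan only when no
/// post-scan filter remains, since such a filter could drop rows.
pub fn scan_fetch(fetch: Option<usize>, split: &SplitFilters) -> Option<usize> {
    if split.post_scan.is_empty() {
        fetch
    } else {
        None
    }
}
```

Calling `split_filters` with the filters from a scan and the per-filter classes returned by the provider yields the two lists; a non-empty `post_scan` list is the condition under which a filter node above the scan would be needed and the limit would have to be applied after it.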
   
   ### Behavior Changes:
   1. **Logical Plan**: All filters (except scalar subqueries) now appear in 
`TableScan.filters` instead of as separate `Filter` nodes
   2. **Physical Plan**: The physical planner creates `FilterExec` nodes for 
Unsupported/Inexact/Volatile filters
   3. **Projection Handling**: When post-scan filters need columns not in the 
user's projection, we expand the scan projection and add a final 
`ProjectionExec` to trim extra columns
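
The projection handling in item 3 can be illustrated with a small sketch over column indices. The function name `expand_projection` and its signature are hypothetical (the PR's actual helper is `compute_scan_projection_with_filters()`, whose signature is not shown here), and the sketch assumes the user's projection contains no duplicate indices.

```rust
/// Compute the projection the scan must produce and, if extra columns were
/// added for post-scan filters, the indices (into the scan output) that a
/// final projection keeps in order to restore the user's column list.
///
/// Hypothetical helper; assumes `user_projection` has no duplicates.
pub fn expand_projection(
    user_projection: &[usize],
    filter_columns: &[usize],
) -> (Vec<usize>, Option<Vec<usize>>) {
    // Start from the user's columns so they keep their positions.
    let mut scan_projection = user_projection.to_vec();
    for &col in filter_columns {
        // Append only the columns the filters need that are not already read.
        if !scan_projection.contains(&col) {
            scan_projection.push(col);
        }
    }
    if scan_projection.len() == user_projection.len() {
        // Nothing was added, so no trimming projection is required.
        (scan_projection, None)
    } else {
        // The user's columns occupy the leading positions of the scan output.
        let keep: Vec<usize> = (0..user_projection.len()).collect();
        (scan_projection, Some(keep))
    }
}
```

For example, `expand_projection(&[0, 2], &[2, 3])` returns `(vec![0, 2, 3], Some(vec![0, 1]))`: the scan additionally reads column 3 for the filter, and the final projection keeps only the first two output columns. Appending the extra columns at the end is a design choice of this sketch that keeps the trimming step a simple prefix selection.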
   
   ## Are these changes tested?
   
   Yes - updated existing tests to match new plan representations:
   - Optimizer tests (snapshot updates)
   - Physical planner tests
   - Core integration tests
   - Dataframe and view tests
   
   ## Are there any user-facing changes?
   
   **Plan output changes**: Users will see filters in `TableScan` with 
`partial_filters=` or `unsupported_filters=` annotations in logical plans, 
rather than separate `Filter:` nodes. Physical plans remain functionally 
equivalent with `FilterExec` nodes where needed.
   
   ---
   
   🤖 Generated with [Claude Code](https://claude.ai/code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

