kosiew opened a new pull request, #19884:
URL: https://github.com/apache/datafusion/pull/19884

   
   ## Which issue does this PR close?
   
   * Closes #19840.
   
   ---
   
   ## Rationale for this change
   
   When a `TableProvider` supports filter pushdown (for example, by returning `TableProviderFilterPushDown::Exact` from `supports_filters_pushdown`), the DataFusion optimizer pushes predicates directly into the `TableScan` node instead of keeping them in a separate `Filter` node.
   
   The existing DML planning logic extracted predicates only from explicit `Filter` nodes. As a result, DELETE and UPDATE operations executed against providers with filter pushdown support received no filters in `delete_from` or `update`, causing **all rows to be modified instead of only the intended subset**.
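To make the failure mode concrete, here is a simplified, hypothetical model of where the optimizer leaves a `WHERE` predicate under each pushdown mode, paired with a Filter-only extractor in the spirit of the pre-fix logic. `PushDown`, `Plan`, `optimize`, and `extract_filters_old` are illustrative stand-ins, not DataFusion APIs, and predicates are plain strings. In this model the Filter-only extractor still works for `Unsupported` and `Inexact` (a `Filter` node remains) but returns nothing for `Exact`.

```rust
// Hypothetical, simplified model: these are NOT DataFusion types.
// Predicates are plain strings to keep the example small.

#[derive(Clone, Copy, Debug, PartialEq)]
enum PushDown {
    Unsupported,
    Inexact,
    Exact,
}

#[derive(Debug)]
enum Plan {
    Filter { predicate: String, input: Box<Plan> },
    TableScan { filters: Vec<String> },
}

/// Places `predicate` above and/or inside the scan depending on `mode`.
fn optimize(predicate: &str, mode: PushDown) -> Plan {
    let filters = match mode {
        PushDown::Unsupported => vec![],
        PushDown::Inexact | PushDown::Exact => vec![predicate.to_string()],
    };
    let scan = Plan::TableScan { filters };
    match mode {
        // Exact: the provider guarantees the predicate, so no Filter node is kept.
        PushDown::Exact => scan,
        // Unsupported / Inexact: a Filter node remains above the scan.
        PushDown::Unsupported | PushDown::Inexact => Plan::Filter {
            predicate: predicate.to_string(),
            input: Box::new(scan),
        },
    }
}

/// Pre-fix behaviour (modelled): only explicit Filter nodes are inspected.
fn extract_filters_old(plan: &Plan) -> Vec<String> {
    match plan {
        Plan::Filter { predicate, input } => {
            let mut out = vec![predicate.clone()];
            out.extend(extract_filters_old(input));
            out
        }
        // Predicates pushed into the scan are silently ignored.
        Plan::TableScan { .. } => vec![],
    }
}

fn main() {
    for mode in [PushDown::Unsupported, PushDown::Inexact, PushDown::Exact] {
        let plan = optimize("id = 1", mode);
        println!("{:?}: {:?} -> {:?}", mode, plan, extract_filters_old(&plan));
    }
}
```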
   
   This change ensures that DML filter extraction correctly accounts for 
optimizer-pushed predicates and preserves expected semantics for DELETE and 
UPDATE statements.
   
   ---
   
   ## What changes are included in this PR?
   
   * **Enhanced DML filter extraction**
   
     * Updated `extract_dml_filters` to collect predicates from both:
   
       * `LogicalPlan::Filter` nodes
       * `LogicalPlan::TableScan.filters` (for pushed-down predicates)
     * Split conjunctions (`AND`) into individual expressions consistently 
across both sources.
     * Strip column qualifiers so expressions match the `TableProvider` schema.
     * Deduplicate filters so the same predicate is not passed twice when it appears in both a `Filter` node and a `TableScan` (as with `Inexact` pushdown, where the `Filter` node is kept above the scan).
   
   * **Improved documentation and comments**
   
     * Updated function-level documentation to explain `TableScan` filter handling and the rationale for deduplication.
     * Added inline comments clarifying why deduplication is required.
   
   * **Expanded test coverage**
   
     * Added test infrastructure to support configurable filter pushdown 
behavior in a custom `TableProvider`.
     * Added a regression test verifying that DELETE operations correctly 
extract filters from `TableScan` when filter pushdown is enabled.
     * Added a compound-filter test to ensure deduplication does not suppress 
distinct predicates.
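The extraction steps listed under **Enhanced DML filter extraction** can be sketched as follows. This is a self-contained approximation under stated assumptions: `Expr`, `Plan`, `split_conjunction`, `strip_qualifiers`, and the helper constructors are simplified hypothetical stand-ins, not DataFusion's `Expr` / `LogicalPlan` or the actual body of `extract_dml_filters`, and only a linear chain of `Filter` nodes over a `TableScan` is handled.

```rust
use std::collections::HashSet;

// Hypothetical, simplified expression and plan types (NOT DataFusion's).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Expr {
    /// Column with an optional table qualifier, e.g. `t.id`.
    Column { relation: Option<String>, name: String },
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
    Gt(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

enum Plan {
    Filter { predicate: Expr, input: Box<Plan> },
    TableScan { filters: Vec<Expr> },
}

fn col(relation: &str, name: &str) -> Expr {
    Expr::Column { relation: Some(relation.to_string()), name: name.to_string() }
}
fn lit(v: i64) -> Expr { Expr::Literal(v) }
fn eq(l: Expr, r: Expr) -> Expr { Expr::Eq(Box::new(l), Box::new(r)) }
fn gt(l: Expr, r: Expr) -> Expr { Expr::Gt(Box::new(l), Box::new(r)) }
fn and(l: Expr, r: Expr) -> Expr { Expr::And(Box::new(l), Box::new(r)) }

/// Flattens nested `AND`s into individual conjuncts.
fn split_conjunction(expr: &Expr, out: &mut Vec<Expr>) {
    match expr {
        Expr::And(l, r) => {
            split_conjunction(l, out);
            split_conjunction(r, out);
        }
        other => out.push(other.clone()),
    }
}

/// Removes table qualifiers so columns match the provider's schema.
fn strip_qualifiers(expr: &Expr) -> Expr {
    match expr {
        Expr::Column { name, .. } => Expr::Column { relation: None, name: name.clone() },
        Expr::Literal(v) => Expr::Literal(*v),
        Expr::Eq(l, r) => Expr::Eq(Box::new(strip_qualifiers(l)), Box::new(strip_qualifiers(r))),
        Expr::Gt(l, r) => Expr::Gt(Box::new(strip_qualifiers(l)), Box::new(strip_qualifiers(r))),
        Expr::And(l, r) => Expr::And(Box::new(strip_qualifiers(l)), Box::new(strip_qualifiers(r))),
    }
}

/// Collects predicates from Filter nodes AND TableScan filters, then splits,
/// strips qualifiers, and deduplicates them.
fn extract_dml_filters(plan: &Plan) -> Vec<Expr> {
    let mut conjuncts = Vec::new();
    let mut node = plan;
    loop {
        match node {
            Plan::Filter { predicate, input } => {
                split_conjunction(predicate, &mut conjuncts);
                node = &**input;
            }
            Plan::TableScan { filters } => {
                for f in filters {
                    split_conjunction(f, &mut conjuncts);
                }
                break;
            }
        }
    }
    // Keep the first occurrence of each predicate, preserving order.
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for e in conjuncts.iter().map(strip_qualifiers) {
        if seen.insert(e.clone()) {
            out.push(e);
        }
    }
    out
}

fn main() {
    let pred = and(eq(col("t", "id"), lit(1)), gt(col("t", "v"), lit(10)));
    // Inexact pushdown: predicates appear in the Filter node AND in the scan.
    let inexact = Plan::Filter {
        predicate: pred.clone(),
        input: Box::new(Plan::TableScan {
            filters: vec![eq(col("t", "id"), lit(1)), gt(col("t", "v"), lit(10))],
        }),
    };
    // Exact pushdown: only the scan carries the predicates.
    let exact = Plan::TableScan { filters: vec![pred] };
    println!("{:?}", extract_dml_filters(&inexact)); // two conjuncts, no duplicates
    println!("{:?}", extract_dml_filters(&exact)); // the same two conjuncts
}
```

In this sketch, deduplication compares expressions after qualifiers are stripped and keeps the first occurrence, so a predicate present in both the `Filter` node and the scan is emitted once, while distinct conjuncts such as `id = 1` and `v > 10` are both preserved.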
   
   ---
   
   ## Are these changes tested?
   
   Yes. This PR includes new tests that:
   
   * Verify optimizer behavior by asserting filters are present in 
`LogicalPlan::TableScan` when pushdown is enabled.
   * Confirm that `delete_from` receives the correct number and content of 
filter expressions.
   * Validate correct handling of compound predicates (`AND`) without 
over-deduplication.
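The tests above rely on a provider whose pushdown behavior is configurable and which records what it receives. Below is a rough sketch of that kind of test provider; `TestProvider`, `PushDown`, and `IdEq` are hypothetical, and the real `TableProvider` trait has different, richer signatures (for example, filters are `Expr` values) that are not reproduced here. `RefCell` supplies interior mutability because provider methods take `&self`. The provider deletes only rows matching all received filters, so an empty filter list (the symptom fixed by this PR) deletes every row.

```rust
use std::cell::RefCell;

/// Pushdown modes the test provider can be configured with
/// (modelled on, but not the same as, `TableProviderFilterPushDown`).
#[derive(Clone, Copy, Debug, PartialEq)]
enum PushDown {
    Unsupported,
    Inexact,
    Exact,
}

/// Hypothetical single-column predicate: `id = value`.
#[derive(Clone, Debug, PartialEq)]
struct IdEq(i64);

/// Hypothetical test provider with configurable pushdown that records the
/// filters it receives. `RefCell` is used because the methods take `&self`.
struct TestProvider {
    pushdown: PushDown,
    rows: RefCell<Vec<i64>>,
    received: RefCell<Vec<IdEq>>,
}

impl TestProvider {
    fn new(pushdown: PushDown, rows: Vec<i64>) -> Self {
        TestProvider { pushdown, rows: RefCell::new(rows), received: RefCell::new(Vec::new()) }
    }

    /// Reports the configured pushdown mode for every filter.
    fn supports_filters_pushdown(&self, filters: &[IdEq]) -> Vec<PushDown> {
        vec![self.pushdown; filters.len()]
    }

    /// Deletes rows matching ALL filters and returns the number deleted.
    /// An empty filter list matches every row.
    fn delete_from(&self, filters: Vec<IdEq>) -> usize {
        let mut rows = self.rows.borrow_mut();
        let before = rows.len();
        rows.retain(|id| !filters.iter().all(|f| f.0 == *id));
        self.received.borrow_mut().extend(filters);
        before - rows.len()
    }
}

fn main() {
    for mode in [PushDown::Unsupported, PushDown::Inexact, PushDown::Exact] {
        let provider = TestProvider::new(mode, vec![1, 2, 3]);
        let filters = vec![IdEq(2)];
        println!("{:?}", provider.supports_filters_pushdown(&filters));
        // Correct behaviour: the WHERE predicate reaches the provider.
        println!("deleted {} row(s)", provider.delete_from(filters));
    }
    // The reported bug: no filters arrive, so every row is deleted.
    let buggy = TestProvider::new(PushDown::Exact, vec![1, 2, 3]);
    println!("deleted {} row(s)", buggy.delete_from(Vec::new()));
}
```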
   
   Existing DELETE and UPDATE tests continue to pass, confirming that behavior is unchanged when filter pushdown is not used.
   
   ---
   
   ## Are there any user-facing changes?
   
   Yes: **this is a correctness fix**.
   
   Users implementing custom `TableProvider`s with filter pushdown support will 
now receive the expected filter predicates in `delete_from` and `update`. This 
restores correct behavior for DELETE and UPDATE statements with WHERE clauses 
and prevents accidental full-table modifications.
   
   No API changes are introduced.
   
   ---
   
   ## LLM-generated code disclosure
   
   This PR includes LLM-generated code and comments. All LLM-generated content 
has been manually reviewed and tested.
   

