kosiew opened a new pull request, #19884:
URL: https://github.com/apache/datafusion/pull/19884
## Which issue does this PR close?
* Closes #19840.
---
## Rationale for this change
When a `TableProvider` supports filter pushdown (for example
`TableProviderFilterPushDown::Exact`), the DataFusion optimizer inlines
predicates directly into the `TableScan` node instead of keeping them in a
separate `Filter` node.
The existing DML planning logic only extracted predicates from explicit
`Filter` nodes. As a result, DELETE and UPDATE operations executed against
providers with filter pushdown support received no filters in `delete_from` /
`update`, causing **all rows to be modified instead of only the intended
subset**.
This change ensures that DML filter extraction correctly accounts for
optimizer-pushed predicates and preserves expected semantics for DELETE and
UPDATE statements.
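The failure mode can be made concrete with a minimal Rust sketch over a hypothetical, simplified plan model. The `Expr` and `LogicalPlan` enums below are illustrative stand-ins, not DataFusion's types, and `extract_filters_old` is a hypothetical name. It mimics the pre-fix behavior of reading predicates only from `Filter` nodes. The same `WHERE id = 1` yields one predicate when the `Filter` node survives. It yields none once the optimizer has pushed the predicate into `TableScan.filters`.

```rust
// Hypothetical, simplified stand-ins for DataFusion's plan types.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Column reference, e.g. `id`.
    Column(String),
    /// Integer literal, e.g. `1`.
    Literal(i64),
    /// Equality comparison `left = right`.
    Eq(Box<Expr>, Box<Expr>),
}

#[allow(dead_code)]
#[derive(Debug)]
enum LogicalPlan {
    /// Explicit filter node above its input.
    Filter { predicate: Expr, input: Box<LogicalPlan> },
    /// Table scan; `filters` holds predicates pushed down by the optimizer.
    TableScan { filters: Vec<Expr> },
}

/// Builds the predicate `name = value`.
fn eq(name: &str, value: i64) -> Expr {
    Expr::Eq(
        Box::new(Expr::Column(name.to_string())),
        Box::new(Expr::Literal(value)),
    )
}

/// Pre-fix behavior (simplified): only `Filter` nodes are inspected, so
/// predicates moved into `TableScan.filters` are silently dropped.
fn extract_filters_old(plan: &LogicalPlan) -> Vec<Expr> {
    match plan {
        LogicalPlan::Filter { predicate, input } => {
            let mut out = vec![predicate.clone()];
            out.extend(extract_filters_old(input));
            out
        }
        // Bug: pushed-down predicates are never read.
        LogicalPlan::TableScan { .. } => Vec::new(),
    }
}

/// `WHERE id = 1` without pushdown: Filter(id = 1) -> TableScan.
fn plan_without_pushdown() -> LogicalPlan {
    LogicalPlan::Filter {
        predicate: eq("id", 1),
        input: Box::new(LogicalPlan::TableScan { filters: vec![] }),
    }
}

/// `WHERE id = 1` with Exact pushdown: no Filter node remains.
fn plan_with_pushdown() -> LogicalPlan {
    LogicalPlan::TableScan { filters: vec![eq("id", 1)] }
}

fn main() {
    println!("{:?}", extract_filters_old(&plan_without_pushdown()));
    // Before the fix, an empty list like this is what the provider received.
    println!("{:?}", extract_filters_old(&plan_with_pushdown())); // []
}
```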
---
## What changes are included in this PR?
* **Enhanced DML filter extraction**
  * Updated `extract_dml_filters` to collect predicates from both:
    * `LogicalPlan::Filter` nodes
    * `LogicalPlan::TableScan.filters` (for pushed-down predicates)
  * Split conjunctions (`AND`) into individual expressions consistently
    across both sources.
  * Strip column qualifiers so expressions match the `TableProvider` schema.
  * Deduplicate filters to avoid passing the same predicate twice when it
    appears in both a `Filter` node and a `TableScan`.
* **Improved documentation and comments**
  * Updated function-level documentation to explain `TableScan` filter
    handling and the rationale for deduplication.
  * Added inline comments clarifying why deduplication is required.
* **Expanded test coverage**
  * Added test infrastructure to support configurable filter pushdown
    behavior in a custom `TableProvider`.
  * Added a regression test verifying that DELETE operations correctly
    extract filters from `TableScan` when filter pushdown is enabled.
  * Added a compound-filter test to ensure deduplication does not suppress
    distinct predicates.
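The extraction steps listed above can be sketched in Rust over a hypothetical toy model. `Expr`, `LogicalPlan`, `split_conjunction`, `strip_qualifiers`, and `extract_dml_filters_sketch` are illustrative names, not DataFusion's actual types or the PR's code. The real implementation may order the steps differently. The example assumes a plan in which the same predicate appears both in a `Filter` node and in `TableScan.filters`. This can happen, for instance, when a provider reports `Inexact` pushdown and the optimizer keeps the `Filter` above the scan.

```rust
use std::collections::HashSet;

// Hypothetical, simplified stand-ins for DataFusion's plan types.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum Expr {
    /// Column with an optional table qualifier, e.g. `t.a` or `a`.
    Column { relation: Option<String>, name: String },
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
    And(Box<Expr>, Box<Expr>),
}

#[derive(Debug)]
enum LogicalPlan {
    Filter { predicate: Expr, input: Box<LogicalPlan> },
    TableScan { filters: Vec<Expr> },
}

fn col(relation: Option<&str>, name: &str) -> Expr {
    Expr::Column { relation: relation.map(str::to_string), name: name.to_string() }
}
fn eq(left: Expr, value: i64) -> Expr {
    Expr::Eq(Box::new(left), Box::new(Expr::Literal(value)))
}
fn and(left: Expr, right: Expr) -> Expr {
    Expr::And(Box::new(left), Box::new(right))
}

/// Splits `a AND (b AND c)` into `[a, b, c]`, appending to `out`.
fn split_conjunction(expr: &Expr, out: &mut Vec<Expr>) {
    match expr {
        Expr::And(l, r) => {
            split_conjunction(l, out);
            split_conjunction(r, out);
        }
        other => out.push(other.clone()),
    }
}

/// Drops table qualifiers (`t.a` -> `a`) to match the provider's schema.
fn strip_qualifiers(expr: &Expr) -> Expr {
    match expr {
        Expr::Column { name, .. } => Expr::Column { relation: None, name: name.clone() },
        Expr::Literal(v) => Expr::Literal(*v),
        Expr::Eq(l, r) => Expr::Eq(Box::new(strip_qualifiers(l)), Box::new(strip_qualifiers(r))),
        Expr::And(l, r) => Expr::And(Box::new(strip_qualifiers(l)), Box::new(strip_qualifiers(r))),
    }
}

/// Collects predicates from `Filter` nodes AND `TableScan.filters`, splits
/// conjunctions, strips qualifiers, then deduplicates (first occurrence wins).
fn extract_dml_filters_sketch(plan: &LogicalPlan) -> Vec<Expr> {
    let mut raw = Vec::new();
    let mut node = Some(plan);
    while let Some(current) = node {
        match current {
            LogicalPlan::Filter { predicate, input } => {
                split_conjunction(predicate, &mut raw);
                node = Some(input.as_ref());
            }
            LogicalPlan::TableScan { filters } => {
                for f in filters {
                    split_conjunction(f, &mut raw);
                }
                node = None;
            }
        }
    }
    let mut seen = HashSet::new();
    raw.iter()
        .map(strip_qualifiers)
        .filter(|e| seen.insert(e.clone()))
        .collect()
}

fn main() {
    // Filter(t.a = 1 AND t.b = 2) above a scan that also holds `a = 1`.
    let plan = LogicalPlan::Filter {
        predicate: and(eq(col(Some("t"), "a"), 1), eq(col(Some("t"), "b"), 2)),
        input: Box::new(LogicalPlan::TableScan { filters: vec![eq(col(None, "a"), 1)] }),
    };
    for f in extract_dml_filters_sketch(&plan) {
        println!("{f:?}");
    }
}
```

In this sketch, deduplication runs after qualifiers are stripped, so `t.a = 1` and `a = 1` collapse into one predicate. The distinct `b = 2` from the compound filter is kept, which mirrors the property the compound-filter test is meant to guard.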
---
## Are these changes tested?
Yes. This PR includes new tests that:
* Verify optimizer behavior by asserting filters are present in
`LogicalPlan::TableScan` when pushdown is enabled.
* Confirm that `delete_from` receives the correct number and content of
filter expressions.
* Validate correct handling of compound predicates (`AND`) without
over-deduplication.
Existing DELETE and UPDATE tests continue to pass, ensuring backward
compatibility when filter pushdown is not used.
---
## Are there any user-facing changes?
Yes, **this is a correctness fix**.
Users implementing custom `TableProvider`s with filter pushdown support will
now receive the expected filter predicates in `delete_from` and `update`. This
restores correct behavior for DELETE and UPDATE statements with WHERE clauses
and prevents accidental full-table modifications.
No API changes are introduced.
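On the provider side, a minimal Rust sketch shows why a missing filter list is dangerous. It uses hypothetical toy types, not DataFusion's `TableProvider` API. If the received filters are treated as an implicit `AND`, consistent with the conjunction splitting described above, an empty list matches every row. `Row`, `Filter`, and `delete_from_sketch` are illustrative names only. Here, each filter is modeled as a closure rather than a DataFusion `Expr`.

```rust
/// A row in a hypothetical in-memory table.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: i64,
    region: &'static str,
}

/// A received filter, modeled as a row predicate (hypothetical stand-in
/// for evaluating a DataFusion `Expr` against a row).
type Filter = Box<dyn Fn(&Row) -> bool>;

/// Deletes the rows that satisfy ALL filters and returns how many were
/// removed. The filters are treated as an implicit AND, so an empty slice
/// matches (and deletes) every row.
fn delete_from_sketch(rows: &mut Vec<Row>, filters: &[Filter]) -> usize {
    let before = rows.len();
    rows.retain(|row| !filters.iter().all(|f| f(row)));
    before - rows.len()
}

fn sample_rows() -> Vec<Row> {
    vec![
        Row { id: 1, region: "eu" },
        Row { id: 2, region: "us" },
        Row { id: 3, region: "eu" },
    ]
}

fn main() {
    // DELETE ... WHERE region = 'eu' AND id > 1, received as two filters.
    let mut rows = sample_rows();
    let mut filters: Vec<Filter> = Vec::new();
    filters.push(Box::new(|r: &Row| r.region == "eu"));
    filters.push(Box::new(|r: &Row| r.id > 1));
    let removed = delete_from_sketch(&mut rows, &filters);
    println!("removed {removed}, remaining {rows:?}");

    // Pre-fix symptom: the WHERE clause never arrives, so every row goes.
    let mut rows = sample_rows();
    println!("removed {}", delete_from_sketch(&mut rows, &[]));
}
```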
---
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed and tested.