wirybeaver opened a new pull request, #22988: URL: https://github.com/apache/datafusion/pull/22988
## Which issue does this PR close?

Follow-up to #20763 (merged), which added `MergeIntoOp`, `MergeIntoClause`, and proto types.

## Rationale for this change

`MERGE INTO` (SQL:2003) is a widely used DML statement for upsert and conditional-update workloads. This PR wires the types introduced in #20763 through the SQL planner, physical planner, and `TableProvider` trait so that table implementations can actually execute merge operations.

## What changes are included in this PR?

**`datafusion/catalog`: `TableProvider` trait extension**

- Add a `merge_into(source, on, clauses)` async method with a default `not_impl_err` implementation so existing providers are unaffected.

**`datafusion/sql`: SQL → LogicalPlan**

- `statement.rs`: plan `Statement::Merge` into `LogicalPlan::Dml` with `WriteOp::MergeInto`.
- Resolve the target table and plan the `USING` source into a `LogicalPlan`.
- Build a combined target+source schema to resolve `ON` and `WHEN` expressions.
- Convert the `ON` condition and the `WHEN MATCHED / NOT MATCHED` clauses to DataFusion `Expr`.

**`datafusion/expr`: expression plumbing**

- `MergeIntoOp::exprs()`: stable iteration over all expressions (`ON` first, then each clause's predicate followed by its action values).
- `MergeIntoOp::with_new_exprs()`: rebuild the op from a transformed expression vector.
- Branch `LogicalPlan::apply_expressions`, `map_expressions`, and `with_new_exprs` on `WriteOp::MergeInto` so optimizers can rewrite merge expressions. Other `WriteOp` variants are unchanged.

**`datafusion/core`: physical planner dispatch**

- Dispatch `WriteOp::MergeInto` in the physical planner.
- Recover the `TableProvider` via `source_as_provider()`, extract the source `ExecutionPlan`, and call `TableProvider::merge_into`.

## Are these changes tested?

- The SQL planner path is exercised by `datafusion/proto/tests/cases/roundtrip_logical_plan.rs` (proto round-trip for `MergeInto`).
- Unit tests for `MergeIntoOp::exprs` / `with_new_exprs` are included in `dml.rs`.
- End-to-end integration tests require a `TableProvider` that implements `merge_into`; they are left to a follow-up once a concrete provider (e.g. Delta Lake) adopts the hook.

## Are there any user-facing changes?

- **`TableProvider`** gains a new `merge_into` method. The default implementation returns `not_impl_err`, so existing implementations compile without changes.
- `MERGE INTO <table> USING <source> ON <cond> WHEN ...` SQL syntax is now accepted by the DataFusion SQL parser and planner.
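
For readers unfamiliar with the `exprs()` / `with_new_exprs()` contract described under `datafusion/expr`, here is a minimal, self-contained sketch of the idea. It does not use DataFusion's real types: `Expr`, `Action`, `Clause`, and `MergeOp` below are simplified, hypothetical stand-ins, and the actual `MergeIntoOp` in this PR may differ in shape and error handling. The sketch only illustrates flattening expressions in a stable order (`ON`, then each clause's predicate and action values) and rebuilding the op from a vector of the same length.

```rust
// Hypothetical, simplified stand-ins; NOT DataFusion's actual types.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Action {
    Update(Vec<(String, Expr)>), // SET col = expr, ...
    Insert(Vec<Expr>),           // VALUES (expr, ...)
    Delete,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Clause {
    pub predicate: Option<Expr>, // optional `AND <cond>` on a WHEN clause
    pub action: Action,
}

#[derive(Debug, Clone, PartialEq)]
pub struct MergeOp {
    pub on: Expr,
    pub clauses: Vec<Clause>,
}

impl MergeOp {
    /// Flatten every expression in a stable order:
    /// ON first, then per clause: predicate (if any), then action values.
    pub fn exprs(&self) -> Vec<&Expr> {
        let mut out = vec![&self.on];
        for clause in &self.clauses {
            out.extend(clause.predicate.iter());
            match &clause.action {
                Action::Update(assignments) => {
                    out.extend(assignments.iter().map(|(_, e)| e))
                }
                Action::Insert(values) => out.extend(values.iter()),
                Action::Delete => {}
            }
        }
        out
    }

    /// Rebuild the op from expressions given in the same order as `exprs()`.
    /// The structure (clause count, column names) is taken from `self`.
    pub fn with_new_exprs(&self, exprs: Vec<Expr>) -> Result<MergeOp, String> {
        let expected = self.exprs().len();
        if exprs.len() != expected {
            return Err(format!(
                "expected {expected} expressions, got {}",
                exprs.len()
            ));
        }
        let mut it = exprs.into_iter();
        // The length was checked above, so each `next()` below yields a value.
        let on = it.next().unwrap();
        let clauses = self
            .clauses
            .iter()
            .map(|c| {
                let predicate = c.predicate.as_ref().map(|_| it.next().unwrap());
                let action = match &c.action {
                    Action::Update(a) => Action::Update(
                        a.iter()
                            .map(|(col, _)| (col.clone(), it.next().unwrap()))
                            .collect(),
                    ),
                    Action::Insert(v) => {
                        Action::Insert(v.iter().map(|_| it.next().unwrap()).collect())
                    }
                    Action::Delete => Action::Delete,
                };
                Clause { predicate, action }
            })
            .collect();
        Ok(MergeOp { on, clauses })
    }
}
```

In this sketch an optimizer-style rewrite would call `exprs()`, transform the resulting vector, and pass it back through `with_new_exprs()`. Because both methods walk the clauses in the same order, an unchanged vector reproduces the original op, and a vector of the wrong length is rejected rather than silently misaligned.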
