LantaoJin opened a new pull request, #67: URL: https://github.com/apache/datafusion-java/pull/67
## Which issue does this PR close?

- Closes #43.

## Rationale for this change

DataFusion's `DataFrame` exposes eight set-operation methods (the union/intersect/except family with `*_distinct` and `*_by_name` variants), and none have been reachable from Java. Without these, callers fall back to `UNION`/`INTERSECT`/`EXCEPT` via SQL, which loses lazy DataFrame composition and forces both sides to be registered as tables. This PR exposes all eight additively.

## What changes are included in this PR?

Eight new methods on `DataFrame`, each taking another `DataFrame`:

- `union(other)` -- SQL `UNION ALL` (positional, keeps duplicates)
- `unionDistinct(other)` -- SQL `UNION` (positional, deduplicated)
- `unionByName(other)` -- by column name, keeps duplicates; missing columns become NULL
- `unionByNameDistinct(other)` -- by column name, deduplicated; missing columns become NULL
- `intersect(other)` -- SQL `INTERSECT ALL`
- `intersectDistinct(other)` -- SQL `INTERSECT`
- `except(other)` -- SQL `EXCEPT ALL`
- `exceptDistinct(other)` -- SQL `EXCEPT`

## Are these changes tested?

Yes -- 12 new tests in `DataFrameTransformationsTest`.

## Are there any user-facing changes?

Yes -- purely additive. Eight new methods on `DataFrame`. No API removals, no deprecations, no behaviour change for existing callers. No Cargo feature changes; binary size is unchanged.
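
For readers unfamiliar with the difference between the positional, distinct, and by-name union variants described in this PR, the following is a minimal plain-Java sketch of the semantics. It does not call the datafusion-java API: the class `SetOpsSketch`, its method names, and the modelling of rows as `List` or `Map` values are all hypothetical and chosen only for illustration. Row order in the sketch follows insertion order, which a real query engine does not guarantee.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java model of the union variants; this is NOT the datafusion-java API.
class SetOpsSketch {

    // Models union(other): positional UNION ALL, duplicates are kept.
    static <R> List<R> unionAll(List<R> left, List<R> right) {
        List<R> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }

    // Models unionDistinct(other): UNION, duplicate rows are dropped.
    static <R> List<R> unionDistinct(List<R> left, List<R> right) {
        return new ArrayList<>(new LinkedHashSet<>(unionAll(left, right)));
    }

    // Models unionByName(other): rows are aligned by column name and
    // a column missing from one side is filled with null.
    static List<Map<String, Object>> unionByName(
            List<Map<String, Object>> left, List<Map<String, Object>> right) {
        List<Map<String, Object>> all = unionAll(left, right);
        Set<String> columns = new LinkedHashSet<>();
        for (Map<String, Object> row : all) {
            columns.addAll(row.keySet());
        }
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : all) {
            Map<String, Object> aligned = new LinkedHashMap<>();
            for (String column : columns) {
                aligned.put(column, row.get(column)); // null when the column is absent
            }
            out.add(aligned);
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> left = List.of(List.of(1, "a"), List.of(2, "b"));
        List<List<Object>> right = List.of(List.of(2, "b"), List.of(3, "c"));
        System.out.println(unionAll(left, right));      // four rows, (2, b) appears twice
        System.out.println(unionDistinct(left, right)); // three rows, (2, b) appears once

        List<Map<String, Object>> people = List.of(Map.of("id", 1, "name", "a"));
        List<Map<String, Object>> places = List.of(Map.of("id", 2, "city", "x"));
        System.out.println(unionByName(people, places)); // both rows carry id, name, city
    }
}
```

The intersect and except variants are left out of the sketch on purpose: how duplicate rows are counted under the `ALL` forms depends on the engine's implementation, which this PR description does not spell out.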
