timsaucer opened a new issue, #1455: URL: https://github.com/apache/datafusion-python/issues/1455
## Summary

Several DataFrame methods from upstream DataFusion v53 are not yet exposed in datafusion-python. This issue covers set operations and query-related methods.

## Missing Methods

**Set operations:**

- [ ] `distinct_on`: deduplicate rows based on specific columns, keeping the first row per group
- [ ] `except_distinct`: set difference with deduplication (complement to existing `except_all`)
- [ ] `intersect_distinct`: set intersection with deduplication (complement to existing `intersect`)
- [ ] `union_by_name`: union two DataFrames matching columns by name rather than position
- [ ] `union_by_name_distinct`: union by name with deduplication

**Query/display:**

- [ ] `explain_with_options`: explain plan with configurable detail options
- [ ] `show_limit`: display results with a custom row limit
- [ ] `sort_by`: sort by column names (simpler API than `sort`, which requires `Expr`)
- [ ] `with_param_values`: bind parameter values for prepared statements

## Upstream Reference

- https://docs.rs/datafusion/53.0.0/datafusion/dataframe/struct.DataFrame.html

## Implementation

- Rust bindings: `crates/core/src/dataframe.rs`
- Python wrappers: `python/datafusion/dataframe.py`

> **Note:** This gap analysis was performed using an AI agent comparing upstream DataFusion v53 documentation against the current datafusion-python codebase.
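To make the intended semantics of the set operations concrete, here is a minimal pure-Python sketch over lists of dicts. It does not call datafusion-python (these methods are not exposed there yet), and the function signatures are hypothetical, chosen only for illustration. Two assumptions are worth flagging: "first row" here means first in input order, whereas the upstream `distinct_on` also accepts sort expressions, and `union_by_name` is assumed to fill columns missing on one side with nulls.

```python
from typing import Any

Row = dict[str, Any]


def distinct_on(rows: list[Row], on: list[str]) -> list[Row]:
    """Keep the first row (in input order) for each distinct value of the `on` columns."""
    seen: set[tuple] = set()
    out: list[Row] = []
    for row in rows:
        key = tuple(row[name] for name in on)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def union_by_name(left: list[Row], right: list[Row]) -> list[Row]:
    """Concatenate rows, aligning columns by name instead of by position.

    Assumption: a column missing from one side is filled with None.
    """
    combined = [*left, *right]
    columns: list[str] = []
    for row in combined:
        for name in row:
            if name not in columns:
                columns.append(name)
    return [{name: row.get(name) for name in columns} for row in combined]


def union_by_name_distinct(left: list[Row], right: list[Row]) -> list[Row]:
    """Union by name, then drop rows that duplicate an earlier row."""
    seen: set[tuple] = set()
    out: list[Row] = []
    for row in union_by_name(left, right):
        # Every row has the same column order here, so the values tuple is a valid key.
        key = tuple(row.values())
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


if __name__ == "__main__":
    orders = [
        {"id": 1, "status": "new"},
        {"id": 1, "status": "paid"},
        {"id": 2, "status": "new"},
    ]
    print(distinct_on(orders, ["id"]))
    # Columns appear in a different order on each side but are matched by name.
    print(union_by_name([{"a": 1, "b": 2}], [{"b": 2, "a": 1}]))
    print(union_by_name_distinct([{"a": 1, "b": 2}], [{"b": 2, "a": 1}]))
```

The real bindings would delegate to the Rust `DataFrame` methods rather than reimplement this logic; the sketch is only meant to pin down the expected behavior for tests and docstrings.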
