timsaucer opened a new issue, #1455:
URL: https://github.com/apache/datafusion-python/issues/1455

   ## Summary
   
   Several DataFrame methods from upstream DataFusion v53 are not yet exposed in datafusion-python. This issue covers set operations and query-related methods.
   
   ## Missing Methods
   
   **Set operations:**
   - [ ] `distinct_on`: deduplicate rows based on specific columns, keeping the first row per group
   - [ ] `except_distinct`: set difference with deduplication (counterpart of the existing `except_all`)
   - [ ] `intersect_distinct`: set intersection with deduplication (counterpart of the existing `intersect`)
   - [ ] `union_by_name`: union two DataFrames, matching columns by name rather than by position
   - [ ] `union_by_name_distinct`: union by name with deduplication
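
To make the intended semantics concrete, here is a minimal pure-Python sketch over lists of dicts. It is not the DataFusion or datafusion-python API; the function names only mirror the methods listed above. Two assumptions are baked in: "first row per group" means first in input order (no sort expression is modeled), and `union_by_name` fills a column missing on one side with `None`.

```python
from typing import Any

Row = dict[str, Any]


def _freeze(row: Row) -> tuple:
    # Hashable, column-order-independent view of a row.
    return tuple(sorted(row.items()))


def _dedup(rows: list[Row]) -> list[Row]:
    # Drop exact duplicate rows, keeping the first occurrence.
    seen: set[tuple] = set()
    out: list[Row] = []
    for row in rows:
        key = _freeze(row)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def distinct_on(rows: list[Row], keys: list[str]) -> list[Row]:
    # Keep the first row (in input order) for each combination of `keys`.
    seen: set[tuple] = set()
    out: list[Row] = []
    for row in rows:
        key = tuple(row[c] for c in keys)
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def except_distinct(left: list[Row], right: list[Row]) -> list[Row]:
    # Rows of `left` that do not appear in `right`, without duplicates.
    exclude = {_freeze(r) for r in right}
    return _dedup([r for r in left if _freeze(r) not in exclude])


def intersect_distinct(left: list[Row], right: list[Row]) -> list[Row]:
    # Rows that appear in both inputs, without duplicates.
    keep = {_freeze(r) for r in right}
    return _dedup([r for r in left if _freeze(r) in keep])


def union_by_name(left: list[Row], right: list[Row]) -> list[Row]:
    # Align columns by name, not position; missing columns become None (assumption).
    columns: list[str] = []
    for row in left + right:
        for name in row:
            if name not in columns:
                columns.append(name)
    return [{c: row.get(c) for c in columns} for row in left + right]


def union_by_name_distinct(left: list[Row], right: list[Row]) -> list[Row]:
    # Same as union_by_name, followed by deduplication.
    return _dedup(union_by_name(left, right))
```

For example, with `left = [{"a": 1, "b": 2}, {"a": 1, "b": 2}, {"a": 3, "b": 4}]` and `right = [{"b": 4, "a": 3}]`, `except_distinct(left, right)` yields the single row `{"a": 1, "b": 2}`, even though `right` lists its columns in a different order.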
   
   **Query/display:**
   - [ ] `explain_with_options`: explain plan with configurable detail options
   - [ ] `show_limit`: display results with a custom row limit
   - [ ] `sort_by`: sort by column names (simpler API than `sort`, which requires `Expr`)
   - [ ] `with_param_values`: bind parameter values for prepared statements
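
As a rough illustration of the ergonomics described for `sort_by` and `show_limit`, the following pure-Python sketch works on lists of dicts. It is not the DataFusion or datafusion-python API. It assumes ascending order and ignores null handling, and `render_limit` is a hypothetical helper added only so the output can be inspected.

```python
from typing import Any

Row = dict[str, Any]


def sort_by(rows: list[Row], columns: list[str]) -> list[Row]:
    # Sort ascending by the named columns, in the order given (assumption).
    return sorted(rows, key=lambda row: tuple(row[c] for c in columns))


def render_limit(rows: list[Row], limit: int) -> list[str]:
    # Hypothetical helper: format at most `limit` rows as text lines.
    return [", ".join(f"{k}={v}" for k, v in row.items()) for row in rows[:limit]]


def show_limit(rows: list[Row], limit: int) -> None:
    # Print at most `limit` rows.
    for line in render_limit(rows, limit):
        print(line)


if __name__ == "__main__":
    data = [{"name": "b", "n": 2}, {"name": "a", "n": 2}, {"name": "c", "n": 1}]
    show_limit(sort_by(data, ["n", "name"]), 2)
```

The point of the sketch is only that callers pass plain column names and a row count, with no expression objects to construct.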
   
   ## Upstream Reference
   
   - https://docs.rs/datafusion/53.0.0/datafusion/dataframe/struct.DataFrame.html
   
   ## Implementation
   
   - Rust bindings: `crates/core/src/dataframe.rs`
   - Python wrappers: `python/datafusion/dataframe.py`
   
   > **Note:** This gap analysis was performed using an AI agent comparing upstream DataFusion v53 documentation against the current datafusion-python codebase.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

