[ 
https://issues.apache.org/jira/browse/ARROW-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427883#comment-17427883
 ] 

David Li commented on ARROW-14293:
----------------------------------

Dataset.join returning an iterator makes sense to me.

Backing up though, is there a higher-level plan for what sorts of functionality 
we're trying to expose? Are we targeting a subset of Pandas, perhaps? Obviously 
full Pandas compatibility is not feasible or necessarily desirable, but it 
might be worth considering the API as a whole before building out the parts. 
(Apologies if this is already considered somewhere and this ticket is merely 
the result of that.)

I agree with Weston's point, since the natural follow-up questions then include 
things like: we can do a filter and then a join, but how do we filter after a 
join? (Collect into a table, then treat it as a Dataset? This gets 
awkward/verbose fast.)
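
To make the ordering question concrete, here is a rough pure-Python sketch. It 
is not PyArrow code: filter_batches, join_batches, and the sample data are 
hypothetical stand-ins, with lists of dicts playing the role of record batches. 
It only illustrates that once a join hands back a plain iterator, a later 
filter has to wrap that iterator instead of going through a Dataset-style API.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical stand-ins: a "batch" is a list of row dicts, not an Arrow RecordBatch.
Row = Dict[str, object]
Batch = List[Row]


def filter_batches(
    batches: Iterable[Batch], predicate: Callable[[Row], bool]
) -> Iterator[Batch]:
    """Lazily drop rows that fail the predicate, one batch at a time."""
    for batch in batches:
        kept = [row for row in batch if predicate(row)]
        if kept:
            yield kept


def join_batches(
    left: Iterable[Batch], right: Iterable[Batch], key: str
) -> Iterator[Batch]:
    """Inner hash join: build a lookup from the right side, stream the left side."""
    lookup: Dict[object, List[Row]] = {}
    for batch in right:
        for row in batch:
            lookup.setdefault(row[key], []).append(row)
    for batch in left:
        joined = [
            {**lrow, **rrow} for lrow in batch for rrow in lookup.get(lrow[key], [])
        ]
        if joined:
            yield joined


orders = [[{"id": 1, "qty": 5}, {"id": 2, "qty": 0}], [{"id": 3, "qty": 7}]]
names = [[{"id": 1, "name": "a"}, {"id": 3, "name": "c"}]]

# Filter, then join: the filter is applied to one input before the join runs.
pre = join_batches(filter_batches(orders, lambda r: r["qty"] > 0), names, "id")

# Join, then filter: the join result is only an iterator, so the filter wraps it.
post = filter_batches(join_batches(orders, names, "id"), lambda r: r["name"] == "c")

pre_rows = [row for batch in pre for row in batch]
post_rows = [row for batch in post for row in batch]
```

Both orderings compose here only because the filter and the join share the 
same iterator-of-batches shape. If the join result were instead collected into 
a table and re-wrapped as a Dataset before filtering, that extra step is the 
verbosity in question.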

> [Python] Basic Join functionality in PyArrow
> --------------------------------------------
>
>                 Key: ARROW-14293
>                 URL: https://issues.apache.org/jira/browse/ARROW-14293
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Alessandro Molina
>            Priority: Major
>             Fix For: 7.0.0
>
>
> We want to expose {{Table.join}} and {{Dataset.join}} in PyArrow, leveraging 
> the join feature from the ExecPlan.
> {{Table.join}} can easily return a new {{Table}}; what {{Dataset.join}} should 
> return is a more complex question, as it probably doesn't make much sense to 
> return a new {{Dataset}} given that the result won't map to any files on disk.
