[ 
https://issues.apache.org/jira/browse/ARROW-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427748#comment-17427748
 ] 

Alessandro Molina edited comment on ARROW-14293 at 10/12/21, 3:07 PM:
----------------------------------------------------------------------

We might expose an iterator as the result of {{Dataset.join}}. [~westonpace] 
[~leedm777], do you have any suggestions on how we can best tackle this?

--- Note from Antoine ---
Regarding the join ExecPlan in PyArrow, the rough idea should be to add a 
simple "sink" node (see {{SinkNodeOptions}}) and then wrap its output 
generator with {{MakeGeneratorReader}} to get a regular {{RecordBatchReader}} 
that can be turned into a Python iterator.
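A rough pure-Python analogue of that pipeline may help: the plan's sink hands out batches on demand, and a reader object wraps the pull-based generator so Python code can simply iterate over it. This is only a sketch under stated assumptions: {{GeneratorReader}}, {{sink_generator}} and lists of row dicts standing in for record batches are hypothetical illustrations, not PyArrow or Arrow C++ API (on the C++ side the actual pieces are {{SinkNodeOptions}} and {{MakeGeneratorReader}}).

```python
from typing import Callable, Dict, Iterator, List, Optional

# Hypothetical stand-in for a record batch: a list of row dicts.
Batch = List[Dict[str, object]]


class GeneratorReader:
    """Hypothetical reader wrapping a pull-based batch generator.

    Loosely mirrors the role described for MakeGeneratorReader: each call
    to the generator returns the next batch, or None at end of stream.
    """

    def __init__(self, schema: List[str], generator: Callable[[], Optional[Batch]]):
        self.schema = schema
        self._generator = generator

    def read_next_batch(self) -> Optional[Batch]:
        return self._generator()

    def __iter__(self) -> Iterator[Batch]:
        # Pull batches until the generator signals end of stream with None.
        while True:
            batch = self.read_next_batch()
            if batch is None:
                return
            yield batch

    def read_all(self) -> Batch:
        # Drain the remaining stream into one list (a "table"-like result).
        return [row for batch in self for row in batch]


def sink_generator(batches: List[Batch]) -> Callable[[], Optional[Batch]]:
    """Hypothetical sink: hands out pre-computed batches one at a time."""
    it = iter(batches)
    return lambda: next(it, None)


if __name__ == "__main__":
    reader = GeneratorReader(
        ["id", "x"],
        sink_generator([[{"id": 1, "x": "a"}], [{"id": 2, "x": "b"}]]),
    )
    for batch in reader:
        print(batch)
```

The reader is single-pass: once iterated (or drained with {{read_all}}), further reads return nothing, which matches the streaming nature of an ExecPlan sink.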


was (Author: amol-):
We might expose an iterator as the result of `Dataset.join`,  [~westonpace] 
[~leedm777] do you have any suggestion on how we can best tackle this?

> [Python] Basic Join functionality in PyArrow
> --------------------------------------------
>
>                 Key: ARROW-14293
>                 URL: https://issues.apache.org/jira/browse/ARROW-14293
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Alessandro Molina
>            Priority: Major
>             Fix For: 7.0.0
>
>
> We want to expose {{Table.join}} and {{Dataset.join}} functionality in 
> PyArrow, leveraging the join feature of the ExecPlan.
> {{Table.join}} can simply return a new {{Table}}. What {{Dataset.join}} 
> should return is a harder question: returning a new {{Dataset}} probably 
> doesn't make much sense, given that the result won't map to any files on 
> disk.
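To make the return-type question concrete, here is a minimal pure-Python hash-join sketch, assuming rows are dicts and batches are lists of rows (hypothetical stand-ins, not Arrow types). {{table_join}} materializes the whole result, analogous to returning a new {{Table}}; {{dataset_join}} yields joined batches lazily, analogous to returning an iterator rather than a {{Dataset}}. Only an inner equi-join on a single key is shown, and both function names are illustrative, not PyArrow API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def _build(right: Iterable[Row], key: str) -> Dict[object, List[Row]]:
    # Build phase: hash every right-side row by its join key.
    table: Dict[object, List[Row]] = defaultdict(list)
    for row in right:
        table[row[key]].append(row)
    return table


def dataset_join(left_batches: Iterable[List[Row]], right: Iterable[Row],
                 key: str) -> Iterator[List[Row]]:
    """Hypothetical streaming inner join: probes left batches one at a time."""
    built = _build(right, key)
    for batch in left_batches:
        # Probe phase: emit one merged row per matching right-side row.
        # On column-name clashes the right-side value wins.
        out = [{**l, **r} for l in batch for r in built.get(l[key], [])]
        if out:  # skip batches with no matches
            yield out


def table_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Hypothetical eager inner join: drains the stream into one list."""
    return [row for batch in dataset_join([left], right, key) for row in batch]


if __name__ == "__main__":
    left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
    right = [{"id": 1, "b": 10}, {"id": 3, "b": 30}, {"id": 3, "b": 31}]
    print(table_join(left, right, "id"))
    for batch in dataset_join([left[:2], left[2:]], right, "id"):
        print(batch)
```

The eager variant is just the streaming one drained into a list, which is one way the two entry points could share the same underlying plan.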



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
