[ https://issues.apache.org/jira/browse/ARROW-13542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17393178#comment-17393178 ]

Ben Kietzman commented on ARROW-13542:
--------------------------------------

My current thinking was that partitioning would be handled within this node, 
since that would be the most straightforward way to extract a node from 
FileSystemDataset::Write.

If you wanted to extract a compute::PartitionNode instead, that would probably be 
useful later on. I think PartitionNode would:
- use a Grouper to identify each row's destination partition
- sort each batch's rows by partition id
- emit slices of input batches with equal partition id
  - the partition expression is stored in ExecBatch::guarantee
(note: this does not use a dataset::Partitioning)
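A minimal sketch of the three PartitionNode steps, in plain C++ with no Arrow dependency. `Batch`, `PartitionedSlice`, `AssignGroupIds`, and `PartitionBatch` are hypothetical names; an `std::unordered_map` stands in for `compute::Grouper`, a string stands in for the expression stored in `ExecBatch::guarantee`, and rows are copied rather than sliced zero-copy out of a reordered batch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an ExecBatch with one partition key column
// and one payload column.
struct Batch {
  std::vector<std::string> key;
  std::vector<std::int64_t> value;
};

// One output slice; `guarantee` stands in for ExecBatch::guarantee.
struct PartitionedSlice {
  std::string guarantee;
  Batch slice;
};

// Step 1 (Grouper stand-in): map each distinct key to a dense id in order of
// first appearance, recording the distinct keys in `uniques`.
std::vector<std::uint32_t> AssignGroupIds(const Batch& batch,
                                          std::vector<std::string>* uniques) {
  std::unordered_map<std::string, std::uint32_t> ids;
  std::vector<std::uint32_t> group_ids;
  group_ids.reserve(batch.key.size());
  for (const std::string& k : batch.key) {
    auto it = ids.find(k);
    if (it == ids.end()) {
      it = ids.emplace(k, static_cast<std::uint32_t>(uniques->size())).first;
      uniques->push_back(k);
    }
    group_ids.push_back(it->second);
  }
  return group_ids;
}

// Steps 2 and 3: stable-sort row indices by group id so equal ids are
// contiguous, then emit one slice per run, tagged with its partition expression.
std::vector<PartitionedSlice> PartitionBatch(const Batch& batch) {
  assert(batch.key.size() == batch.value.size());
  std::vector<std::string> uniques;
  const std::vector<std::uint32_t> group_ids = AssignGroupIds(batch, &uniques);

  std::vector<std::size_t> order(group_ids.size());
  for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return group_ids[a] < group_ids[b];
                   });

  std::vector<PartitionedSlice> out;
  std::size_t begin = 0;
  while (begin < order.size()) {
    const std::uint32_t id = group_ids[order[begin]];
    PartitionedSlice part;
    part.guarantee = "key == \"" + uniques[id] + "\"";
    std::size_t end = begin;
    while (end < order.size() && group_ids[order[end]] == id) {
      part.slice.key.push_back(batch.key[order[end]]);
      part.slice.value.push_back(batch.value[order[end]]);
      ++end;
    }
    out.push_back(std::move(part));
    begin = end;
  }
  return out;
}
```

For keys `a, b, a, c, b` this sketch emits three slices (`a`, `b`, `c`) in order of each key's first appearance; the stable sort keeps rows within a partition in their original order.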

Then WriteNode would only use a Partitioning to format each ExecBatch::guarantee 
as an output directory. I think this approach would allow us to delete 
Partitioning::Partition too, since that behavior would now be encapsulated by 
PartitionNode.
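As a minimal sketch of that formatting step, assuming a guarantee reduces to an ordered conjunction of `field == value` terms, the plain C++ below turns one guarantee into a relative directory path. `Guarantee`, `FormatHive`, and `FormatDirectory` are hypothetical names; the two layouts loosely mirror the paths produced by `dataset::HivePartitioning` (`year=2021/month=8`) and `dataset::DirectoryPartitioning` (`2021/8`), but this is not Arrow's API.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: a partition guarantee such as (year == 2021) and (month == 8),
// flattened to ordered (field, value) pairs.
using Guarantee = std::vector<std::pair<std::string, std::string>>;

// Hive-style layout: "field=value" segments joined by '/'.
std::string FormatHive(const Guarantee& guarantee) {
  std::string path;
  for (const auto& [field, value] : guarantee) {
    assert(!field.empty() && "partition field names must be non-empty");
    if (!path.empty()) path += '/';
    path += field + "=" + value;
  }
  return path;
}

// Directory-style layout: values only; field names are implied by position.
std::string FormatDirectory(const Guarantee& guarantee) {
  std::string path;
  for (const auto& term : guarantee) {
    if (!path.empty()) path += '/';
    path += term.second;
  }
  return path;
}
```

Keeping the formatter a pure function of the guarantee is what lets the write side stay ignorant of how rows were grouped: any upstream node that attaches a guarantee gets a directory for free.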

Also note that whichever approach you take is going to impinge on ARROW-13338, 
since ExecPlans don't support synchronous scanning and FileSystemDataset::Write 
depends on the [[deprecated]] Scanner::Scan.

> [C++][Compute][Dataset] Add dataset::WriteNode for writing rows from an 
> ExecPlan to disk
> ----------------------------------------------------------------------------------------
>
>                 Key: ARROW-13542
>                 URL: https://issues.apache.org/jira/browse/ARROW-13542
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Ben Kietzman
>            Assignee: Weston Pace
>            Priority: Major
>              Labels: dataset
>
> This will serve as a sink ExecNode which dumps all the batches it receives to 
> disk. The PR should probably also replace {{FileSystemDataset::Write}} with 
> an ExecPlan-based implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
