EnricoMi commented on code in PR #44470:
URL: https://github.com/apache/arrow/pull/44470#discussion_r1984967250
##########
cpp/src/arrow/dataset/file_base.cc:
##########
@@ -472,9 +472,12 @@ Status FileSystemDataset::Write(const FileSystemDatasetWriteOptions& write_optio
WriteNodeOptions write_node_options(write_options);
write_node_options.custom_schema = custom_schema;
+ // preserve existing order in dataset by setting implicit_order=true
+ bool implicit_ordering = write_node_options.write_options.preserve_order;
acero::Declaration plan = acero::Declaration::Sequence({
- {"scan", ScanNodeOptions{dataset, scanner->options()}},
+ {"scan", ScanNodeOptions{dataset, scanner->options(),
+ /*require_sequenced_output=*/false, implicit_ordering}},
Review Comment:
> As a result the implicit sequence may be constructed on out of sequence items.

That is fine and expected. With `implicit_ordering=true`, each batch is
assigned a batch index, and those indices are used downstream to put the
batches back in sequence.
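
To illustrate why out-of-sequence arrival is harmless once batches carry an index, here is a minimal standalone sketch. It is not the Acero implementation: `IndexedBatch`, `Resequencer`, and `EmitInOrder` are hypothetical names invented for this example, and the payload is a plain string standing in for a record batch. It assumes indices start at 0 and are contiguous.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a record batch tagged with its batch index.
struct IndexedBatch {
  int64_t index;        // position assigned to the batch upstream
  std::string payload;  // placeholder for the batch contents
};

// Buffers batches that arrive early and releases them in index order.
class Resequencer {
 public:
  // Accepts one batch and returns every batch that is now ready to emit.
  std::vector<IndexedBatch> Push(IndexedBatch batch) {
    assert(batch.index >= next_index_ && "index was already emitted");
    const int64_t index = batch.index;
    pending_.emplace(index, std::move(batch));

    std::vector<IndexedBatch> ready;
    // Drain the buffer for as long as the next expected index is present.
    for (auto it = pending_.find(next_index_); it != pending_.end();
         it = pending_.find(next_index_)) {
      ready.push_back(std::move(it->second));
      pending_.erase(it);
      ++next_index_;
    }
    return ready;
  }

 private:
  int64_t next_index_ = 0;                   // next index allowed to leave
  std::map<int64_t, IndexedBatch> pending_;  // batches waiting for their turn
};

// Feeds batches in arrival order and collects payloads in emitted order.
std::vector<std::string> EmitInOrder(const std::vector<IndexedBatch>& arrivals) {
  Resequencer resequencer;
  std::vector<std::string> out;
  for (const auto& batch : arrivals) {
    for (auto& ready : resequencer.Push(batch)) {
      out.push_back(std::move(ready.payload));
    }
  }
  return out;
}
```

Feeding batches in the arrival order 2, 0, 1, 3 through `EmitInOrder` yields their payloads in index order 0, 1, 2, 3. The cost is the memory held by batches that arrive before their predecessors.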