gitmodimo commented on code in PR #44470:
URL: https://github.com/apache/arrow/pull/44470#discussion_r1984870192
##########
cpp/src/arrow/dataset/file_base.cc:
##########
@@ -472,9 +472,12 @@ Status FileSystemDataset::Write(const FileSystemDatasetWriteOptions& write_optio
WriteNodeOptions write_node_options(write_options);
write_node_options.custom_schema = custom_schema;
+ // preserve existing order in dataset by setting implicit_order=true
+ bool implicit_ordering = write_node_options.write_options.preserve_order;
acero::Declaration plan = acero::Declaration::Sequence({
- {"scan", ScanNodeOptions{dataset, scanner->options()}},
+ {"scan", ScanNodeOptions{dataset, scanner->options(),
+ /*require_sequenced_output=*/false, implicit_ordering}},
Review Comment:
Please note that a scan node with `require_sequenced_output=false` merges the
per-fragment batch streams with `MakeMergedGenerator`:
https://github.com/apache/arrow/blob/d88ef57a737b246c93e07ff98c19ff488bca1c59/cpp/src/arrow/dataset/scanner.cc#L1026-L1037
`MakeMergedGenerator` may deliver items out of sequence. As a result, the
implicit ordering may be assigned to batches that are already out of order:
https://github.com/apache/arrow/blob/d88ef57a737b246c93e07ff98c19ff488bca1c59/cpp/src/arrow/util/async_generator.h#L1440-L1457
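To illustrate the concern, here is a minimal standalone sketch. It does not use the Arrow API: `Batch`, `MergeUnsequenced`, `AssignImplicitOrdering` and `PreservesDatasetOrder` are hypothetical names that only model the behaviour described above, on the assumption that an unsequenced merge emits batches in arrival order and that the implicit ordering numbers them by position in the merged stream.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a record batch: the fragment it came from and
// its position within that fragment.
struct Batch {
  int fragment;
  int index_in_fragment;
};

// A batch tagged with the index an implicit ordering would assign to it.
struct IndexedBatch {
  Batch batch;
  int sequence_index;
};

// Models an unsequenced merge. `arrival` lists fragment ids in the order in
// which each fragment's next batch becomes ready; batches are emitted in
// exactly that order, so fragments may interleave.
std::vector<Batch> MergeUnsequenced(const std::vector<int>& arrival) {
  std::vector<int> next_index;
  std::vector<Batch> out;
  for (int fragment : arrival) {
    assert(fragment >= 0);
    if (next_index.size() <= static_cast<std::size_t>(fragment)) {
      next_index.resize(fragment + 1, 0);
    }
    out.push_back({fragment, next_index[fragment]++});
  }
  return out;
}

// Models implicit ordering: number the batches by their position in the
// stream the downstream node actually sees.
std::vector<IndexedBatch> AssignImplicitOrdering(const std::vector<Batch>& batches) {
  std::vector<IndexedBatch> out;
  int next = 0;
  for (const Batch& b : batches) {
    out.push_back({b, next++});
  }
  return out;
}

// Returns true if walking the batches by sequence_index visits them in
// dataset order (by fragment, then by position within the fragment).
bool PreservesDatasetOrder(const std::vector<IndexedBatch>& indexed) {
  for (std::size_t i = 1; i < indexed.size(); ++i) {
    const Batch& prev = indexed[i - 1].batch;
    const Batch& cur = indexed[i].batch;
    if (std::make_pair(cur.fragment, cur.index_in_fragment) <
        std::make_pair(prev.fragment, prev.index_in_fragment)) {
      return false;
    }
  }
  return true;
}
```

In this model, `PreservesDatasetOrder(AssignImplicitOrdering(MergeUnsequenced({0, 1, 0, 1})))` is false, because the second batch of fragment 0 is numbered after the first batch of fragment 1. With the arrival order `{0, 0, 1, 1}` the result is true, so the numbering is correct only when the merge itself happens to be in sequence.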