Francois Saint-Jacques created ARROW-8447:
---------------------------------------------

             Summary: [C++][Dataset] Ensure Scanner::ToTable preserve ordering
                 Key: ARROW-8447
                 URL: https://issues.apache.org/jira/browse/ARROW-8447
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Francois Saint-Jacques


Scanner::ToTable can be refactored with a little effort so that the resulting table keeps the batches in scan-task order:
# Change `batches` to `std::vector<RecordBatchVector>`.
# When pushing the closure to the TaskGroup, also track an incrementing integer, e.g. `scan_task_id`.
# In the closure, store the RecordBatches for this ScanTask in a local vector. When all batches are consumed, move the local vector into `batches` at the right index, resizing and emplacing under a mutex.
# After waiting for the task group to complete, either
  * concatenate into a single vector and call `Table::FromRecordBatches`, or
  * write a RecordBatchReader that supports `vector<vector<RecordBatch>>` and add a method `Table::FromRecordBatchReader`.

The latter involves more work but is the clean way: the existing `FromRecordBatches` method can then be implemented on top of it, and it supports "streaming".
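
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Arrow code: `Batch`, `BatchVector`, `FakeScanTask` and `CollectOrdered` are hypothetical stand-ins (an `int` plays the role of a RecordBatch), and a vector of closures run in reverse order stands in for the TaskGroup, to mimic scan tasks completing out of order. It follows the "concatenate into a single vector" option.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-ins for arrow::RecordBatch and arrow::RecordBatchVector.
using Batch = int;
using BatchVector = std::vector<Batch>;

// Hypothetical stand-in for a ScanTask: it yields its batches in order.
struct FakeScanTask {
  BatchVector batches;
};

// Collects all batches in scan-task order, whatever order the tasks finish in.
std::vector<Batch> CollectOrdered(const std::vector<FakeScanTask>& tasks) {
  std::vector<BatchVector> batches;  // step 1: one slot per scan task
  std::mutex mutex;                  // guards `batches`
  std::vector<std::function<void()>> closures;  // stands in for the TaskGroup

  std::size_t scan_task_id = 0;  // step 2: incrementing id per closure
  for (const FakeScanTask& task : tasks) {
    const std::size_t id = scan_task_id++;
    closures.push_back([&batches, &mutex, &task, id] {
      // Step 3: consume this task's batches into a local vector first.
      BatchVector local;
      for (const Batch& batch : task.batches) local.push_back(batch);

      // Then publish the local vector at the task's own index, under the mutex.
      std::lock_guard<std::mutex> lock(mutex);
      if (batches.size() <= id) batches.resize(id + 1);
      batches[id] = std::move(local);
    });
  }

  // Run the closures in reverse to mimic tasks completing out of order.
  // A real TaskGroup would run them concurrently and then be waited on.
  for (auto it = closures.rbegin(); it != closures.rend(); ++it) (*it)();

  // Step 4: concatenate the per-task vectors into a single ordered vector.
  std::vector<Batch> flat;
  for (const BatchVector& per_task : batches) {
    flat.insert(flat.end(), per_task.begin(), per_task.end());
  }
  return flat;
}
```

Because each closure writes only to the slot given by its own `scan_task_id`, the final concatenation depends on the order in which the tasks were submitted, not on the order in which they finished.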