westonpace commented on issue #33710: URL: https://github.com/apache/arrow/issues/33710#issuecomment-1414497267
> In short, WriteRecordBatch is subject to max number of rows allowed in a row group. So it may slice the input record batch and write the sliced batches into different row groups in order.

I'm personally not too worried about that feature, as a lot of that behavior can be obtained with the [dataset writer](https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/dataset_writer.cc), which has `max_rows_per_file` and `max_rows_per_group` and is independent of format. It handles multiple parallel writes across multiple files.

> CMIW, we don't have the utility to support this yet.

We have decent utilities for working with async tasks. For example, you could use a [throttled async task scheduler](https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/async_util.h) if you want to execute them in order.
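To make the slicing behavior quoted above concrete, here is a rough sketch of how a batch might be cut against a row-group limit. It is plain Python, not Arrow code: `plan_row_group_slices` and its parameters are hypothetical names made up for illustration, and it assumes a partially filled row group is topped up before a new one is started.

```python
def plan_row_group_slices(batch_rows, max_rows_per_group, rows_in_open_group=0):
    """Plan how one batch is cut so that no row group exceeds the limit.

    Hypothetical helper, not part of Arrow. Returns a list of
    (offset, length) slices of the batch, in write order.
    """
    if max_rows_per_group <= 0:
        raise ValueError("max_rows_per_group must be positive")
    if batch_rows < 0:
        raise ValueError("batch_rows must be non-negative")
    if not 0 <= rows_in_open_group <= max_rows_per_group:
        raise ValueError("rows_in_open_group is out of range")

    slices = []
    offset = 0
    filled = rows_in_open_group  # rows already in the currently open row group
    while offset < batch_rows:
        if filled == max_rows_per_group:
            filled = 0  # assumption: a full group is closed and a new one opened
        length = min(max_rows_per_group - filled, batch_rows - offset)
        slices.append((offset, length))
        offset += length
        filled += length
    return slices


if __name__ == "__main__":
    # A 10-row batch with a 4-row limit is cut into three slices.
    print(plan_row_group_slices(10, 4))
```

The slices come out in order, which matches the "write the sliced batches into different row groups in order" description; how the real writer treats a partially filled row group may differ from the assumption made here.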
