westonpace commented on issue #33710:
URL: https://github.com/apache/arrow/issues/33710#issuecomment-1414497267

   > In short, WriteRecordBatch is subject to max number of rows allowed in a 
row group. So it may slice the input record batch and write the sliced batches 
into different row groups in order.
   
   I'm personally not too worried about that feature, since much of that behavior 
can be obtained with the [dataset 
writer](https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/dataset_writer.cc),
 which exposes `max_rows_per_file` and `max_rows_per_group` options and is 
independent of the file format.  It also handles parallel writes across multiple files.
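
To make the two limits concrete, here is a minimal sketch in plain Python of how incoming batches could be sliced so that no row group exceeds `max_rows_per_group` and no file exceeds `max_rows_per_file`. This is only an illustration of the idea, not the dataset writer's actual algorithm: the function name `plan_slices` and the tuple layout are hypothetical, and the real writer has additional behavior (for example, it may buffer and combine small batches) that is not modeled here.

```python
from typing import List, Sequence, Tuple

# (file_index, batch_index, offset_in_batch, num_rows); hypothetical layout
Slice = Tuple[int, int, int, int]


def plan_slices(
    batch_sizes: Sequence[int],
    max_rows_per_file: int,
    max_rows_per_group: int,
) -> List[Slice]:
    """Slice batches, in order, into row groups that respect both limits."""
    if max_rows_per_file <= 0 or max_rows_per_group <= 0:
        raise ValueError("row limits must be positive")

    plan: List[Slice] = []
    file_index = 0
    rows_in_file = 0
    for batch_index, size in enumerate(batch_sizes):
        offset = 0
        while offset < size:
            # Roll over to a new file once the current one is full.
            if rows_in_file == max_rows_per_file:
                file_index += 1
                rows_in_file = 0
            # A slice may not exceed the group limit or the room left in the file.
            take = min(
                size - offset,
                max_rows_per_group,
                max_rows_per_file - rows_in_file,
            )
            plan.append((file_index, batch_index, offset, take))
            offset += take
            rows_in_file += take
    return plan
```

For example, `plan_slices([10, 3], max_rows_per_file=8, max_rows_per_group=4)` splits the 10-row batch into slices of 4, 4, and 2 rows, with the last slice starting a second file, and then places the 3-row batch in that second file as its own group.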
   
   > CMIW, we don't have the utility to support this yet.
   
   We have decent utilities for working with async tasks.  For example, you 
could use a [throttled async task 
scheduler](https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/async_util.h)
 if you want to execute them in order.
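
As a rough analogy for the throttling idea, the sketch below uses Python's `asyncio` rather than Arrow's C++ utilities. It does not reproduce the API in `async_util.h`; the helper `run_throttled` and its parameters are hypothetical. It shows that capping concurrency at one task makes submitted tasks run, and therefore finish, in submission order.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence

TaskFactory = Callable[[], Awaitable[Any]]


async def run_throttled(tasks: Sequence[TaskFactory], max_concurrent: int) -> List[Any]:
    """Run task factories with at most `max_concurrent` in flight at once."""
    if max_concurrent <= 0:
        raise ValueError("max_concurrent must be positive")

    queue: "asyncio.Queue" = asyncio.Queue()
    for index, factory in enumerate(tasks):
        queue.put_nowait((index, factory))
    results: List[Any] = [None] * len(tasks)

    async def worker() -> None:
        # Each worker pulls the next pending task in FIFO order.
        while True:
            try:
                index, factory = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            results[index] = await factory()

    await asyncio.gather(*(worker() for _ in range(max_concurrent)))
    return results


def demo() -> tuple:
    completed: List[str] = []

    def make_task(name: str, delay: float) -> TaskFactory:
        async def task() -> str:
            await asyncio.sleep(delay)
            completed.append(name)  # record completion order
            return name
        return task

    # Later tasks are faster, so only throttling keeps them in order.
    tasks = [make_task("a", 0.03), make_task("b", 0.02), make_task("c", 0.01)]
    results = asyncio.run(run_throttled(tasks, max_concurrent=1))
    return results, completed
```

With `max_concurrent=1` the completion order matches the submission order even though later tasks are faster; raising the limit trades that ordering for parallelism.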
   
   

