tustvold commented on issue #1269:
URL: https://github.com/apache/arrow-rs/issues/1269#issuecomment-1469662590

   > because the polling frequency can't be determined easily
   
   Checking after each call to 
[write](https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/struct.ArrowWriter.html#method.write)
 should be fine. A more sophisticated writer could track the number of written 
rows and only check once they exceed the max row group size, but in practice this 
is highly unlikely to make a tangible performance difference.
   
   > may lead to high memory usage by the inner writer Vec<u8>
   
   Assuming a single RecordBatch does not exceed the maximum size of a row 
group, the above approach should be optimal in this respect.
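
   As a rough illustration of checking after each write, here is a minimal, 
std-only sketch. Everything in it is hypothetical: `SyncRowGroupWriter` is a toy 
stand-in for the sync writer (one byte per row, not the real `ArrowWriter`), 
`FlushingWriter` is an invented wrapper name, and a blocking `std::io::Write` sink 
stands in for the async sink (in an async writer the drain step would be awaited). 
It also records the peak size of the inner buffer, to show that the buffer stays 
within one row group when no batch exceeds the max row group size.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for a sync row-group writer. Rows are buffered in
/// `pending` and only moved to `out` once a full row group has accumulated.
struct SyncRowGroupWriter {
    out: Vec<u8>,     // completed row groups, waiting to be drained
    pending: Vec<u8>, // rows of the in-progress row group (one byte per row)
    max_row_group_size: usize,
}

impl SyncRowGroupWriter {
    fn write(&mut self, batch: &[u8]) {
        for &row in batch {
            self.pending.push(row);
            if self.pending.len() == self.max_row_group_size {
                // Row group complete: emit it to the output buffer.
                self.out.append(&mut self.pending);
            }
        }
    }

    /// Emit whatever is left as a final, possibly short, row group.
    fn finish(&mut self) {
        self.out.append(&mut self.pending);
    }
}

/// Hypothetical wrapper: drains the inner buffer to `sink` after every write.
struct FlushingWriter<W: Write> {
    inner: SyncRowGroupWriter,
    sink: W,
    peak_buffered: usize, // largest size the inner `out` buffer ever reached
}

impl<W: Write> FlushingWriter<W> {
    fn new(max_row_group_size: usize, sink: W) -> Self {
        let inner = SyncRowGroupWriter {
            out: Vec::new(),
            pending: Vec::new(),
            max_row_group_size,
        };
        FlushingWriter { inner, sink, peak_buffered: 0 }
    }

    fn write(&mut self, batch: &[u8]) -> io::Result<()> {
        self.inner.write(batch);
        // The "poll": check the buffer after each call to write.
        self.drain()
    }

    fn drain(&mut self) -> io::Result<()> {
        self.peak_buffered = self.peak_buffered.max(self.inner.out.len());
        if !self.inner.out.is_empty() {
            self.sink.write_all(&self.inner.out)?;
            self.inner.out.clear();
        }
        Ok(())
    }

    fn close(mut self) -> io::Result<(W, usize)> {
        self.inner.finish();
        self.drain()?;
        Ok((self.sink, self.peak_buffered))
    }
}

/// Writes three 3-row batches with a max row group size of 4 and returns the
/// bytes that reached the sink plus the peak inner buffer size.
fn demo() -> io::Result<(Vec<u8>, usize)> {
    let mut writer = FlushingWriter::new(4, Vec::new());
    for batch in &[[1u8, 2, 3], [4, 5, 6], [7, 8, 9]] {
        writer.write(batch)?;
    }
    writer.close()
}
```

   In this toy model each write completes at most one row group, because every 
batch is smaller than the max row group size, so the drained buffer never holds 
more than one row group's worth of data.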
   
   > guess the buffer logic can still be kept in the async writer
   
   If we can share as much between the sync and async implementations as 
possible, that would be beneficial. The separate async read path, whilst a 
necessary evil, is a non-trivial maintenance burden.

