tustvold commented on issue #1269: URL: https://github.com/apache/arrow-rs/issues/1269#issuecomment-1469662590
> because the polling frequency can't be determined easily

Checking after each call to [write](https://docs.rs/parquet/latest/parquet/arrow/arrow_writer/struct.ArrowWriter.html#method.write) should be fine. A more sophisticated writer could track the number of written rows and only check once they exceed the max row group size. In practice, this is highly unlikely to make a tangible performance difference.

> may lead to high memory usage by the inner writer `Vec<u8>`

Assuming a single RecordBatch does not exceed the maximum size of a row group, the above approach should be optimal in this respect.

> guess the buffer logic can still be kept in the async writer

If we can share as much as possible between the sync and async implementations, that would be beneficial. The separate async read path, whilst a necessary evil, is a non-trivial maintenance burden.
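The polling approach from the first reply (check the shared buffer after each `write`, optionally only once enough rows have been written) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the parquet crate's API: `SharedBuffer`, `MockRowGroupWriter`, `PollingWriter`, `check_every_rows` and `BYTES_PER_ROW` are hypothetical names. The mock only imitates the behaviour of a writer like `ArrowWriter` that emits bytes once a row group is complete, and rows are represented as counts rather than `RecordBatch`es.

```rust
use std::cell::RefCell;
use std::io::{self, Write};
use std::rc::Rc;

/// Assumption for the sketch: pretend every row encodes to this many bytes.
const BYTES_PER_ROW: usize = 8;

/// Hypothetical in-memory buffer shared by the inner writer (which appends to
/// it) and the wrapper (which drains it).
#[derive(Clone, Default)]
struct SharedBuffer(Rc<RefCell<Vec<u8>>>);

impl Write for SharedBuffer {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.0.borrow_mut().extend_from_slice(buf);
        Ok(buf.len())
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Hypothetical stand-in for a sync writer such as ArrowWriter: rows stay in
/// memory and bytes reach `sink` only when a full row group is complete.
struct MockRowGroupWriter<W: Write> {
    sink: W,
    max_row_group_size: usize,
    buffered_rows: usize,
}

impl<W: Write> MockRowGroupWriter<W> {
    fn write(&mut self, num_rows: usize) -> io::Result<()> {
        self.buffered_rows += num_rows;
        // A batch larger than a row group can complete several groups at once.
        while self.buffered_rows >= self.max_row_group_size {
            self.sink.write_all(&vec![0u8; self.max_row_group_size * BYTES_PER_ROW])?;
            self.buffered_rows -= self.max_row_group_size;
        }
        Ok(())
    }

    /// Emit the final, possibly partial, row group.
    fn finish(&mut self) -> io::Result<()> {
        if self.buffered_rows > 0 {
            self.sink.write_all(&vec![0u8; self.buffered_rows * BYTES_PER_ROW])?;
            self.buffered_rows = 0;
        }
        self.sink.flush()
    }
}

/// Moves bytes from the shared buffer to `dest`. With `check_every_rows == 1`
/// the buffer is checked after every `write`; larger values skip the check
/// until that many rows have been written since the last check.
struct PollingWriter<D: Write> {
    inner: MockRowGroupWriter<SharedBuffer>,
    buffer: SharedBuffer,
    dest: D,
    check_every_rows: usize,
    rows_since_check: usize,
    peak_buffered_bytes: usize,
}

impl<D: Write> PollingWriter<D> {
    fn new(dest: D, max_row_group_size: usize, check_every_rows: usize) -> Self {
        assert!(max_row_group_size > 0 && check_every_rows > 0);
        let buffer = SharedBuffer::default();
        let inner = MockRowGroupWriter { sink: buffer.clone(), max_row_group_size, buffered_rows: 0 };
        PollingWriter { inner, buffer, dest, check_every_rows, rows_since_check: 0, peak_buffered_bytes: 0 }
    }

    fn write(&mut self, num_rows: usize) -> io::Result<()> {
        self.inner.write(num_rows)?;
        self.rows_since_check += num_rows;
        if self.rows_since_check >= self.check_every_rows {
            self.rows_since_check = 0;
            self.drain()?;
        }
        Ok(())
    }

    /// Take everything currently buffered, record its size, and forward it.
    fn drain(&mut self) -> io::Result<()> {
        let bytes = std::mem::take(&mut *self.buffer.0.borrow_mut());
        self.peak_buffered_bytes = self.peak_buffered_bytes.max(bytes.len());
        self.dest.write_all(&bytes)
    }

    /// Flush the last row group, drain it, and return (dest, peak buffered bytes).
    fn close(mut self) -> io::Result<(D, usize)> {
        self.inner.finish()?;
        self.drain()?;
        Ok((self.dest, self.peak_buffered_bytes))
    }
}
```

For example, `PollingWriter::new(Vec::<u8>::new(), 100, 1)` followed by ten `write(60)` calls checks after every write. Because each 60-row batch is smaller than the 100-row group, at most one row group's bytes sit in the buffer before being drained, which is the reply's point about memory. Passing `check_every_rows = 100` instead gates the check on the row count; in this sketch, that can let up to two row groups accumulate between drains.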
