DDtKey commented on code in PR #5457:
URL: https://github.com/apache/arrow-rs/pull/5457#discussion_r1510483397
##########
parquet/src/arrow/async_writer/mod.rs:
##########
@@ -69,6 +69,29 @@ use tokio::io::{AsyncWrite, AsyncWriteExt};
/// It is implemented based on the sync writer [`ArrowWriter`] with an inner buffer.
/// The buffered data will be flushed to the writer provided by caller when the
/// buffer's threshold is exceeded.
+///
+/// ## Memory Limiting
+///
+/// The nature of parquet forces buffering of an entire row group before it can be flushed
+/// to the underlying writer. This buffering may exceed the configured buffer size
+/// of [`AsyncArrowWriter`]. Memory usage can be limited by prematurely flushing the row group,
+/// although this will have implications for file size and query performance. See [ArrowWriter]
+/// for more information.
Review Comment:
Great to have this documented! Thanks!
Should we also refer to this section from the docs of the instantiation methods (`try_new` and `try_new_with_options`)?
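
For reference, the early-flush idea described in the new docs could be sketched roughly as below. This is a toy, std-only model and not the parquet crate API: `BufferingWriter`, `write_with_limit`, and the byte counts are hypothetical stand-ins chosen for illustration. It only shows the trade-off the doc comment mentions, where a lower memory limit bounds peak buffering but produces more, smaller row groups.

```rust
/// Toy stand-in for a writer that must buffer a whole row group
/// before it can hand bytes to the underlying sink.
/// Hypothetical type, NOT the parquet crate's writer.
struct BufferingWriter {
    /// Bytes buffered for the row group currently being built.
    in_progress: usize,
    /// Sizes of the row groups flushed so far.
    row_groups: Vec<usize>,
    /// Largest amount of data buffered at any one time.
    peak: usize,
}

impl BufferingWriter {
    fn new() -> Self {
        BufferingWriter { in_progress: 0, row_groups: Vec::new(), peak: 0 }
    }

    /// Buffer one batch of `batch_bytes` bytes into the current row group.
    fn write(&mut self, batch_bytes: usize) {
        self.in_progress += batch_bytes;
        self.peak = self.peak.max(self.in_progress);
    }

    /// Bytes currently buffered and not yet flushed.
    fn in_progress_size(&self) -> usize {
        self.in_progress
    }

    /// Close the current row group, if it holds any data.
    fn flush(&mut self) {
        if self.in_progress > 0 {
            self.row_groups.push(self.in_progress);
            self.in_progress = 0;
        }
    }
}

/// Write every batch, flushing the row group early whenever the
/// buffered size reaches `limit` (an assumed, caller-chosen budget).
fn write_with_limit(batches: &[usize], limit: usize) -> BufferingWriter {
    let mut writer = BufferingWriter::new();
    for &batch in batches {
        writer.write(batch);
        if writer.in_progress_size() >= limit {
            writer.flush();
        }
    }
    // Flush whatever is left, as closing the writer would.
    writer.flush();
    writer
}
```

Note that the check runs after each write, so the peak can overshoot the limit by up to one batch; that matches the doc comment's point that buffering may exceed a configured size.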