DDtKey commented on code in PR #5457:
URL: https://github.com/apache/arrow-rs/pull/5457#discussion_r1510483397


##########
parquet/src/arrow/async_writer/mod.rs:
##########
@@ -69,6 +69,29 @@ use tokio::io::{AsyncWrite, AsyncWriteExt};
 /// It is implemented based on the sync writer [`ArrowWriter`] with an inner buffer.
 /// The buffered data will be flushed to the writer provided by caller when the
 /// buffer's threshold is exceeded.
+///
+/// ## Memory Limiting
+///
+/// The nature of parquet forces buffering of an entire row group before it can be flushed
+/// to the underlying writer. This buffering may exceed the configured buffer size
+/// of [`AsyncArrowWriter`]. Memory usage can be limited by prematurely flushing the row group,
+/// although this will have implications for file size and query performance. See [ArrowWriter]
+/// for more information.

Review Comment:
   Great to have this documented! Thanks!
   
   Should we also refer to this in the docs of the instantiation methods (`try_new` and `try_new_with_options`)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

