alamb commented on a change in pull request #1214:
URL: https://github.com/apache/arrow-rs/pull/1214#discussion_r790278532



##########
File path: parquet/src/arrow/arrow_writer.rs
##########
@@ -40,14 +43,23 @@ use crate::{
 
 /// Arrow writer
 ///
-/// Writes Arrow `RecordBatch`es to a Parquet writer
+/// Writes Arrow `RecordBatch`es to a Parquet writer, buffering up `RecordBatch` in order
+/// to produce row groups with `max_row_group_size` rows. Any remaining rows will be
+/// flushed on close, leading the final row group in the output file to potentially
+/// contain fewer than `max_row_group_size` rows
 pub struct ArrowWriter<W: ParquetWriter> {
     /// Underlying Parquet writer
     writer: SerializedFileWriter<W>,
+
+    buffer: Vec<VecDeque<ArrayRef>>,

Review comment:
       I think it would be worth documenting these new fields
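
       As a rough illustration of the behaviour the new doc comment
       describes, and of what documented fields might look like, here is a
       minimal standalone sketch. It is not the code in this PR: plain
       `Vec<i32>` chunks stand in for `ArrayRef`, a single column stands in
       for the per-column `Vec<VecDeque<ArrayRef>>`, and the names
       `BufferingWriter`, `buffered_rows`, `row_groups` and `flush_rows` are
       hypothetical. Only `buffer` and `max_row_group_size` come from the
       diff above.

```rust
use std::collections::VecDeque;

/// Toy stand-in for the writer (hypothetical, not the PR's `ArrowWriter`).
struct BufferingWriter {
    /// Chunks received via `write` that have not yet been emitted as a row group.
    buffer: VecDeque<Vec<i32>>,
    /// Total number of rows currently held in `buffer`.
    buffered_rows: usize,
    /// Target number of rows per row group; only the last group may be smaller.
    max_row_group_size: usize,
    /// Row groups emitted so far (stands in for the underlying file writer).
    row_groups: Vec<Vec<i32>>,
}

impl BufferingWriter {
    fn new(max_row_group_size: usize) -> Self {
        assert!(max_row_group_size > 0, "row group size must be non-zero");
        Self {
            buffer: VecDeque::new(),
            buffered_rows: 0,
            max_row_group_size,
            row_groups: Vec::new(),
        }
    }

    /// Buffers `batch`, emitting a row group each time enough rows accumulate.
    fn write(&mut self, batch: Vec<i32>) {
        if batch.is_empty() {
            return;
        }
        self.buffered_rows += batch.len();
        self.buffer.push_back(batch);
        let target = self.max_row_group_size;
        while self.buffered_rows >= target {
            self.flush_rows(target);
        }
    }

    /// Flushes any remaining rows as a final, possibly short, row group.
    fn close(mut self) -> Vec<Vec<i32>> {
        let remaining = self.buffered_rows;
        if remaining > 0 {
            self.flush_rows(remaining);
        }
        self.row_groups
    }

    /// Moves exactly `num_rows` rows from the front of the buffer into a new row group.
    fn flush_rows(&mut self, num_rows: usize) {
        let mut group = Vec::with_capacity(num_rows);
        while group.len() < num_rows {
            let mut chunk = self
                .buffer
                .pop_front()
                .expect("buffered_rows out of sync with buffer");
            let needed = num_rows - group.len();
            if chunk.len() > needed {
                // Split the chunk and keep the tail buffered for the next row group.
                let tail = chunk.split_off(needed);
                self.buffer.push_front(tail);
            }
            group.extend(chunk);
        }
        self.buffered_rows -= num_rows;
        self.row_groups.push(group);
    }
}
```

       In this sketch, with a `max_row_group_size` of 2, writing a 3-row
       batch and then a 4-row batch produces three full row groups during
       the writes, and `close` flushes the one leftover row as a short
       final group.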





