zeroshade commented on code in PR #36163: URL: https://github.com/apache/arrow/pull/36163#discussion_r1234553594
##########
go/parquet/pqarrow/file_writer.go:
##########
@@ -134,6 +134,13 @@ func (fw *FileWriter) RowGroupTotalBytesWritten() int64 {
 	return 0
 }
+// WriteBuffered allows to write records and decide where to break your row group
+// based on the TotalBytesWritten rather than on the max row group len.
+// If using Records, this should be paired with NewBufferedRowGroup,
+// while Write will always write a new record as a row group in and of itself.
+//
+// Performance-wise WriteBuffered might be more favorable than Write
+// especially if dealing with lots of records that have only a small amount of rows.

Review Comment:
Mention the tradeoff: more memory is used, because the whole row group is kept buffered in memory before writing starts (a Parquet file must write an entire column of a row group before it can write the next column).
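To make the tradeoff concrete, here is a minimal, stdlib-only simulation. It does not call the real pqarrow API: `record`, `writePerRecord`, `writeBuffered`, and the 64 KiB threshold are hypothetical stand-ins. `writePerRecord` models `Write` (one row group per record), and `writeBuffered` models the caller-driven pattern the doc comment describes, where records accumulate in the current row group until its size crosses a byte threshold and a new row group is then started (the role `NewBufferedRowGroup` plays). The sketch counts row groups and the peak number of bytes held in memory before a flush.

```go
package main

import "fmt"

// record is a hypothetical stand-in for an arrow.Record; only its
// approximate encoded size matters for this sketch.
type record struct {
	bytes int64
}

// rowGroupStats summarizes how a stream of records was split into row groups.
type rowGroupStats struct {
	rowGroups    int   // number of row groups produced
	peakBuffered int64 // most bytes held in memory before a row group was flushed
}

// writePerRecord models Write: every record becomes its own row group,
// so at most one record's worth of data is buffered at a time.
func writePerRecord(recs []record) rowGroupStats {
	var st rowGroupStats
	for _, r := range recs {
		st.rowGroups++
		if r.bytes > st.peakBuffered {
			st.peakBuffered = r.bytes
		}
	}
	return st
}

// writeBuffered models buffered writing with a hypothetical byte threshold:
// records are appended to the current row group, which stays in memory
// until its size reaches maxRowGroupBytes; it is then flushed (all of its
// columns written out) and a new row group is started.
func writeBuffered(recs []record, maxRowGroupBytes int64) rowGroupStats {
	var st rowGroupStats
	var buffered int64
	for _, r := range recs {
		buffered += r.bytes
		if buffered > st.peakBuffered {
			st.peakBuffered = buffered
		}
		if buffered >= maxRowGroupBytes {
			st.rowGroups++ // flush the full row group
			buffered = 0
		}
	}
	if buffered > 0 {
		st.rowGroups++ // flush the final partial row group on close
	}
	return st
}

func main() {
	// 1,000 small records of 1 KiB each.
	recs := make([]record, 1000)
	for i := range recs {
		recs[i] = record{bytes: 1024}
	}
	a := writePerRecord(recs)
	b := writeBuffered(recs, 64*1024)
	fmt.Printf("Write:         %d row groups, peak buffered %d bytes\n", a.rowGroups, a.peakBuffered)
	fmt.Printf("WriteBuffered: %d row groups, peak buffered %d bytes\n", b.rowGroups, b.peakBuffered)
}
```

With these inputs the per-record path yields 1000 row groups with at most 1024 bytes buffered, while the buffered path yields 16 row groups but holds up to 65536 bytes in memory: fewer, larger row groups in exchange for keeping a whole row group buffered before it is written.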