wecharyu commented on code in PR #48468:
URL: https://github.com/apache/arrow/pull/48468#discussion_r2741128369


##########
cpp/src/parquet/file_writer.cc:
##########
@@ -68,6 +68,12 @@ int64_t RowGroupWriter::total_compressed_bytes_written() const {
   return contents_->total_compressed_bytes_written();
 }
 
+int64_t RowGroupWriter::EstimatedTotalCompressedBytes() const {
+  return contents_->total_compressed_bytes() +
+         contents_->total_compressed_bytes_written() +
+         contents_->EstimatedBufferedValueBytes();

Review Comment:
   @pitrou 3 also will not cause a huge over-estimation as long as there are many 
pages. I think the only difference is that 1 will produce row groups that go 
over `max_row_group_bytes`, while 3 will produce smaller ones.
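
   To illustrate the general trade-off, here is a toy model rather than the 
Parquet writer itself. `SimulateRowGroups` and its `estimate_percent` parameter 
are hypothetical names, and the model does not reproduce the numbered options; 
it only assumes that a row group is closed once the estimated size reaches the 
limit, and shows how an estimate below or above the real size shifts the 
resulting row group size.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy model (assumption, not Parquet code): every batch adds `batch_bytes`
// of real data, and the row group is closed as soon as the *estimated* size
// reaches `max_row_group_bytes`. `estimate_percent` scales the real size:
// below 100 models under-estimation, above 100 models over-estimation.
std::vector<int64_t> SimulateRowGroups(int num_batches, int64_t batch_bytes,
                                       int64_t max_row_group_bytes,
                                       int64_t estimate_percent) {
  assert(batch_bytes > 0 && max_row_group_bytes > 0 && estimate_percent > 0);
  std::vector<int64_t> row_group_sizes;
  int64_t actual_bytes = 0;
  for (int i = 0; i < num_batches; ++i) {
    actual_bytes += batch_bytes;
    const int64_t estimated_bytes = actual_bytes * estimate_percent / 100;
    if (estimated_bytes >= max_row_group_bytes) {
      // Record the real size of the row group that was just closed.
      row_group_sizes.push_back(actual_bytes);
      actual_bytes = 0;
    }
  }
  if (actual_bytes > 0) {
    row_group_sizes.push_back(actual_bytes);  // trailing partial row group
  }
  return row_group_sizes;
}
```

   With 100-byte batches and a 1000-byte limit, an 80% estimate closes each row 
group at 1300 real bytes (over the limit), while a 125% estimate closes it at 
800 real bytes (under the limit).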



