Re: [PR] GH-48467: [C++][Parquet] Add configure to limit the row group size in bytes [arrow]

via GitHub Wed, 04 Mar 2026 09:17:33 -0800


wecharyu commented on PR #48468:
URL: https://github.com/apache/arrow/pull/48468#issuecomment-3998954732


   > I still think we should not try to estimate anything here.
   
   @pitrou It seems the first row group must depend on estimated data; 
otherwise `max_row_group_bytes` could not take effect on it. 
   
   Other implementations, such as `parquet-java` and `arrow-rs`, use both the 
compressed page data and the encoded buffered bytes to estimate how many more 
rows fit in a row group. Given that the uncompressed buffered bytes are 
typically a small portion of the total footprint, would it be reasonable to 
rely on a similar estimation approach here as well? CC: @wgtmac 
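A minimal sketch of this kind of estimate follows. It assumes the writer can see three numbers for the open row group: rows written so far, bytes of pages already compressed, and encoded bytes still buffered. `RowGroupSizeStats` and `EstimateRemainingRows` are hypothetical names for illustration only. They are not the Arrow C++, `parquet-java`, or `arrow-rs` API. Because buffered bytes are counted, the first row group gets an estimate as soon as rows are buffered, even before any page has been flushed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of the row group currently being written.
struct RowGroupSizeStats {
  int64_t rows_written = 0;            // rows appended to this row group so far
  int64_t compressed_page_bytes = 0;   // bytes of pages already compressed/flushed
  int64_t buffered_encoded_bytes = 0;  // encoded bytes still buffered, not yet compressed
};

// Hypothetical helper: estimate how many more rows fit before the row group
// reaches max_row_group_bytes, extrapolating from the average bytes per row
// observed so far. Returns -1 when there is no basis for an estimate yet.
int64_t EstimateRemainingRows(const RowGroupSizeStats& stats,
                              int64_t max_row_group_bytes) {
  assert(max_row_group_bytes > 0);
  if (stats.rows_written == 0) return -1;  // nothing observed yet

  // Current footprint = compressed pages + encoded-but-buffered data.
  const int64_t current_bytes =
      stats.compressed_page_bytes + stats.buffered_encoded_bytes;
  if (current_bytes >= max_row_group_bytes) return 0;  // limit already reached
  if (current_bytes == 0) return -1;  // rows seen but no bytes: cannot extrapolate

  const double bytes_per_row =
      static_cast<double>(current_bytes) / static_cast<double>(stats.rows_written);
  const double remaining_bytes =
      static_cast<double>(max_row_group_bytes - current_bytes);
  return static_cast<int64_t>(remaining_bytes / bytes_per_row);
}
```

For example, with 1000 rows written, 400000 compressed page bytes, and 100000 buffered bytes, the average is 500 bytes per row, so a 1000000-byte limit leaves room for about 1000 more rows. Since compressed and uncompressed bytes are mixed, the estimate slightly overstates the final size of the buffered part, which errs toward closing the row group early.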


