wgtmac commented on PR #48468:
URL: https://github.com/apache/arrow/pull/48468#issuecomment-4047682751

   Our internal implementation does not count the buffered size when 
estimating row group size, and as a result the estimate is sometimes 
significantly off (always smaller than the real size). But adding the buffered 
size could make the estimate harder to predict, especially for wide columns. 
So I think we can split this PR into two: one that adds the APIs, which are 
useful in all cases, and another for the size estimation algorithm, which is 
still under debate. WDYT? @wecharyu 

