westonpace opened a new pull request, #36012:
URL: https://github.com/apache/arrow/pull/36012

   ### Rationale for this change
   
   In #34280 the default row group size was changed to 1Mi.  However, this was 
accidentally reverted (for Python, but not C++) in #34435.
   
   The problem is that there are both an "absolute max row group size for the 
writer" and a "row group size to use for this table".  The pyarrow user is 
unable to set the former property.
   
   The behavior in pyarrow was previously "If no value is given in the call to 
write_table, then don't specify anything and let the absolute max apply."
   
   The first fix changed the absolute max to 1Mi.  However, this made it 
impossible for the user to use a larger row group size.  The second fix changed 
the absolute max back to 64Mi.  However, this meant the default didn't change.
   
   ### What changes are included in this PR?
   
   This change leaves the absolute max at 64Mi.  However, if the user does not 
specify a row group size, we no longer "just use the table size" and instead 
use 1Mi.
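
The rule described above can be sketched as a small standalone model. This is only an illustration of the before/after behavior as stated in this description, not the actual pyarrow or C++ code; the constant and function names are hypothetical.

```python
from typing import Optional

MI = 1024 * 1024

# Hypothetical names, chosen for illustration only.
ABSOLUTE_MAX_ROW_GROUP_SIZE = 64 * MI  # writer-level cap, left unchanged
DEFAULT_ROW_GROUP_SIZE = 1 * MI        # used when the caller gives no value


def old_row_group_size(table_num_rows: int, requested: Optional[int] = None) -> int:
    """Previous behavior: with no value given, just use the table size,
    so only the absolute max applies."""
    if requested is None:
        requested = table_num_rows
    return min(requested, ABSOLUTE_MAX_ROW_GROUP_SIZE)


def new_row_group_size(table_num_rows: int, requested: Optional[int] = None) -> int:
    """Behavior described here: with no value given, fall back to 1Mi.
    An explicit value is still honored up to the absolute max.
    table_num_rows is unused and kept only to mirror the old signature."""
    if requested is None:
        requested = DEFAULT_ROW_GROUP_SIZE
    return min(requested, ABSOLUTE_MAX_ROW_GROUP_SIZE)
```

Under this model, a 10Mi-row table written without an explicit row group size goes from a single 10Mi-row group to 1Mi-row groups, while a caller who asks for, say, 4Mi rows still gets 4Mi.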
   
   ### Are these changes tested?
   
   Yes, a unit test was added.
   
   ### Are there any user-facing changes?
   
   Yes, the default row group size now truly changes to 1Mi.  This change was 
already announced as part of #34280.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
