westonpace opened a new pull request, #36012: URL: https://github.com/apache/arrow/pull/36012
### Rationale for this change

In #34280 the default row group size was changed to 1Mi. However, this was accidentally reverted (for Python, but not C++) in #34435.

The problem is that there are two separate settings: an "absolute max row group size for the writer" and a "row group size to use for this table". The pyarrow user cannot set the former. Previously, pyarrow behaved as follows: "If no value is given in the call to `write_table`, don't specify anything and let the absolute max apply."

The first fix changed the absolute max to 1Mi. However, this made it impossible for the user to choose a larger row group size. The second fix changed the absolute max back to 64Mi. However, this meant the default didn't actually change.

### What changes are included in this PR?

This change leaves the absolute max at 64Mi. However, if the user does not specify a row group size, we no longer "just use the table size" and instead use 1Mi.

### Are these changes tested?

Yes, a unit test was added.

### Are there any user-facing changes?

Yes, the default row group size now truly changes to 1Mi. This change was already announced as part of #34280.
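To make the interaction between the two settings concrete, here is a minimal sketch in plain Python of how the effective row group size could be resolved before and after this change. It is not pyarrow's actual implementation: the constants, the functions `effective_row_group_size` and `row_group_lengths`, and the `new_behavior` flag are hypothetical names used only for illustration. The per-call value corresponds to the `row_group_size` argument of `pyarrow.parquet.write_table`; the writer-level cap is the setting the pyarrow user cannot change.

```python
from typing import List, Optional

# Hypothetical constants mirroring the values discussed in the PR.
ABSOLUTE_MAX_ROW_GROUP_SIZE = 64 * 1024 * 1024  # 64Mi: writer-level cap
DEFAULT_ROW_GROUP_SIZE = 1024 * 1024            # 1Mi: per-table default


def effective_row_group_size(num_rows: int,
                             row_group_size: Optional[int] = None,
                             *, new_behavior: bool = True) -> int:
    """Resolve the row group size to use for one table (hypothetical)."""
    if row_group_size is not None and row_group_size <= 0:
        raise ValueError("row_group_size must be positive")
    if row_group_size is None:
        # Old behavior: "just use the table size"; new behavior: use 1Mi.
        requested = DEFAULT_ROW_GROUP_SIZE if new_behavior else num_rows
    else:
        # An explicit value may exceed 1Mi; only the absolute max caps it.
        requested = row_group_size
    return min(requested, ABSOLUTE_MAX_ROW_GROUP_SIZE)


def row_group_lengths(num_rows: int,
                      row_group_size: Optional[int] = None,
                      *, new_behavior: bool = True) -> List[int]:
    """Split num_rows into row group lengths of the effective size."""
    if num_rows == 0:
        return []  # empty table: no row groups to write
    size = effective_row_group_size(num_rows, row_group_size,
                                    new_behavior=new_behavior)
    full, remainder = divmod(num_rows, size)
    return [size] * full + ([remainder] if remainder else [])
```

With these assumptions, a 10,000,000-row table written without a `row_group_size` would previously form a single row group, but now yields row groups of 1Mi rows. An explicit `row_group_size=5_000_000` is still honored because the 64Mi cap was left in place.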
