This doesn't address the ticket that was raised about the large number of row groups, but for some visibility: there is some work underway to size row groups based on the size of the data instead of a static number of rows [1], as well as to expose a few more knobs to tune [2].
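
To give a feel for what byte-based sizing could mean in practice, here is a rough sketch in Python. It is only an illustration of the idea, not what the ticket implements: the function name, the 128 MiB target, and the use of an average row width are all assumptions on my part.

```python
def rows_per_group_for_bytes(num_rows, total_bytes, target_bytes=128 * 1024 * 1024):
    """Pick a row count per row group so each group holds about target_bytes.

    Hypothetical helper: assumes rows are roughly uniform in size, so the
    average row width (total_bytes / num_rows) is a usable estimate.
    """
    if num_rows <= 0 or total_bytes <= 0:
        # Nothing to measure; fall back to a single group.
        return max(1, num_rows)
    # Integer form of target_bytes / (total_bytes / num_rows).
    rows = target_bytes * num_rows // total_bytes
    # Never return 0, and never exceed the number of rows available.
    return max(1, min(num_rows, rows))


if __name__ == "__main__":
    # 1M rows at 800 bytes per row, aiming for 128 MiB per group.
    print(rows_per_group_for_bytes(1_000_000, 800_000_000))
```

With skewed row widths (for example a few very large strings) the average is a poor estimate, which is part of why doing this inside the writer is more attractive than doing it by hand.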

There is a bit of prior art in the R implementation for attempting to get a reasonable row group size based on the shape of the data (basically, aims to have row groups that have 250 million cells in them) [3].

[1] https://issues.apache.org/jira/browse/ARROW-4542
[2] https://issues.apache.org/jira/browse/ARROW-14426 and https://issues.apache.org/jira/browse/ARROW-14427
[3] https://github.com/apache/arrow/blob/641554b0bcce587549bfcfd0cde3cb4bc23054aa/r/R/parquet.R#L204-L222

-Jon

On Wed, Nov 17, 2021 at 4:35 AM Joris Van den Bossche <jorisvandenboss...@gmail.com> wrote:
>
> In addition, would it be useful to be able to change this max_row_group_length
> from Python?
> Currently that writer property can't be changed from Python, you can only
> specify the row_group_size (chunk_size in C++)
> when writing a table, but that's currently only useful to set it to
> something that is smaller than the max_row_group_length.
>
> Joris
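
For anyone who wants to try the cell-count idea from Python in the meantime, a minimal sketch follows. It is not a port of the R code: the function name and the clamping are my own assumptions, and only the 250 million cell target comes from the R implementation mentioned above.

```python
def rows_per_group_for_cells(num_rows, num_columns, target_cells=250_000_000):
    """Pick a row count per row group so each group holds about target_cells.

    Hypothetical helper; a "cell" is one value (one row of one column).
    """
    if num_rows <= 0 or num_columns <= 0:
        # Empty or column-less table; fall back to a single group.
        return max(1, num_rows)
    # Rows that fit in one group at the target cell count.
    rows = target_cells // num_columns
    # Never return 0, and never exceed the number of rows available.
    return max(1, min(num_rows, rows))


if __name__ == "__main__":
    # A wide table: 10M rows by 100 columns.
    print(rows_per_group_for_cells(10_000_000, 100))
```

The result could then be passed as `row_group_size` to `pyarrow.parquet.write_table`. As Joris notes, that only has an effect while the value stays below the writer's max_row_group_length.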