This doesn't address the ticket that was raised about the large number
of row groups, but for some visibility: there is work underway to size
row groups based on the size of the data instead of a static number of
rows [1], as well as to expose a few more knobs for tuning [2].

There is a bit of prior art in the R implementation, which tries to
pick a reasonable row group size based on the shape of the data
(basically, it aims for row groups with 250 million cells each, where
a cell is one value, i.e. rows times columns). [3]
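
To make the idea concrete, here is a rough Python sketch of a
cell-count heuristic of that kind. It is not a port of the R code in
[3]: the function name, the rounding, and the handling of small tables
are my own assumptions, and only the 250 million cell target comes from
the description above.

```python
import math


def rows_per_row_group(num_rows, num_columns, target_cells=250_000_000):
    """Pick a row group size (in rows) aiming at `target_cells` per group.

    Hypothetical helper for illustration only; it does not reproduce the
    exact logic of the R implementation.
    """
    if num_rows <= 0 or num_columns <= 0:
        raise ValueError("num_rows and num_columns must be positive")

    total_cells = num_rows * num_columns
    if total_cells <= target_cells:
        # Small table: a single row group holds everything.
        return num_rows

    # Enough groups that each one holds at most target_cells cells.
    num_groups = math.ceil(total_cells / target_cells)
    # Spread the rows evenly across those groups.
    return math.ceil(num_rows / num_groups)
```

The result could then be passed as row_group_size when writing a table
from Python, so wide tables get fewer rows per group and narrow tables
get more.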

[1] https://issues.apache.org/jira/browse/ARROW-4542
[2] https://issues.apache.org/jira/browse/ARROW-14426 and
https://issues.apache.org/jira/browse/ARROW-14427
[3] https://github.com/apache/arrow/blob/641554b0bcce587549bfcfd0cde3cb4bc23054aa/r/R/parquet.R#L204-L222

-Jon

On Wed, Nov 17, 2021 at 4:35 AM Joris Van den Bossche
<jorisvandenboss...@gmail.com> wrote:
>
> In addition, would it be useful to be able to change this
> max_row_group_length from Python?
> Currently that writer property can't be changed from Python. You can
> only specify the row_group_size (chunk_size in C++) when writing a
> table, and that is only useful for setting it to something smaller
> than the max_row_group_length.
>
> Joris
