I recommend trying different values using the parquet-cli. That's an easy way to see how different row group and page sizes perform. That's what I do to tune all of our tables.
rb

On Fri, Jan 12, 2018 at 10:43 AM, ALeX Wang wrote:
> Hi,
>
> I'm using parquet to store a big table (400+ columns), and most of the columns
> will be none.
>
> Is there any recommended row group size and number of row groups per
> parquet file for my use case? Or is there any reference/paper that I could
> read myself?
>
> Thanks,
> --
> Alex Wang,
> Open vSwitch developer

--
Ryan Blue
Software Engineer
Netflix
