Will include Zstd as well, thank you. However, we care more about compression speed than about compression ratio.
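
For anyone reproducing the comparison, here is a minimal sketch of how compression time and ratio could be measured side by side. It is not the harness used for the numbers in this thread. The `benchmark` function, the `CODECS` table, and the synthetic `SAMPLE` data are all hypothetical, and zlib (DEFLATE, the algorithm behind gzip) at two levels stands in for the real codecs, because Snappy, LZ4, and Zstd need third-party bindings.

```python
import time
import zlib


def benchmark(codecs, payload, repeats=3):
    """Return {name: (best_seconds, compression_ratio)} for each codec."""
    results = {}
    for name, compress in codecs.items():
        best = float("inf")
        compressed = b""
        for _ in range(repeats):
            start = time.perf_counter()
            compressed = compress(payload)
            # Keep the fastest run to reduce noise from other processes.
            best = min(best, time.perf_counter() - start)
        results[name] = (best, len(payload) / len(compressed))
    return results


# Stand-in codecs (assumption): DEFLATE at a fast and a thorough level.
# Snappy, LZ4, or Zstd would plug in here through their own bindings.
CODECS = {
    "deflate-1": lambda data: zlib.compress(data, 1),
    "deflate-9": lambda data: zlib.compress(data, 9),
}

# Synthetic, repetitive sample rows (assumption); real table data will
# compress differently, so treat any numbers as illustrative only.
SAMPLE = b"id,ts,value\n" + b"".join(
    b"%d,2021-07-01T14:01:00,%d\n" % (i, i % 97) for i in range(50_000)
)

if __name__ == "__main__":
    for name, (seconds, ratio) in benchmark(CODECS, SAMPLE).items():
        print(f"{name}: {seconds * 1000:.1f} ms, ratio {ratio:.2f}x")
```

This only times in-memory compression of a raw byte buffer. A Parquet write also involves encoding, row-group buffering, and disk I/O, so results from a real writer can differ.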

On Thu, Jul 1, 2021 at 2:01 PM Ryan Blue <[email protected]> wrote:

> You should probably try Zstd while you're at it. We had great results with
> Zstd as well. My conclusion was that Zstd is probably the right choice when
> you want higher compression ratios and LZ4 was the right choice when you
> didn't need great compression but wanted fast compression and decompression
> speeds. Zstd pretty much replaces gzip and LZ4 replaces snappy.
>
> On Thu, Jul 1, 2021 at 1:59 PM Sreeram Garlapati <[email protected]>
> wrote:
>
>> Slick, thanks @Ryan Blue <[email protected]>. We will add LZ4 to our mix
>> and report back if we find anything different.
>>
>> On Thu, Jul 1, 2021 at 1:50 PM Ryan Blue <[email protected]> wrote:
>>
>>> The default should probably be LZ4. In our testing, LZ4 beat snappy for
>>> every dataset for read time, write time, and compression ratio. I believe
>>> it also typically got a better compression ratio than gzip. Gzip was the
>>> previous default because it does a better job on compression ratio than
>>> snappy.
>>>
>>> Ryan
>>>
>>> On Thu, Jul 1, 2021 at 1:48 PM Sreeram Garlapati <
>>> [email protected]> wrote:
>>>
>>>> Hello Iceberg devs!
>>>>
>>>> Do any of you folks use the underlying file format as *Parquet +
>>>> Snappy*?
>>>> Iceberg configures this by default as Parquet + gzip (
>>>> *write.parquet.compression-codec*).
>>>> *Is there any specific reason for this Choice?*
>>>>
>>>> In our preliminary tests we found better numbers with *Parquet +
>>>> Snappy* than with *gzip*.
>>>> Operation = compress and write to local disk
>>>> File Size = 524.3MB (about the same with both the compression codecs)
>>>> row group size = 64mb.
>>>>
>>>> gzip     snappy
>>>> 8.304    5.478
>>>>
>>>>
>>>> We are still in the process of our full benchmarking (for reads) - but,
>>>> want to understand - if there is a whole different angle to this that we
>>>> are not thinking thru.
>>>>
>>>> Truly appreciate any inputs,
>>>> Sreeram
>>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>
>
> --
> Ryan Blue
> Tabular
>
