Will include Zstd as well, thank you.
However, we are more interested in compression speed than in compression ratio.
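
For anyone who wants to look at the speed-versus-ratio trade-off in isolation, here is a minimal sketch under stated assumptions. It uses Python's standard-library `zlib` (the DEFLATE algorithm behind gzip) at several levels as a stand-in. Snappy, LZ4, and Zstd need third-party bindings that are not shown here, and nothing in this sketch goes through Parquet or Iceberg. The `measure` helper and the synthetic sample data are hypothetical, so the numbers say nothing about real tables.

```python
import time
import zlib


def measure(data: bytes, level: int) -> tuple[float, float]:
    """Return (elapsed seconds, compression ratio) for one zlib level."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # Ratio = uncompressed size / compressed size; higher means smaller output.
    return elapsed, len(data) / len(compressed)


if __name__ == "__main__":
    # Synthetic, highly repetitive sample (assumption); real data behaves differently.
    sample = b"iceberg,parquet,compression,benchmark\n" * 200_000
    for level in (1, 6, 9):
        seconds, ratio = measure(sample, level)
        print(f"zlib level {level}: {seconds:.4f}s, ratio {ratio:.1f}x")
```

Timing the same input at each setting and reporting both numbers side by side makes it easier to see when a faster codec costs little in file size, as in the gzip and snappy runs quoted below.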

On Thu, Jul 1, 2021 at 2:01 PM Ryan Blue <[email protected]> wrote:

> You should probably try Zstd while you're at it. We had great results with
> Zstd as well. My conclusion was that Zstd is probably the right choice when
> you want higher compression ratios and LZ4 was the right choice when you
> didn't need great compression but wanted fast compression and decompression
> speeds. Zstd pretty much replaces gzip and LZ4 replaces snappy.
>
> On Thu, Jul 1, 2021 at 1:59 PM Sreeram Garlapati <[email protected]>
> wrote:
>
>> Slick, thanks @Ryan Blue <[email protected]>. We will add LZ4 to our mix
>> and report back if we find anything different.
>>
>> On Thu, Jul 1, 2021 at 1:50 PM Ryan Blue <[email protected]> wrote:
>>
>>> The default should probably be LZ4. In our testing, LZ4 beat snappy for
>>> every dataset for read time, write time, and compression ratio. I believe
>>> it also typically got a better compression ratio than gzip. Gzip was the
>>> previous default because it does a better job on compression ratio than
>>> snappy.
>>>
>>> Ryan
>>>
>>> On Thu, Jul 1, 2021 at 1:48 PM Sreeram Garlapati <
>>> [email protected]> wrote:
>>>
>>>> Hello Iceberg devs!
>>>>
>>>> Do any of you folks use *Parquet + Snappy* as the underlying file
>>>> format?
>>>> Iceberg configures this by default as Parquet + gzip (
>>>> *write.parquet.compression-codec*).
>>>> *Is there any specific reason for this choice?*
>>>>
>>>> In our preliminary tests we found better numbers with *Parquet +
>>>> Snappy* than with *gzip*.
>>>> Operation = compress and write to local disk
>>>> File size = 524.3 MB (about the same with both compression codecs)
>>>> Row group size = 64 MB
>>>>
>>>> gzip     snappy
>>>> 8.304    5.478
>>>>
>>>>
>>>> We are still running our full benchmarks (for reads), but we want to
>>>> understand whether there is a whole different angle to this that we
>>>> are not thinking through.
>>>>
>>>> Truly appreciate any inputs,
>>>> Sreeram
>>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>>
>>
>
> --
> Ryan Blue
> Tabular
>
