Hi, 

I just tested with Intel QuickAssist Technology (QAT), which provides hardware 
acceleration for GZIP. You can find the details here:
https://www.intel.com/content/www/us/en/architecture-and-technology/intel-quick-assist-technology-overview.html

Here are the benchmark results, run on an Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz 
with a single thread:

lzbench 1.7.2 (64-bit Linux)   Assembled by P.Skibinski
| Compressor name | Compression | Decompress. | Compr. size | Ratio | Filename            |
| memcpy          |   4942 MB/s |   5688 MB/s |     3263523 |  1.00 | calgary/calgary.tar |
| qat 1.0.0       |   2312 MB/s |   3538 MB/s |     1274379 |  2.56 | calgary/calgary.tar |
| snappy 1.1.4    |    283 MB/s |   1144 MB/s |     1686240 |  1.94 | calgary/calgary.tar |
| lz4 1.7.5       |    453 MB/s |   2514 MB/s |     1685795 |  1.94 | calgary/calgary.tar |
| zstd 1.3.1 -1   |    279 MB/s |    723 MB/s |     1187211 |  2.75 | calgary/calgary.tar |
| zlib 1.2.11 -1  |     79 MB/s |    261 MB/s |     1240838 |  2.63 | calgary/calgary.tar |
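
For anyone reading the table: the Ratio column is the uncompressed size (the memcpy row) divided by the compressed size. A minimal Python sketch that reproduces the column from the sizes reported above; the function and variable names are only illustrative and are not part of lzbench:

```python
# Sizes in bytes, copied from the lzbench table above.
ORIGINAL_SIZE = 3263523  # memcpy row: calgary/calgary.tar, uncompressed

COMPRESSED_SIZES = {
    "qat 1.0.0": 1274379,
    "snappy 1.1.4": 1686240,
    "lz4 1.7.5": 1685795,
    "zstd 1.3.1 -1": 1187211,
    "zlib 1.2.11 -1": 1240838,
}


def compression_ratio(original_size: int, compressed_size: int) -> float:
    """Return original size divided by compressed size (higher is better)."""
    return original_size / compressed_size


# Round to two decimals, matching the precision of the Ratio column.
ratios = {
    name: round(compression_ratio(ORIGINAL_SIZE, size), 2)
    for name, size in COMPRESSED_SIZES.items()
}

for name, ratio in ratios.items():
    print(f"{name:<15} {ratio:.2f}")
```

On this file the QAT ratio (2.56) sits between lz4/snappy (1.94) and zstd -1 (2.75), close to software zlib -1 (2.63).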

Thanks,
XieQi
-----Original Message-----
From: Wes McKinney <wesmck...@gmail.com> 
Sent: Thursday, October 22, 2020 9:58 AM
To: dev <dev@arrow.apache.org>
Cc: anto...@python.org; Xu, Cheng A <cheng.a...@intel.com>; Dong, Xin 
<xin.d...@intel.com>; Zhang, Jie1 <jie1.zh...@intel.com>; Xie, Qi 
<qi....@intel.com>
Subject: Re: [Discuss] Provide pluggable APIs to support user customized 
compression codec

Yes, I think he's asking about the motivation for the project. My understanding 
is that Snappy is used more often than Gzip with Parquet.

On Wed, Oct 21, 2020 at 8:53 PM Xie, Qi <qi....@intel.com> wrote:
>
> Hi, Antoine
>
> Do you mean performance data for HW-GZIP compared with LZ4/ZSTD?
>
> Thanks,
> XieQi
>
> -----Original Message-----
> From: Antoine Pitrou <anto...@python.org>
> Sent: Tuesday, October 20, 2020 10:38 PM
> To: dev@arrow.apache.org; Xie, Qi <qi....@intel.com>
> Cc: Xu, Cheng A <cheng.a...@intel.com>; Dong, Xin 
> <xin.d...@intel.com>; Zhang, Jie1 <jie1.zh...@intel.com>
> Subject: Re: [Discuss] Provide pluggable APIs to support user 
> customized compression codec
>
>
>
> On 20/10/2020 at 12:09, Xie, Qi wrote:
> > Hi, Wes
> >
> > Yes, currently the purpose of the key-value metadata is just a hint to 
> > indicate that the Parquet file was compressed by a plugin, so that the 
> > Parquet reader can load the plugin library and use it to decompress the file.
> > There are many optimized GZIP implementations, and they may not be 
> > compatible with standard gzip. For example, due to hardware limits, the 
> > HW-GZIP history window size may be smaller than that of standard gzip, so 
> > HW-GZIP can't decompress a file compressed by standard gzip. Because we 
> > still use Compression::GZIP as the Compression::type, we need that metadata 
> > to distinguish it from standard gzip.
>
> What does it bring over ZSTD or LZ4 exactly?
>
> Regards
>
> Antoine.
