Hi,

I just tested with Intel QuickAssist Technology (QAT), which provides hardware acceleration for GZIP. You can see the details here:
https://www.intel.com/content/www/us/en/architecture-and-technology/intel-quick-assist-technology-overview.html
Here are the benchmark results, run single-threaded on an Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz:

lzbench 1.7.2 (64-bit Linux)   Assembled by P.Skibinski
| Compressor name | Compression | Decompress. | Compr. size | Ratio | Filename            |
| memcpy          | 4942 MB/s   | 5688 MB/s   | 3263523     | 1.00  | calgary/calgary.tar |
| qat 1.0.0       | 2312 MB/s   | 3538 MB/s   | 1274379     | 2.56  | calgary/calgary.tar |
| snappy 1.1.4    | 283 MB/s    | 1144 MB/s   | 1686240     | 1.94  | calgary/calgary.tar |
| lz4 1.7.5       | 453 MB/s    | 2514 MB/s   | 1685795     | 1.94  | calgary/calgary.tar |
| zstd 1.3.1 -1   | 279 MB/s    | 723 MB/s    | 1187211     | 2.75  | calgary/calgary.tar |
| zlib 1.2.11 -1  | 79 MB/s     | 261 MB/s    | 1240838     | 2.63  | calgary/calgary.tar |

Thanks,
XieQi

-----Original Message-----
From: Wes McKinney <wesmck...@gmail.com>
Sent: Thursday, October 22, 2020 9:58 AM
To: dev <dev@arrow.apache.org>
Cc: anto...@python.org; Xu, Cheng A <cheng.a...@intel.com>; Dong, Xin <xin.d...@intel.com>; Zhang, Jie1 <jie1.zh...@intel.com>; Xie, Qi <qi....@intel.com>
Subject: Re: [Discuss] Provide pluggable APIs to support user customized compression codec

Yes, I think he's asking about the motivation for the project. My understanding is that Snappy is used more often than Gzip with Parquet

On Wed, Oct 21, 2020 at 8:53 PM Xie, Qi <qi....@intel.com> wrote:
>
> Hi, Antoine
>
> Do you mean the performance data HW-GZIP compared with LZ4/ZSTD?
>
> Thanks,
> XieQi
>
> -----Original Message-----
> From: Antoine Pitrou <anto...@python.org>
> Sent: Tuesday, October 20, 2020 10:38 PM
> To: dev@arrow.apache.org; Xie, Qi <qi....@intel.com>
> Cc: Xu, Cheng A <cheng.a...@intel.com>; Dong, Xin
> <xin.d...@intel.com>; Zhang, Jie1 <jie1.zh...@intel.com>
> Subject: Re: [Discuss] Provide pluggable APIs to support user
> customized compression codec
>
>
>
> On 20/10/2020 at 12:09, Xie, Qi wrote:
> > Hi, Wes
> >
> > Yes currently the purpose of the key-value metadata is just a hint to
> > indicate that the parquet file is compressed by plugin so that the parquet
> > reader can load the plugin library and use plugin to decompress the file.
> > There are many optimized GZIP implementations and may not compatible with
> > the standard gzip, for example due to hardware limit, the HW-GZIP history
> > window size maybe smaller than the standard gzip, so that HW-GZIP can't
> > decompress the file compressed by standard gzip and because we are still
> > use the Compression::GZIP as Compression::type, we need that metadata to
> > distinguish it from the standard gzip.
>
> What does it bring over ZSTD or LZ4 exactly?
>
> Regards
>
> Antoine.
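
P.S. To illustrate the history-window point from the quoted message, here is a minimal sketch using Python's standard zlib module. It is only an analogy and not the QAT plugin or the Arrow API: the payload is made up, the 512-byte window (wbits=9) is an arbitrary stand-in for a hardware limit, and it uses the zlib container rather than gzip because the zlib header records the window size, which makes the rejection deterministic.

```python
import zlib

# Hypothetical payload standing in for a Parquet column chunk.
payload = b"Apache Arrow / Parquet column chunk " * 4096


def deflate(data: bytes, wbits: int) -> bytes:
    """Compress with a history window of 2**wbits bytes (zlib container)."""
    compressor = zlib.compressobj(level=1, wbits=wbits)
    return compressor.compress(data) + compressor.flush()


def inflate(data: bytes, wbits: int):
    """Decompress with a decoder limited to a 2**wbits-byte window.

    Returns the decoded bytes, or the zlib.error raised if the decoder
    rejects the stream.
    """
    try:
        return zlib.decompress(data, wbits=wbits)
    except zlib.error as exc:
        return exc


standard = deflate(payload, wbits=15)     # 32 KiB window, the zlib default
small_window = deflate(payload, wbits=9)  # 512-byte window (assumed limit)

# A full-window decoder reads both streams.
full_reads_standard = inflate(standard, wbits=15)
full_reads_small = inflate(small_window, wbits=15)

# A limited-window decoder reads its own output but rejects the standard one.
limited_reads_small = inflate(small_window, wbits=9)
limited_reads_standard = inflate(standard, wbits=9)

print("full decoder, standard stream:   ", full_reads_standard == payload)
print("full decoder, small-window stream:", full_reads_small == payload)
print("limited decoder, own stream:      ", limited_reads_small == payload)
print("limited decoder, standard stream: ", repr(limited_reads_standard))
```

The asymmetry is the reason a reader needs some hint about which implementation wrote the file: a stream from the limited encoder is readable by any standard decoder, but the reverse does not hold.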