What is the performance of, say, HW GZip against SW ZSTD?
Regards

Antoine.


On Thu, 25 Jun 2020 07:06:58 +0000
"Xu, Cheng A" <cheng.a...@intel.com> wrote:

> Thanks Micah and Wes for the reply. W.r.t. the scope, we're working
> together with the Parquet community to refine our proposal:
> https://www.mail-archive.com/dev@parquet.apache.org/msg12463.html
>
> This proposal here is more general to Arrow (indeed it can be used by
> native Parquet as well). Since Arrow is mostly an in-memory format for
> intermediate data, I would expect fewer backward-compatibility
> constraints than for the on-disk Parquet format. Considering this, we
> can discuss those two things separately. For the Parquet part, the
> behavior should be consistent with Java Parquet. For the Arrow part,
> it should also be compatible with the new extensible Parquet
> compression codec framework. We can start with the Parquet part first.
>
> Thanks
> Cheng Xu
>
> From: Micah Kornfield <emkornfi...@gmail.com>
> Sent: Tuesday, June 23, 2020 12:11 PM
> To: dev <dev@arrow.apache.org>
> Cc: Xu, Cheng A <cheng.a...@intel.com>; Xie, Qi <qi....@intel.com>
> Subject: Re: Proposal for the plugin API to support user customized
> compression codec
>
> It would be good to clarify the exact scope of this. If it is
> particular to Parquet, then we should wait for the discussion on
> dev@parquet to conclude before moving forward. If it is more general
> to Arrow, then it would be useful to work through scenarios of how
> this would be used for decompression when the Codec can't support
> generic input (the codec library is a singleton across the Arrow
> codebase).
>
> On Mon, Jun 22, 2020 at 4:23 PM Wes McKinney <wesmck...@gmail.com> wrote:
> hi XieQi,
>
> Is the idea that your custom Gzip implementation would automatically
> override any places in the codebase where the built-in one would be
> used (like the Parquet codebase)? I see some things in the design doc
> about serializing the plugin information in the Parquet file metadata
> (assuming you want to speed up decompressing Parquet data pages) -- is
> there a reason to believe that the plugin would be _required_ in order
> to read the file? I recall some messages to the Parquet mailing list
> about user-defined codecs.
>
> In general, having a plugin API that provides a means to substitute
> one functionally identical implementation for another seems reasonable
> to me (I could envision people customizing kernel execution in the
> future). We should try to create a general enough API that it can be
> used for customizations beyond compression codecs, so we don't have to
> go through a design exercise to support plugin/algorithm overrides for
> something else. This is something we could hash out during code review
> -- I should have some opinions and I'm sure others will as well.
>
> - Wes
>
> On Fri, Jun 19, 2020 at 10:21 AM Xie, Qi <qi....@intel.com> wrote:
> >
> > Hi,
> >
> > In demand of better performance, quite a few end users want to
> > leverage accelerators (e.g. FPGA, Intel QAT) to offload compression.
> > However, the current Arrow compression framework only supports
> > codec-name-based compression implementations and can't be customized
> > to leverage accelerators. For example, for the gzip format, we can't
> > call a customized codec to accelerate the compression. We would like
> > to propose a plugin API to support customized compression codecs.
> > We've put the proposal here:
> >
> > https://docs.google.com/document/d/1W_TxVRN7WV1wBVOTdbxngzBek1nTolMlJWy6aqC6WG8/edit
> >
> > Any comments are welcome; please let us know your feedback.
> >
> > Thanks,
> > XieQi
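
For illustration, here is a minimal C++ sketch of the kind of
registration hook being discussed in this thread -- a process-wide
registry in which a plugin (say, a QAT-backed gzip) overrides the
built-in codec for a given name. All class and function names below are
hypothetical; this is neither the existing Arrow API nor the proposal's
API, just a sketch of the mechanism:

    #include <cstdint>
    #include <functional>
    #include <map>
    #include <memory>
    #include <mutex>
    #include <string>

    // Minimal stand-in for a compression codec interface (hypothetical).
    class Codec {
     public:
      virtual ~Codec() = default;
      // Both return the number of bytes written to `output`.
      virtual int64_t Compress(int64_t input_len, const uint8_t* input,
                               int64_t output_capacity, uint8_t* output) = 0;
      virtual int64_t Decompress(int64_t input_len, const uint8_t* input,
                                 int64_t output_capacity, uint8_t* output) = 0;
    };

    using CodecFactory = std::function<std::unique_ptr<Codec>()>;

    // Process-wide registry mapping a codec name (e.g. "gzip") to the
    // factory used to create it. Registering under an existing name
    // overrides the built-in implementation everywhere the registry is
    // consulted.
    class CodecRegistry {
     public:
      static CodecRegistry& Instance() {
        static CodecRegistry instance;  // mirrors the singleton nature of
                                        // the current codec library
        return instance;
      }

      void Register(const std::string& name, CodecFactory factory) {
        std::lock_guard<std::mutex> lock(mutex_);
        factories_[name] = std::move(factory);
      }

      std::unique_ptr<Codec> Create(const std::string& name) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = factories_.find(name);
        return it == factories_.end() ? nullptr : it->second();
      }

     private:
      std::map<std::string, CodecFactory> factories_;
      std::mutex mutex_;
    };

    // A plugin shared library would register itself at load time, e.g.:
    //
    //   CodecRegistry::Instance().Register("gzip", [] {
    //     return std::unique_ptr<Codec>(new QatGzipCodec());  // hypothetical
    //   });

One point this sketch makes concrete, relevant to Wes's question: as
long as the plugin emits a standard wire format (here, ordinary gzip),
the output stays readable with the built-in software codec, so the
plugin would be an optimization rather than something required to read
the file.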