Hi Cheng Xu,

> Since Arrow is primarily an in-memory format for intermediate data, I
> would expect fewer backward-compatibility concerns than with the on-disk
> Parquet format.

1.  The Arrow file format is not ephemeral, and it now supports compressed
buffers.
2.  Even where other parts of Arrow are ephemeral, the compression
libraries are used as components in a generic IO subsystem
(see arrow/io/compressed.h in the codebase).  It would be good to work
through the implications of this.
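To make the second point concrete, here is a minimal sketch (these names are hypothetical, not the actual arrow::util::Codec API) of what a name-keyed plugin registry implies: because lookup is by codec name in a process-wide registry, a plugin gzip silently replaces the built-in one everywhere that name is resolved, including the generic IO layer.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical codec interface for illustration only.
struct Codec {
  virtual ~Codec() = default;
  virtual std::string name() const = 0;
};

// A process-wide registry keyed by codec name. Registering a plugin under
// an existing name overrides the built-in implementation for every caller.
class CodecRegistry {
 public:
  using Factory = std::function<std::unique_ptr<Codec>()>;

  static CodecRegistry& Instance() {
    static CodecRegistry registry;  // one registry per process
    return registry;
  }

  void Register(const std::string& name, Factory factory) {
    factories_[name] = std::move(factory);  // later registrations win
  }

  std::unique_ptr<Codec> Create(const std::string& name) const {
    auto it = factories_.find(name);
    return it == factories_.end() ? nullptr : it->second();
  }

 private:
  std::map<std::string, Factory> factories_;
};

// Built-in implementation and a plugin that overrides it under the same name.
struct BuiltinGzip : Codec {
  std::string name() const override { return "gzip(builtin)"; }
};
struct AcceleratedGzip : Codec {
  std::string name() const override { return "gzip(accelerated)"; }
};
```

The design question this raises is exactly the one above: once the accelerated codec is registered, every consumer of "gzip" in the process (file readers, the IO subsystem, Parquet) gets it, whether or not its input is something that codec can handle.
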

Thanks,
Micah

On Thu, Jun 25, 2020 at 12:07 AM Xu, Cheng A <cheng.a...@intel.com> wrote:

> Thanks Micah and Wes for the reply. W.r.t. the scope, we are working
> together with the Parquet community to refine our proposal:
> https://www.mail-archive.com/dev@parquet.apache.org/msg12463.html
>
>
>
> This proposal is more general to Arrow (indeed, it can be used by native
> Parquet as well). Since Arrow is primarily an in-memory format for
> intermediate data, I would expect fewer backward-compatibility concerns
> than with the on-disk Parquet format. Considering this, we can discuss the
> two parts separately. The Parquet part should behave consistently with
> Java Parquet; the Arrow part should also be compatible with the new
> extensible Parquet compression codec framework. We can start with the
> Parquet part first.
>
>
>
> Thanks
>
> Cheng Xu
>
>
>
> *From:* Micah Kornfield <emkornfi...@gmail.com>
> *Sent:* Tuesday, June 23, 2020 12:11 PM
> *To:* dev <dev@arrow.apache.org>
> *Cc:* Xu, Cheng A <cheng.a...@intel.com>; Xie, Qi <qi....@intel.com>
> *Subject:* Re: Proposal for the plugin API to support user customized
> compression codec
>
>
>
> It would be good to clarify the exact scope of this.  If it is
> specific to Parquet, then we should wait for the discussion on dev@parquet
> to conclude before moving forward.  If it is more general to Arrow, then
> working through scenarios of how this would be used for decompression when
> the Codec can't support generic input would be useful (the codec library is
> a singleton across the Arrow codebase).
>
>
>
> On Mon, Jun 22, 2020 at 4:23 PM Wes McKinney <wesmck...@gmail.com> wrote:
>
> hi XieQi,
>
> Is the idea that your custom Gzip implementation would automatically
> override any places in the codebase where the built-in one would be
> used (like the Parquet codebase)? I see some things in the design doc
> about serializing the plugin information in the Parquet file metadata
> (assuming you want to speed up decompression Parquet data pages) -- is
> there a reason to believe that the plugin would be _required_ in order
> to read the file? I recall some messages to the Parquet mailing list
> about user-defined codecs.
>
> In general, having a plugin API that provides a means to substitute one
> functionally identical implementation for another seems reasonable to me
> (I could envision people customizing kernel execution in the future). We
> should try to create a general enough API that it can be used for
> customizations beyond compression codecs, so we don't have to go
> through a design exercise to support plugin/algorithm overrides for
> something else. This is something we can hash out during code review
> -- I should have some opinions, and I'm sure others will as well.
>
> - Wes
>
> On Fri, Jun 19, 2020 at 10:21 AM Xie, Qi <qi....@intel.com> wrote:
> >
> > Hi,
> >
> >
> > In pursuit of better performance, many end users want to leverage
> accelerators (e.g., FPGA, Intel QAT) to offload compression. However, the
> current Arrow compression framework supports only name-based codec
> selection and can't be customized to leverage accelerators. For example,
> for the gzip format, we can't call a customized codec to accelerate
> compression. We would like to propose a plugin API to support customized
> compression codecs. We've put the proposal here:
> >
> >
> >
> >
> https://docs.google.com/document/d/1W_TxVRN7WV1wBVOTdbxngzBek1nTolMlJWy6aqC6WG8/edit
> >
> >
> >
> > Any comment is welcome and please let us know your feedback.
> >
> >
> >
> > Thanks,
> >
> > XieQi
> >
> >
> >
>
>
