It would be good to clarify the exact scope of this proposal. If it is specific to Parquet, we should wait for the discussion on dev@parquet to conclude before moving forward. If it applies more generally to Arrow, it would be useful to work through scenarios of how this would be used for decompression when the Codec can't support generic input (the codec library is a singleton across the Arrow codebase).
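To make the decompression scenario concrete, here is a minimal sketch of one possible behavior: a process-wide registry where a plugin is tried first and the built-in codec is the fallback. It is in Python only for brevity, and every name in it (`BuiltinGzip`, `AcceleratedGzip`, `register_codec`, `supports`, the 1024-byte limit) is hypothetical. None of this is Arrow's actual codec API, and the size limit is just an assumed example of input a plugin might not accept.

```python
import gzip

# All names here are hypothetical; this is not Arrow's codec API.


class BuiltinGzip:
    """Software codec that accepts any valid gzip input."""
    name = "builtin"

    def supports(self, data: bytes) -> bool:
        return True

    def compress(self, data: bytes) -> bytes:
        return gzip.compress(data)

    def decompress(self, data: bytes) -> bytes:
        return gzip.decompress(data)


class AcceleratedGzip(BuiltinGzip):
    """Stand-in for an accelerator-backed codec with restricted input."""
    name = "accelerated"
    MAX_INPUT = 1024  # assumed device limit, in bytes

    def supports(self, data: bytes) -> bool:
        return len(data) <= self.MAX_INPUT
    # compress/decompress are inherited; a real plugin would offload here.


# Process-wide registry: codec name -> implementations, highest priority first.
_REGISTRY = {"gzip": [BuiltinGzip()]}


def register_codec(codec_name, impl):
    """Put a plugin ahead of the implementations already registered."""
    _REGISTRY.setdefault(codec_name, []).insert(0, impl)


def decompress(codec_name, data):
    """Use the first implementation that accepts this input."""
    for impl in _REGISTRY.get(codec_name, []):
        if impl.supports(data):
            return impl.name, impl.decompress(data)
    raise LookupError(f"no {codec_name} implementation accepts this input")


register_codec("gzip", AcceleratedGzip())

small = gzip.compress(b"page" * 10)
large = gzip.compress(b"x" * 5000, compresslevel=0)  # stored, so > 1024 bytes

print(decompress("gzip", small)[0])  # accelerated
print(decompress("gzip", large)[0])  # builtin
```

In this sketch both implementations produce and consume standard gzip, so a reader without the plugin can still decode the data. Whether the proposal intends that property, and whether a silent fallback is acceptable for a singleton registry, are the points I would like to see spelled out.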

On Mon, Jun 22, 2020 at 4:23 PM Wes McKinney <wesmck...@gmail.com> wrote:

> hi XieQi,
>
> Is the idea that your custom Gzip implementation would automatically
> override any places in the codebase where the built-in one would be
> used (like the Parquet codebase)? I see some things in the design doc
> about serializing the plugin information in the Parquet file metadata
> (assuming you want to speed up decompression of Parquet data pages) -- is
> there a reason to believe that the plugin would be _required_ in order
> to read the file? I recall some messages to the Parquet mailing list
> about user-defined codecs.
>
> In general, having a plugin API to provide a means to substitute one
> functionally identical implementation for another seems reasonable to me
> (I could envision having people customizing kernel execution in the
> future). We should try to create a general enough API so that it can be
> used for customizations beyond compression codecs so we don't have to go
> through a design exercise to support plugin/algorithm overrides for
> something else. This is something we could hash out during code review
> -- I should have some opinions and I'm sure others will as well
>
> - Wes
>
> On Fri, Jun 19, 2020 at 10:21 AM Xie, Qi <qi....@intel.com> wrote:
> >
> > Hi,
> >
> > In pursuit of better performance, quite a few end users want to
> > leverage accelerators (e.g. FPGA, Intel QAT) to offload compression.
> > However, the current Arrow compression framework only supports
> > selecting a compression implementation by codec name and can't be
> > customized to leverage accelerators. For example, for the gzip format,
> > we can't call a customized codec to accelerate the compression. We
> > would like to propose a plugin API to support customized compression
> > codecs. We've put the proposal here:
> >
> > https://docs.google.com/document/d/1W_TxVRN7WV1wBVOTdbxngzBek1nTolMlJWy6aqC6WG8/edit
> >
> > Any comment is welcome and please let us know your feedback.
> >
> > Thanks,
> > XieQi