My thought here is that Parquet, as a data format, should provide a plugin
mechanism. With that kind of API, users would be able to plug in their own
optimized implementations. Any implementation built on those APIs should
also address compatibility, for example through a fallback mechanism.
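
To make this concrete, here is a minimal sketch of what such a plugin API
could look like. All of these names are hypothetical; nothing like this
exists in parquet-mr today:

  import java.io.IOException;
  import java.nio.ByteBuffer;

  /** Hypothetical plugin interface a user could register for a codec name. */
  public interface ParquetCompressorPlugin {
    /** Codec whose on-disk format this plugin must stay byte-compatible with. */
    String codecName();

    /** Compresses 'input' into 'output'; returns the number of bytes written. */
    int compress(ByteBuffer input, ByteBuffer output) throws IOException;

    /** Decompresses 'input' into 'output'; returns the number of bytes written. */
    int decompress(ByteBuffer input, ByteBuffer output) throws IOException;
  }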

Take accelerators as an example: they typically come with a CPU-based
reference implementation, and the accelerator's buffer size is sometimes
limited. In that case, the codec falls back to the CPU-based
implementation.
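
As a sketch of that fallback (again purely hypothetical, building on the
interface above), an accelerator-backed plugin could be wrapped so that
inputs larger than the device buffer are routed to the CPU implementation:

  import java.io.IOException;
  import java.nio.ByteBuffer;

  /** Hypothetical wrapper: falls back to a CPU codec on oversized inputs. */
  public final class FallbackCompressorPlugin implements ParquetCompressorPlugin {
    private final ParquetCompressorPlugin accelerated; // e.g. FPGA/QAT backed
    private final ParquetCompressorPlugin cpu;         // CPU reference implementation
    private final int maxDeviceBufferSize;             // accelerator buffer limit, in bytes

    public FallbackCompressorPlugin(ParquetCompressorPlugin accelerated,
                                    ParquetCompressorPlugin cpu,
                                    int maxDeviceBufferSize) {
      this.accelerated = accelerated;
      this.cpu = cpu;
      this.maxDeviceBufferSize = maxDeviceBufferSize;
    }

    @Override
    public String codecName() {
      return cpu.codecName(); // both paths must produce the same codec format
    }

    @Override
    public int compress(ByteBuffer input, ByteBuffer output) throws IOException {
      // Route oversized inputs to the CPU path instead of failing on the device.
      return input.remaining() > maxDeviceBufferSize
          ? cpu.compress(input, output)
          : accelerated.compress(input, output);
    }

    @Override
    public int decompress(ByteBuffer input, ByteBuffer output) throws IOException {
      return input.remaining() > maxDeviceBufferSize
          ? cpu.decompress(input, output)
          : accelerated.decompress(input, output);
    }
  }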

Thoughts?

Thanks
Cheng Xu

-----Original Message-----
From: Gabor Szadovszky <[email protected]> 
Sent: Wednesday, March 4, 2020 5:59 PM
To: Parquet Dev <[email protected]>
Subject: Re: Provide pluggable APIs to support user customized compression codec

Hi,

My problem with this idea is that I cannot see how we could ensure that a
customized codec compresses the data in the specified way, so that every
reader supporting that codec can read it. We already have an issue with an
incompatibility between the Java and C++ implementations of the LZ4 compression
(see https://issues.apache.org/jira/browse/PARQUET-1241 for details).
That said, there may be several valid ways to generate compatible compressed
output, so it is fair to allow configuring the codec; I just don't know how
to properly control the output.

Cheers,
Gabor

On Tue, Mar 3, 2020 at 7:00 PM Dong, Xin <[email protected]> wrote:

> Hi,
> To get better performance, quite a few end users want to leverage
> accelerators (e.g. FPGA, Intel QAT) to offload compression computation.
> However, in the current parquet-mr code, the codec implementation can't
> be customized to leverage accelerators. We would like to propose a
> pluggable API to support customized compression codecs.
> I've opened a JIRA, https://issues.apache.org/jira/browse/PARQUET-1804,
> for this issue. What are your thoughts?
> Best Regards,
> Xin Dong
>
