Hi,

My problem with this idea is that I cannot see how we could ensure that a customized codec compresses the data in the specified format, so that every reader that supports the codec can read it. We already have an issue about an incompatibility between the Java and C++ implementations of the LZ4 compression (see https://issues.apache.org/jira/browse/PARQUET-1241 for details). At the same time, there may be several ways to produce compatible compressed output, so it is fair to allow the codec to be configured; I just don't know how to properly control the output.
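To make the concern concrete, here is a minimal Java sketch of a round-trip check that a pluggable codec could be held to: whatever a custom compressor emits must decode back to the original bytes using the reference reader. Everything in it is an illustrative assumption, not parquet-mr API. `PluggableCompressor` is a hypothetical interface, and the JDK's `java.util.zip` Deflater/Inflater (zlib format) stand in for a real Parquet codec and its reference implementation. Two compression levels model compatible variants (they may emit different bytes but use the same format), while raw deflate (`nowrap = true`) models the same algorithm with different framing, which the reference reader rejects.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class CodecCompatibilityCheck {

    /** Hypothetical plug-in point for a custom (e.g. hardware-accelerated) compressor. */
    interface PluggableCompressor {
        byte[] compress(byte[] input);
    }

    /**
     * Stand-in for a custom implementation built on java.util.zip.Deflater.
     * nowrap=false emits the zlib format; nowrap=true emits raw deflate,
     * i.e. the same algorithm with different framing.
     */
    static PluggableCompressor deflater(int level, boolean nowrap) {
        return input -> {
            Deflater deflater = new Deflater(level, nowrap);
            try {
                deflater.setInput(input);
                deflater.finish();
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                while (!deflater.finished()) {
                    int n = deflater.deflate(buf);
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } finally {
                deflater.end(); // release native zlib resources
            }
        };
    }

    /** Stand-in for the reference reader: decodes the zlib format only. */
    static byte[] referenceDecompress(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater(); // nowrap=false: expects zlib framing
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf); // throws on a malformed header or stream
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported stream");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    /** True if the reference reader recovers the original bytes from the compressor's output. */
    static boolean roundTrips(PluggableCompressor compressor, byte[] sample) {
        try {
            return Arrays.equals(referenceDecompress(compressor.compress(sample)), sample);
        } catch (DataFormatException e) {
            return false; // output is not readable by the reference implementation
        }
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            text.append("parquet page ").append(i % 7).append('\n');
        }
        byte[] sample = text.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println("zlib, level 1: " + roundTrips(deflater(1, false), sample));
        System.out.println("zlib, level 9: " + roundTrips(deflater(9, false), sample));
        System.out.println("raw deflate:   " + roundTrips(deflater(6, true), sample));
    }
}
```

Running `main` prints `true` for both zlib levels and `false` for raw deflate. A check like this only catches incompatibilities that the sample data happens to exercise, so it complements a precise format specification rather than replacing it.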
Cheers,
Gabor

On Tue, Mar 3, 2020 at 7:00 PM Dong, Xin wrote:
> Hi,
>
> To get better performance, quite a few end users want to leverage
> accelerators (e.g. FPGA, Intel QAT) to offload compression computation.
> However, in the current parquet-mr code, the codec implementation can't be
> customized to leverage accelerators. We would like to propose a pluggable
> API to support customized compression codecs.
> I've opened a JIRA, https://issues.apache.org/jira/browse/PARQUET-1804, for
> this issue. What are your thoughts on it?
>
> Best Regards,
> Xin Dong
