Hi folks, any suggestions on this?

Thanks
Cheng Xu

-----Original Message-----
From: Dong, Xin <[email protected]>
Sent: Friday, June 5, 2020 2:19 PM
To: [email protected]
Subject: RE: Proposal for CompressionCodec Provider-aware Compression Codec Lookup for parquet-mr

Hi, Walid,

We've moved the doc here for public access:
https://docs.google.com/document/d/1ueSYq2FIzaom23cpHXppig93ylOxe8CU6EwS82dov2E/

Thanks,
Xin Dong

-----Original Message-----
From: Gara Walid <[email protected]>
Sent: Thursday, June 4, 2020 2:14 PM
To: [email protected]
Subject: Re: Proposal for CompressionCodec Provider-aware Compression Codec Lookup for parquet-mr

Hi Xin,

Thanks for the proposal. Could you please make the Google doc public?

Cheers,
Walid

On Thu, Jun 4, 2020, 6:46 AM Dong, Xin <[email protected]> wrote:

> Hi, All,
>
> The existing Parquet compression codec framework only supports looking
> up a compression implementation by codec name. The mapping is
> one-to-one, which means only one implementation is supported for a
> given codec name. However, there are various implementations for the
> same codec name, and different implementations may not be compatible
> with one another because they serve different purposes. Taking Gzip as
> an example, some accelerators have limited memory capacity, so their
> history buffer size is smaller than that of CPU-based implementations.
> In addition, the codec framework currently provides no mechanism for
> users to customize a standard compression codec for their own purposes
> (e.g. performance acceleration, workload offloading).
>
> To address the problem, we propose a provider-aware compression codec
> lookup for parquet-mr. We've put the proposal here:
>
> https://docs.google.com/document/d/1sbCjDxEjM5UkbMPNmGqEfF-LYPDWhM-B474dZZeOFD4/edit?ts=5ecb2462#heading=h.5b2qz2ba32wm
>
> Any comments are welcome; please let us know your feedback.
>
> Thanks,
> Xin Dong
