Hi Xin,

Thanks for the proposal. Could you please make the Google Doc public?

Cheers,
Walid

On Thu, Jun 4, 2020, 6:46 AM Dong, Xin <[email protected]> wrote:

> Hi, All,
>
> The existing Parquet compression codec framework only supports looking up
> a compression implementation by codec name. The mapping is one-to-one,
> which means only one implementation is supported for a given codec name.
> However, there are various implementations of the same codec, and
> different implementations may not be compatible with one another because
> they serve different purposes. Take Gzip as an example: some accelerators
> have limited memory capacity, so their history buffer is smaller than
> that of a CPU-based implementation. The codec framework currently doesn't
> provide a mechanism that lets users customize a standard compression
> codec for their own purposes (e.g. performance acceleration, workload
> offloading).
> To address the problem, we propose a provider-aware compression codec
> lookup for parquet-mr. We've put the proposal here:
>
> https://docs.google.com/document/d/1sbCjDxEjM5UkbMPNmGqEfF-LYPDWhM-B474dZZeOFD4/edit?ts=5ecb2462#heading=h.5b2qz2ba32wm
>
> Any comments are welcome; please let us know your feedback.
>
> Thanks,
> Xin Dong
>
