Hi Xin,

Thanks for the proposal. Could you please make the Google Doc public?

Cheers,
Walid

On Thu, Jun 4, 2020, 6:46 AM Dong, Xin <[email protected]> wrote:

> Hi, All,
>
> The existing Parquet compress codec framework only supports codec name
> based compression implementation lookup. And it's one-2-one mapping which
> means only one implementation is supported given a codec name.
> However, there are various implementations for the same codec name. And
> different implementations may not be compatible with others due to
> different purposes. Given Gzip as an example, for some accelerators, it's
> limited in memory capacity and the history buffer size is relatively
> smaller than CPU based. And currently codec framework doesn't provide a
> mechanism to allow users to customize standard compression codec for their
> own purposes (e.g. performance acceleration, workload offloading).
> To address the problem, we propose a provider-aware compression codec
> lookup for parquet-mr. We've put the proposal here:
>
> https://docs.google.com/document/d/1sbCjDxEjM5UkbMPNmGqEfF-LYPDWhM-B474dZZeOFD4/edit?ts=5ecb2462#heading=h.5b2qz2ba32wm
>
> Any comment is welcome and please let us know your feedback.
>
> Thanks,
> Xin Dong
