Hi all,

The existing Parquet compression codec framework can only look up a compression implementation by codec name. The mapping is one-to-one: for a given codec name, only one implementation can be used. In practice, however, there are various implementations of the same codec, and they may not be compatible with one another because they serve different purposes. Take Gzip as an example: some accelerators have limited memory capacity, so their history buffer is relatively smaller than that of CPU-based implementations. In addition, the current codec framework provides no mechanism for users to customize a standard compression codec for their own purposes (e.g. performance acceleration, workload offloading).

To address this problem, we propose a provider-aware compression codec lookup for parquet-mr. The proposal is here: https://docs.google.com/document/d/1sbCjDxEjM5UkbMPNmGqEfF-LYPDWhM-B474dZZeOFD4/edit?ts=5ecb2462#heading=h.5b2qz2ba32wm
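To illustrate the difference between name-only lookup and a provider-aware lookup, here is a minimal sketch of a registry keyed by (codec name, provider) rather than by codec name alone. All names here (ProviderAwareCodecRegistry, CodecImpl, DEFAULT_PROVIDER, the "accelerator" provider) are hypothetical and are not parquet-mr APIs; the actual design is described in the linked proposal.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: codec implementations keyed by (codec name, provider)
// instead of by codec name alone. Not a parquet-mr API.
public class ProviderAwareCodecRegistry {

    // Hypothetical stand-in for a compression implementation.
    public interface CodecImpl {
        String describe();
    }

    // Assumed provider name used when the caller does not ask for a specific one,
    // so that name-only lookups keep working as before.
    public static final String DEFAULT_PROVIDER = "default";

    // codec name -> (provider name -> implementation)
    private final Map<String, Map<String, CodecImpl>> impls = new HashMap<>();

    // Register one implementation of a codec under a given provider.
    public void register(String codec, String provider, CodecImpl impl) {
        impls.computeIfAbsent(codec, k -> new HashMap<>()).put(provider, impl);
    }

    // Provider-aware lookup: exact match on both codec name and provider.
    public Optional<CodecImpl> lookup(String codec, String provider) {
        Map<String, CodecImpl> byProvider = impls.get(codec);
        if (byProvider == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(byProvider.get(provider));
    }

    // Name-only lookup, equivalent to today's one-to-one mapping.
    public Optional<CodecImpl> lookup(String codec) {
        return lookup(codec, DEFAULT_PROVIDER);
    }

    public static void main(String[] args) {
        ProviderAwareCodecRegistry registry = new ProviderAwareCodecRegistry();
        registry.register("GZIP", DEFAULT_PROVIDER, () -> "CPU gzip");
        registry.register("GZIP", "accelerator", () -> "accelerator gzip");

        System.out.println(registry.lookup("GZIP").map(CodecImpl::describe).orElse("none"));
        System.out.println(registry.lookup("GZIP", "accelerator").map(CodecImpl::describe).orElse("none"));
        System.out.println(registry.lookup("GZIP", "unknown").map(CodecImpl::describe).orElse("none"));
    }
}
```

The sketch deliberately does not fall back to another provider when the requested one is missing, since (as noted above) implementations of the same codec may not be compatible with each other.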
Any comments are welcome; please let us know your feedback.

Thanks,
Xin Dong
