Hey everyone,

Last month I started a thread about creating a new Apache project that would implement read optimisations for Parquet (e.g. readVectored, prefetching, and caching the footer) in a single place, so that we don't need to duplicate this code across cloud providers. This is the original email thread: https://lists.apache.org/thread/nbksq32cs8h1ldj8762y6wh9zzp8gqx6
There was some consensus there that Parquet Java would be the best place to start. Before we start looking at code changes, I wanted to briefly outline the motivation and the changes required to the project. I have outlined a more detailed plan in this doc: https://docs.google.com/document/d/1Xdlh23tmCs-KvzHhY2RuwFYmc3xntUKcmwb8yxEl78Y/edit?usp=sharing

It would be great to get your thoughts on this. I'll also join the Parquet sync tomorrow and hope to discuss it there.

Thanks,
Ahmar
