[DISCUSS] Implementing Parquet read optimisations

Suhail, Ahmar Tue, 06 Jan 2026 06:16:40 -0800

Hey everyone,

Last month I started a thread about creating a new Apache project that would 
implement read optimisations for Parquet (e.g. readVectored, prefetching, and 
caching the footer) in a single place, so we don’t need to duplicate this code 
across cloud providers. This is the original email thread: 
https://lists.apache.org/thread/nbksq32cs8h1ldj8762y6wh9zzp8gqx6


There was some consensus there that the Parquet Java project (parquet-java) 
would be the best place to start. Before we start looking at code changes, I 
wanted to briefly outline the motivation and the changes required to the 
project. I have written up a more detailed plan in this doc: 
https://docs.google.com/document/d/1Xdlh23tmCs-KvzHhY2RuwFYmc3xntUKcmwb8yxEl78Y/edit?usp=sharing


Would be great to get your thoughts on this. I’ll also join the Parquet sync 
tomorrow and hope to discuss this there.

Thanks,
Ahmar
