I'm developing such a tool for my own use. Right now it only optimizes for
file size, but I plan to add query time as a target later. I'm trying to get
it open sourced, but the wheels of bureaucracy turn slowly :(

Ed

On 2025/05/28 15:36:37 Martin Loncaric wrote:
> I think Ashish's question was about determining the right configuration in
> the first place - IIUC parquet-rewrite requires the user to pass these in.
> 
> I'm not aware of any tool to choose good Parquet configurations
> automatically. I sometimes use the parquet-tools pip package / CLI to
> inspect Parquet and see how files are configured, but I've only tuned
> manually.
> 
> On Tue, May 27, 2025, 16:22 Andrew Lamb <andrewlam...@gmail.com> wrote:
> 
> > We have one in the arrow-rs repository: parquet-rewrite[1]
> >
> > [1]:
> > https://github.com/apache/arrow-rs/blob/0da003becbd6489f483b70e5914a242edd8c6d1a/parquet/src/bin/parquet-rewrite.rs#L18
> >
> > On Tue, May 27, 2025 at 12:41 PM Ashish Singh <asi...@apache.org> wrote:
> >
> > > Hey all,
> > >
> > > Is there any tool/lib folks use to tune Parquet configs to optimize for
> > > storage size / read/write speed?
> > >
> > > - Ashish
> > >
> >
> 
