Thanks all!

Yeah, I am mostly looking for existing tooling to tune Parquet files.

Ed, I would be interested in discussing this. Would you (or anyone else) like
to have a dedicated discussion on it? For some context, at Pinterest we are
actively looking into adopting or building such tooling. Like others, we have
traditionally relied on manual tuning, which doesn't really scale.

Best Regards,
Ashish


On Wed, May 28, 2025 at 9:29 AM Ed Seidl <etse...@apache.org> wrote:

> I'm developing such a tool for my own use. Right now it only optimizes for
> file size, but I'm planning to add query-time optimization later. I'm trying
> to get it open sourced, but the wheels of bureaucracy turn slowly :(
>
> Ed
>
> On 2025/05/28 15:36:37 Martin Loncaric wrote:
> > I think Ashish's question was about determining the right configuration in
> > the first place - IIUC parquet-rewrite requires the user to pass these in.
> >
> > I'm not aware of any tool that chooses good Parquet configurations
> > automatically. I sometimes use the parquet-tools pip package / CLI to
> > inspect Parquet files and see how they are configured, but I've only
> > tuned manually.
> >
> > On Tue, May 27, 2025, 16:22 Andrew Lamb <andrewlam...@gmail.com> wrote:
> >
> > > We have one in the arrow-rs repository: parquet-rewrite[1]
> > >
> > > [1]: https://github.com/apache/arrow-rs/blob/0da003becbd6489f483b70e5914a242edd8c6d1a/parquet/src/bin/parquet-rewrite.rs#L18
> > >
> > > On Tue, May 27, 2025 at 12:41 PM Ashish Singh <asi...@apache.org> wrote:
> > >
> > > > Hey all,
> > > >
> > > > Is there any tool or library folks use to tune Parquet configs to
> > > > optimize for storage size or read/write speed?
> > > >
> > > > - Ashish
> > > >
> > >
> >
>
