Thanks all! Yeah, I am mostly looking at available tooling to tune Parquet files.
Ed, I would be interested in discussing this. Would you (or anyone else) like to have a dedicated discussion on this? To provide some context, at Pinterest we are actively looking into adopting/building such tooling. Like others, we have traditionally relied on manual tuning, which isn't really scalable.

Best Regards,
Ashish

On Wed, May 28, 2025 at 9:29 AM Ed Seidl <etse...@apache.org> wrote:

> I'm developing such a tool for my own use. Right now it only optimizes for
> size, but I'm planning to add query time later. I'm trying to get it open
> sourced, but the wheels of bureaucracy turn slowly :(
>
> Ed
>
> On 2025/05/28 15:36:37 Martin Loncaric wrote:
> > I think Ashish's question was about determining the right configuration in
> > the first place - IIUC parquet-rewrite requires the user to pass these in.
> >
> > I'm not aware of any tool to choose good Parquet configurations
> > automatically. I sometimes use the parquet-tools pip package / CLI to
> > inspect Parquet and see how files are configured, but I've only tuned
> > manually.
> >
> > On Tue, May 27, 2025, 16:22 Andrew Lamb <andrewlam...@gmail.com> wrote:
> >
> > > We have one in the arrow-rs repository: parquet-rewrite[1]
> > >
> > > [1]:
> > > https://github.com/apache/arrow-rs/blob/0da003becbd6489f483b70e5914a242edd8c6d1a/parquet/src/bin/parquet-rewrite.rs#L18
> > >
> > > On Tue, May 27, 2025 at 12:41 PM Ashish Singh <asi...@apache.org> wrote:
> > >
> > > > Hey all,
> > > >
> > > > Is there any tool/lib folks use to tune parquet configs to optimize for
> > > > storage size / read/write speed?
> > > >
> > > > - Ashish