+1 on looking into OpenZL more deeply *before* we add new encodings.

What's very attractive about OpenZL is that the decoder is fixed and
advancements in encoding are backwards/forwards compatible. This means fewer
changes to the format itself. The ideal end state would be to add OpenZL to
Parquet and encode everything as PLAIN.

One thing to investigate is whether we can get OpenZL-compressed data at
some point in the graph and then perform compressed execution on it. This
would be perfect for dictionary-encoded streams.
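
To make "compressed execution" concrete, here is a rough sketch in plain
Python. It assumes a dictionary-encoded column is just a list of distinct
values plus a list of integer codes; the function name and data layout are
hypothetical and do not use any OpenZL or Parquet API. The idea is to
evaluate a filter once per dictionary entry and then scan only the codes,
without materializing the decoded values.

```python
from typing import Callable, List, Sequence


def filter_dictionary_encoded(
    dictionary: Sequence[str],
    codes: Sequence[int],
    predicate: Callable[[str], bool],
) -> List[int]:
    """Return the row indices whose value satisfies `predicate`.

    Hypothetical helper: the column is never decoded into a list of values.
    """
    # Evaluate the predicate once per distinct value, not once per row.
    matches = [predicate(value) for value in dictionary]
    # Scan only the integer codes and look up the precomputed result.
    return [row for row, code in enumerate(codes) if matches[code]]


# Illustrative data (made up for this sketch).
dictionary = ["apple", "banana", "cherry"]
codes = [0, 2, 1, 0, 2, 2]
selected_rows = filter_dictionary_encoded(
    dictionary, codes, lambda value: value.startswith("c")
)
```

The predicate runs three times here (once per dictionary entry) rather than
six times (once per row), which is where the win comes from when the
dictionary is much smaller than the column.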

On Tue, Oct 7, 2025 at 4:34 PM Krisztián Szűcs <[email protected]>
wrote:

> Hi,
>
> There seems to be a new (if I’m not mistaken it was published yesterday)
> codec/compression framework called OpenZL [1][2][3]. I haven’t looked at
> it thoroughly yet, but it somewhat reminds me of BtrBlocks.
> Even if we don’t consider more advanced features of a framework like this,
> we could offload the various codec implementations to another project.
>
> Krisztian
>
> [1]: https://openzl.org/
> [2]: https://github.com/facebook/openzl/tree/dev/src/openzl/codecs
> [3]:
> https://engineering.fb.com/2025/10/06/developer-tools/openzl-open-source-format-aware-compression-framework/
>
> > On 2025. Oct 1., at 20:11, Andrew Lamb <[email protected]> wrote:
> >
> > I would like to start a discussion to help organize and rally anyone
> > interested in adding new encodings to Parquet.
> >
> > I am pretty sure there are many people interested in adding new
> > encodings, but there are only a few mentions on the mailing list, such
> > as pcode [1] and FSST/ALP/FastLanes [2]. Prateek mentioned on the sync
> > call today that he is working on evaluating some potential encodings
> > and hopes to have some information to share soon, and Julien mentioned
> > he had spoken to someone else who might be doing something similar.
> >
> > Now that Julien has defined a process to extend the spec[3] I think the
> > steps are much clearer.
> >
> > So, I would like to invite anyone interested in adding new encodings to
> > respond and let us know if you are willing to help evaluate new encodings
> > and prototype integrations into Parquet implementations.
> >
> > Andrew
> >
> >
> > [1]: https://lists.apache.org/thread/bdmfcj4g6y1ccd3mfgrp7d43d73s6zf6
> > [2]: https://lists.apache.org/thread/s3o9jk0hr942pv6ono4ymnvvj6pfdsdw
> > [3]:
> > https://github.com/apache/parquet-format/blob/master/proposals/README.md
>
>
