Re: multi-frame ZSTD compression

Andrew Pilloud via dev Thu, 05 Mar 2026 16:52:17 -0800

Hi, another long-time lurker here.

+1 to this. I've been toying with ways to do partial page decoding. Things
like reading only def levels in v1 pages, doing selective reads in plain
pages, or partially decompressing large pages due to memory pressure.
Writing files with multi-frame Zstd would make that more efficient, but
there are definitely concerns around reader compatibility.
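
To make the partial-decompression idea concrete, here is a rough sketch of
locating frame boundaries in a page without decompressing anything. It follows
the frame layout described in RFC 8878. The names frame_end and split_frames
are hypothetical, and the sketch assumes the page is a plain concatenation of
complete Zstd (or skippable) frames; it is not taken from any Parquet reader.

```python
import struct

ZSTD_MAGIC = 0xFD2FB528        # magic number of a regular Zstandard frame
SKIPPABLE_MAGIC = 0x184D2A50   # skippable frames use 0x184D2A50..0x184D2A5F
SKIPPABLE_MASK = 0xFFFFFFF0


def frame_end(buf, pos):
    """Return the offset one past the frame that starts at buf[pos]."""
    (magic,) = struct.unpack_from("<I", buf, pos)
    if (magic & SKIPPABLE_MASK) == SKIPPABLE_MAGIC:
        (size,) = struct.unpack_from("<I", buf, pos + 4)
        end = pos + 8 + size
        if end > len(buf):
            raise ValueError("truncated skippable frame")
        return end
    if magic != ZSTD_MAGIC:
        raise ValueError(f"not a Zstd frame at offset {pos}")

    # Frame_Header_Descriptor decides which optional header fields follow.
    desc = buf[pos + 4]
    fcs_flag = desc >> 6
    single_segment = (desc >> 5) & 1
    has_checksum = (desc >> 2) & 1
    window_size = 0 if single_segment else 1
    dict_id_size = (0, 1, 2, 4)[desc & 3]
    fcs_size = (1 if single_segment else 0, 2, 4, 8)[fcs_flag]
    pos += 5 + window_size + dict_id_size + fcs_size

    # Walk the 3-byte block headers until the block marked as last.
    while True:
        raw = buf[pos:pos + 3]
        if len(raw) < 3:
            raise ValueError("truncated block header")
        header = int.from_bytes(raw, "little")
        last = header & 1
        block_type = (header >> 1) & 3
        block_size = header >> 3
        if block_type == 3:
            raise ValueError("reserved block type")
        # An RLE block stores a single byte; raw and compressed blocks
        # store block_size bytes.
        pos += 3 + (1 if block_type == 1 else block_size)
        if last:
            break

    pos += 4 if has_checksum else 0
    if pos > len(buf):
        raise ValueError("truncated frame")
    return pos


def split_frames(page):
    """Return (start, end) byte offsets for every frame in a page."""
    spans = []
    pos = 0
    while pos < len(page):
        end = frame_end(page, pos)
        spans.append((pos, end))
        pos = end
    return spans
```

With the offsets in hand, a reader could hand only the frames it needs to the
decompressor, assuming the writer aligned frames with something useful, such
as the def levels or a run of values.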


Andrew

On Thu, Mar 5, 2026 at 1:34 AM Will Edwards via dev <[email protected]>
wrote:

> howdy folks, nice to e-meet you all :D. I am a long time lurker.  Love
> Parquet :D
>
> Currently, the compression specs for parquet address multi-frame GZIP:
>
> "Readers should support reading pages containing multiple GZIP members,
> however, as this has historically not been supported by all
> implementations, it is recommended that writers refrain from creating such
> pages by default for better interoperability."
>
> https://github.com/apache/parquet-format/blob/master/Compression.md
>
> However, there is no corresponding mention of a ZSTD page containing
> concatenated ZSTD frames.
>
> I am not aware of any Parquet readers that do not support this.  The go-to
> decompress function in the mainstream ZSTD library, which everyone surely
> uses, transparently supports multi-frame data.
>
> However, the reference library includes a function for decoding only a
> single frame, and readers I am not aware of might use it.
>
> Can we add a note to the compression spec to explicitly bless multi-frame
> ZSTD too, to avoid any future confusion?
>
> Yours hopefully,
>
> Will
>
