Re: multi-frame ZSTD compression

Antoine Pitrou Fri, 06 Mar 2026 00:18:36 -0800

There may be multiple ZSTD implementations out there (including perhaps closed source ones), especially as it's now an IETF standard:

https://datatracker.ietf.org/doc/html/rfc8878

A bit of research would be necessary to find out whether other ZSTD implementations similarly decompress multi-frame bodies transparently.


Regards

Antoine.


On 06/03/2026 at 01:49, Andrew Pilloud via dev wrote:

Hi, another long time lurker here.

+1 to this. I've been toying with ways to do partial page decoding. Things
like reading only def levels in v1 pages, doing selective reads in plain
pages, or partially decompressing large pages due to memory pressure.
Writing files with multi-frame ZSTD would make that more efficient, but
there are definitely concerns around reader compatibility.

Andrew

On Thu, Mar 5, 2026 at 1:34 AM Will Edwards via dev <[email protected]>
wrote:

howdy folks, nice to e-meet you all :D. I am a long time lurker.  Love
Parquet :D

Currently, the compression specs for parquet address multi-frame GZIP:

"Readers should support reading pages containing multiple GZIP members,
however, as this has historically not been supported by all
implementations, it is recommended that writers refrain from creating such
pages by default for better interoperability."

https://github.com/apache/parquet-format/blob/master/Compression.md
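
To make the GZIP case concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not how any particular Parquet reader works; the payloads and variable names are made up. It shows one decoder that reads every member of a concatenated body, and one that stops after the first member and leaves the rest unread:

```python
import gzip
import zlib

# Two independently compressed GZIP members, concatenated into one body.
# The payloads are made-up stand-ins for parts of a page.
first = gzip.compress(b"def levels ")
second = gzip.compress(b"then values")
page = first + second

# gzip.decompress() reads every member and concatenates the output.
whole = gzip.decompress(page)

# A single zlib decompress object (wbits=31 selects the GZIP container)
# stops at the end of the first member; the rest is left in unused_data.
decoder = zlib.decompressobj(wbits=31)
only_first = decoder.decompress(page)
leftover = decoder.unused_data
```

A reader built on the second pattern would silently return only part of the page unless it loops over `unused_data`, which is the interoperability risk the spec text warns about.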

However, there is no corresponding mention of a ZSTD page containing
concatenated ZSTD frames.

I am not aware of any Parquet readers that do not support this.  The go-to
decompress function in the mainstream ZSTD library, which everyone surely
uses, transparently supports multi-frame data.

However, the reference library includes a function for decoding only a
single frame, and readers I am not aware of might use it.
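
For anyone who wants to see what "concatenated frames" means at the byte level, below is a rough pure-Python sketch that finds frame boundaries by following the frame and block headers laid out in RFC 8878. It does not decompress anything and does not use the reference library. The names `frame_length`, `split_frames` and `raw_frame` are hypothetical helpers invented for this example; `raw_frame` builds a tiny frame holding one uncompressed (Raw) block so the sketch can run without a ZSTD library installed.

```python
import struct

ZSTD_MAGIC = 0xFD2FB528       # regular Zstandard frame
SKIPPABLE_MAGIC = 0x184D2A50  # skippable frames use 0x184D2A50..0x184D2A5F


def frame_length(buf, pos=0):
    """Return the size in bytes of the frame starting at buf[pos]."""
    (magic,) = struct.unpack_from("<I", buf, pos)
    if magic & 0xFFFFFFF0 == SKIPPABLE_MAGIC:
        (size,) = struct.unpack_from("<I", buf, pos + 4)
        return 8 + size
    if magic != ZSTD_MAGIC:
        raise ValueError("not a Zstandard frame")

    # Frame_Header_Descriptor decides which optional header fields follow.
    desc = buf[pos + 4]
    fcs_flag = desc >> 6
    single_segment = (desc >> 5) & 1
    has_checksum = (desc >> 2) & 1
    dict_id_flag = desc & 3

    cur = pos + 5
    if not single_segment:
        cur += 1                                         # Window_Descriptor
    cur += (0, 1, 2, 4)[dict_id_flag]                    # Dictionary_ID
    cur += (1 if single_segment else 0, 2, 4, 8)[fcs_flag]  # Frame_Content_Size

    # Walk the 3-byte block headers until the block marked Last_Block.
    while True:
        if cur + 3 > len(buf):
            raise ValueError("truncated frame")
        header = int.from_bytes(buf[cur:cur + 3], "little")
        cur += 3
        last_block = header & 1
        block_type = (header >> 1) & 3
        block_size = header >> 3
        if block_type == 3:
            raise ValueError("reserved block type")
        # An RLE block stores a single byte; Raw and Compressed blocks
        # store block_size bytes.
        cur += 1 if block_type == 1 else block_size
        if last_block:
            break

    if has_checksum:
        cur += 4                                         # Content_Checksum
    if cur > len(buf):
        raise ValueError("truncated frame")
    return cur - pos


def split_frames(buf):
    """Split a body made of concatenated frames into individual frames."""
    frames, pos = [], 0
    while pos < len(buf):
        size = frame_length(buf, pos)
        frames.append(buf[pos:pos + size])
        pos += size
    return frames


def raw_frame(payload):
    """Hypothetical helper: one frame with one Raw block (payload <= 255 bytes)."""
    # 0x20 sets Single_Segment_flag, so a 1-byte Frame_Content_Size follows.
    header = struct.pack("<I", ZSTD_MAGIC) + bytes([0x20, len(payload)])
    block_header = (len(payload) << 3) | 1   # Raw block, Last_Block set
    return header + block_header.to_bytes(3, "little") + payload


# Made-up payloads standing in for two parts of a page.
page = raw_frame(b"def levels") + raw_frame(b"values")
parts = split_frames(page)
```

The point of the sketch is only that each frame is self-delimiting, so a reader that decodes one frame and stops has not consumed the whole page, while a reader that loops until the input is exhausted sees every frame.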

Can we add a note to the compression spec to explicitly bless multi-frame
ZSTD too, to avoid any future confusion?

Yours hopefully,

Will
