Howdy folks, nice to e-meet you all :D. I am a long-time lurker. Love
Parquet :D

Currently, the Parquet compression spec addresses multi-member GZIP:

"Readers should support reading pages containing multiple GZIP members,
however, as this has historically not been supported by all
implementations, it is recommended that writers refrain from creating such
pages by default for better interoperability."

https://github.com/apache/parquet-format/blob/master/Compression.md
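
To make the GZIP case concrete, here is a minimal sketch using only Python's
standard library. It is an illustration rather than anything taken from the
spec: the `page` buffer is a hypothetical stand-in for a page body, and the
two decoders are simply the ones that ship with Python. One call walks every
member, while the other stops after the first.

```python
import gzip
import zlib

# Two independently compressed GZIP members, concatenated into one buffer.
# This stands in for a hypothetical page body; it is not real Parquet data.
page = gzip.compress(b"hello ") + gzip.compress(b"world")

# gzip.decompress() decodes every member in the buffer.
all_members = gzip.decompress(page)

# A single zlib decompress object (wbits=31 selects the gzip container)
# stops at the end of the first member. The remaining bytes are left
# untouched in unused_data, so a reader that ignores them loses data.
decoder = zlib.decompressobj(wbits=31)
first_member = decoder.decompress(page)
leftover = decoder.unused_data
```

A reader built on the second pattern has to loop over `leftover` until it is
empty to recover the whole page. That is the same kind of trap I am worried
about for ZSTD.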

However, there is no corresponding mention of a ZSTD page containing
concatenated ZSTD frames.

I am not aware of any Parquet readers that do not support this. The go-to
decompress function in the reference ZSTD library (ZSTD_decompress), which I
expect most implementations use directly or through bindings, transparently
decodes multiple concatenated frames.

However, the reference library also includes a function for decoding only a
single frame, and readers that I am not aware of might use it and stop after
the first frame.

Can we add a note to the compression spec to explicitly bless multi-frame
ZSTD too, to avoid any future confusion?

Yours hopefully,

Will
