For additional spec reading pleasure, the parquet.thrift file is written in the Thrift Interface Definition Language (IDL)[1].
Parquet metadata is stored using the binary Thrift Compact Protocol[2].

[1]: https://github.com/apache/thrift/blob/master/doc/specs/idl.md
[2]: https://github.com/apache/thrift/blob/master/doc/specs/thrift-compact-protocol.md

On Tue, Oct 14, 2025 at 5:04 PM Sylvain Lesage <[email protected]> wrote:

> Maybe the comments in the specification (
> https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift)
> are sufficiently clear?
>
>
> On Tuesday, 14 October 2025 at 10:50 PM, Andrew Bell <
> [email protected]> wrote:
>
> > Hi,
> >
> > Is there a document that explains the metadata (
> > https://parquet.apache.org/docs/file-format/metadata) in English? I can
> > read code, but I'd rather not :) There seems to be some hand-wavy
> > language that defines certain bits, but I haven't found anything that
> > defines each field in the metadata or anything that really defines the
> > format itself other than the metadata picture and this:
> > https://parquet.apache.org/docs/file-format
> >
> > Thanks,
> >
> > --
> > Andrew Bell
> > [email protected]
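
P.S. For anyone who would like to see what "Thrift Compact Protocol" means at the byte level, here is a rough Python sketch. It relies on two things from the specs: an unencrypted Parquet file ends with the Thrift-encoded FileMetaData, then a 4-byte little-endian length, then the magic bytes PAR1; and the compact protocol writes integers as zigzag varints behind a one-byte field header. The function names are my own invention for illustration, not part of any Parquet library, and only the short form of the field header is handled.

```python
import struct

MAGIC = b"PAR1"  # magic bytes at both ends of an unencrypted Parquet file


def footer_metadata_span(data: bytes) -> tuple[int, int]:
    """Return (start, end) offsets of the Thrift-encoded FileMetaData."""
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not an unencrypted Parquet file")
    # The 4 bytes before the trailing magic hold the metadata length.
    (length,) = struct.unpack("<I", data[-8:-4])
    end = len(data) - 8
    start = end - length
    if start < 4:
        raise ValueError("footer length exceeds file size")
    return start, end


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an unsigned LEB128 varint; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def zigzag_decode(n: int) -> int:
    """Map the unsigned zigzag value back to a signed integer."""
    return (n >> 1) ^ -(n & 1)


def read_short_field_header(buf: bytes, pos: int, last_id: int) -> tuple[int, int, int]:
    """Decode a short-form field header: high nibble is the field-id delta,
    low nibble is the compact type id. Returns (field_id, type_id, next pos)."""
    byte = buf[pos]
    delta, type_id = byte >> 4, byte & 0x0F
    if delta == 0:
        raise NotImplementedError("long-form field header not handled in this sketch")
    return last_id + delta, type_id, pos + 1
```

As a toy example, a metadata blob starting with the bytes 0x15 0x02 decodes as field id 1 with compact type 5 (i32) and value 1, which matches the `version` field declared first in FileMetaData in parquet.thrift.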
