This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/parquet-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new b1f0013 deploy: a407d81a41a90b58ae90a6567a84dd084b5d2947
b1f0013 is described below
commit b1f00131a586f9e5f4bfc2a82974b40c1f78dcbd
Author: wgtmac <[email protected]>
AuthorDate: Mon Jul 8 02:26:31 2024 +0000
deploy: a407d81a41a90b58ae90a6567a84dd084b5d2947
---
output/docs/_print/index.html | 32 +++++++++++++++++------------
output/docs/file-format/_print/index.html | 32 +++++++++++++++++------------
output/docs/file-format/index.html | 24 +++++++++++-----------
output/docs/file-format/index.xml | 11 ++++++++--
output/docs/file-format/metadata/index.html | 16 ++++++++++++---
output/docs/index.xml | 11 ++++++++--
output/sitemap.xml | 2 +-
7 files changed, 82 insertions(+), 46 deletions(-)
diff --git a/output/docs/_print/index.html b/output/docs/_print/index.html
index e50cc89..aaa6d1a 100644
--- a/output/docs/_print/index.html
+++ b/output/docs/_print/index.html
@@ -12,26 +12,26 @@ an indivisible unit (in terms of compression and encoding).
There can
be multiple page types which are interleaved in a column
chunk.</p></li></ul><p>Hierarchically, a file consists of one or more row
groups. A row group
contains exactly one column chunk per column. Column chunks contain one or
more pages.</p><h2 id=unit-of-parallelization>Unit of
parallelization</h2><ul><li>MapReduce - File/Row Group</li><li>IO - Column
chunk</li><li>Encoding/Compression - Page</li></ul></div><div class=td-content
style=page-break-before:always><h1 id=pg-52ee54aeff1ffc82031ec74e9a626eba>3 -
File Format</h1><div class=lead>Documentation about the Parquet File
Format.</div><p>This file and the thrift definition should be read together to
understand the format.</p><pre tabindex=0><code> 4-byte [...]
- <Column 1 Chunk 1 + Column Metadata>
- <Column 2 Chunk 1 + Column Metadata>
+ <Column 1 Chunk 1>
+ <Column 2 Chunk 1>
...
- <Column N Chunk 1 + Column Metadata>
- <Column 1 Chunk 2 + Column Metadata>
- <Column 2 Chunk 2 + Column Metadata>
+ <Column N Chunk 1>
+ <Column 1 Chunk 2>
+ <Column 2 Chunk 2>
...
- <Column N Chunk 2 + Column Metadata>
+ <Column N Chunk 2>
...
- <Column 1 Chunk M + Column Metadata>
- <Column 2 Chunk M + Column Metadata>
+ <Column 1 Chunk M>
+ <Column 2 Chunk M>
...
- <Column N Chunk M + Column Metadata>
+ <Column N Chunk M>
File Metadata
4-byte length in bytes of file metadata (little endian)
4-byte magic number "PAR1"
</code></pre><p>In the above example, there are N columns in this table, split
into M row
-groups. The file metadata contains the locations of all the column metadata
+groups. The file metadata contains the locations of all the column chunk
start locations. More details on what is contained in the metadata can be found
-in the Thrift definition.</p><p>Metadata is written after the data to allow
for single pass writing.</p><p>Readers are expected to first read the file
metadata to find all the column
+in the Thrift definition.</p><p>File metadata is written after the data to
allow for single pass writing.</p><p>Readers are expected to first read the
file metadata to find all the column
chunks they are interested in. The columns chunks should then be read
sequentially.</p><p>The format is explicitly designed to separate the metadata
from the data. This
allows splitting columns into multiple files, as well as having a single
metadata
file reference multiple parquet files.</p><p><img alt="File Layout"
src=/images/FileLayout.gif></p></div><div class=td-content
style=page-break-before:always><h1 id=pg-de128b03a7277c17c6dec2cef97af32d>3.1 -
Configurations</h1><h3 id=row-group-size>Row Group Size</h3><p>Larger row
groups allow for larger column chunks which makes it
@@ -44,8 +44,14 @@ per HDFS file.</p><h3 id=data-page--size>Data Page
Size</h3><p>Data pages should
allow for more fine grained reading (e.g. single row lookup). Larger page sizes
incur less space overhead (less page headers) and potentially less parsing
overhead
(processing headers). Note: for sequential scans, it is not expected to read a
page
-at a time; this is not the IO chunk. We recommend 8KB for page
sizes.</p></div><div class=td-content style=page-break-before:always><h1
id=pg-6762d78210357c1df172dfddcc6fd307>3.2 - Extensibility</h1><p>There are
many places in the format for compatible extensions:</p><ul><li>File Version:
The file metadata contains a version.</li><li>Encodings: Encodings are
specified by enum and more can be added in the future.</li><li>Page types:
Additional page types can be added and safely skipped.</ [...]
-header metadata. All thrift structures are serialized using the
TCompactProtocol.</p><p><img alt="File Layout"
src=/images/FileFormat.gif></p></div><div class=td-content
style=page-break-before:always><h1 id=pg-140f2d6d5609da0af9dcaa92233086b5>3.4 -
Types</h1><p>The types supported by the file format are intended to be as
minimal as possible,
+at a time; this is not the IO chunk. We recommend 8KB for page
sizes.</p></div><div class=td-content style=page-break-before:always><h1
id=pg-6762d78210357c1df172dfddcc6fd307>3.2 - Extensibility</h1><p>There are
many places in the format for compatible extensions:</p><ul><li>File Version:
The file metadata contains a version.</li><li>Encodings: Encodings are
specified by enum and more can be added in the future.</li><li>Page types:
Additional page types can be added and safely skipped.</ [...]
+In the diagram below, file metadata is described by the
<code>FileMetaData</code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file. Page header metadata
(<code>PageHeader</code> and
+children in the diagram) is stored in-line with the page data, and is
+used in the reading and decoding of said data.</p><p>All thrift structures are
serialized using the TCompactProtocol. The full
+definition of these structures is given in the Parquet
+<a
href=https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift>Thrift
definition</a>.</p><p><img alt="File Layout"
src=/images/FileFormat.gif></p></div><div class=td-content
style=page-break-before:always><h1 id=pg-140f2d6d5609da0af9dcaa92233086b5>3.4 -
Types</h1><p>The types supported by the file format are intended to be as
minimal as possible,
with a focus on how the types effect on disk storage. For example, 16-bit ints
are not explicitly supported in the storage format since they are covered by
32-bit ints with an efficient encoding. This reduces the complexity of
implementing
diff --git a/output/docs/file-format/_print/index.html
b/output/docs/file-format/_print/index.html
index 21f1c67..fc8b61f 100644
--- a/output/docs/file-format/_print/index.html
+++ b/output/docs/file-format/_print/index.html
@@ -5,26 +5,26 @@
"><meta name=twitter:card content="summary"><meta name=twitter:title
content="File Format"><meta name=twitter:description content="Documentation
about the Parquet File Format.
"><link rel=preload
href=/scss/main.min.7f095589dcf99af199766009c9a9719e6fdf2b9ce9bcfbeb05e80eb6efeaf32b.css
as=style><link
href=/scss/main.min.7f095589dcf99af199766009c9a9719e6fdf2b9ce9bcfbeb05e80eb6efeaf32b.css
rel=stylesheet integrity><script
src=https://code.jquery.com/jquery-3.6.3.min.js
integrity="sha512-STof4xm1wgkfm7heWqFJVn58Hm3EtS31XFaagaa8VMReCXAkQnJZ+jEy8PCC/iT18dFy95WcExNHFTqLyp72eQ=="
crossorigin=anonymous></script><link rel=stylesheet
href=https://cdn.jsdelivr.net/npm/@doc [...]
<a href=# onclick="return print(),!1">Click here to print</a>.</p><p><a
href=/docs/file-format/>Return to the regular view of this
page</a>.</p></div><h1 class=title>File Format</h1><div
class=lead>Documentation about the Parquet File Format.</div><ul><li>1: <a
href=#pg-de128b03a7277c17c6dec2cef97af32d>Configurations</a></li><li>2: <a
href=#pg-6762d78210357c1df172dfddcc6fd307>Extensibility</a></li><li>3: <a
href=#pg-e1f24cbf7bc8c4ddd1cc8213993795a7>Metadata</a></li><li>4: <a
href=#pg-140 [...]
- <Column 1 Chunk 1 + Column Metadata>
- <Column 2 Chunk 1 + Column Metadata>
+ <Column 1 Chunk 1>
+ <Column 2 Chunk 1>
...
- <Column N Chunk 1 + Column Metadata>
- <Column 1 Chunk 2 + Column Metadata>
- <Column 2 Chunk 2 + Column Metadata>
+ <Column N Chunk 1>
+ <Column 1 Chunk 2>
+ <Column 2 Chunk 2>
...
- <Column N Chunk 2 + Column Metadata>
+ <Column N Chunk 2>
...
- <Column 1 Chunk M + Column Metadata>
- <Column 2 Chunk M + Column Metadata>
+ <Column 1 Chunk M>
+ <Column 2 Chunk M>
...
- <Column N Chunk M + Column Metadata>
+ <Column N Chunk M>
File Metadata
4-byte length in bytes of file metadata (little endian)
4-byte magic number "PAR1"
</code></pre><p>In the above example, there are N columns in this table, split
into M row
-groups. The file metadata contains the locations of all the column metadata
+groups. The file metadata contains the locations of all the column chunk
start locations. More details on what is contained in the metadata can be found
-in the Thrift definition.</p><p>Metadata is written after the data to allow
for single pass writing.</p><p>Readers are expected to first read the file
metadata to find all the column
+in the Thrift definition.</p><p>File metadata is written after the data to
allow for single pass writing.</p><p>Readers are expected to first read the
file metadata to find all the column
chunks they are interested in. The columns chunks should then be read
sequentially.</p><p>The format is explicitly designed to separate the metadata
from the data. This
allows splitting columns into multiple files, as well as having a single
metadata
file reference multiple parquet files.</p><p><img alt="File Layout"
src=/images/FileLayout.gif></p></div></div><div class=td-content
style=page-break-before:always><h1 id=pg-de128b03a7277c17c6dec2cef97af32d>1 -
Configurations</h1><h3 id=row-group-size>Row Group Size</h3><p>Larger row
groups allow for larger column chunks which makes it
@@ -37,8 +37,14 @@ per HDFS file.</p><h3 id=data-page--size>Data Page
Size</h3><p>Data pages should
allow for more fine grained reading (e.g. single row lookup). Larger page sizes
incur less space overhead (less page headers) and potentially less parsing
overhead
(processing headers). Note: for sequential scans, it is not expected to read a
page
-at a time; this is not the IO chunk. We recommend 8KB for page
sizes.</p></div><div class=td-content style=page-break-before:always><h1
id=pg-6762d78210357c1df172dfddcc6fd307>2 - Extensibility</h1><p>There are many
places in the format for compatible extensions:</p><ul><li>File Version: The
file metadata contains a version.</li><li>Encodings: Encodings are specified by
enum and more can be added in the future.</li><li>Page types: Additional page
types can be added and safely skipped.</li [...]
-header metadata. All thrift structures are serialized using the
TCompactProtocol.</p><p><img alt="File Layout"
src=/images/FileFormat.gif></p></div><div class=td-content
style=page-break-before:always><h1 id=pg-140f2d6d5609da0af9dcaa92233086b5>4 -
Types</h1><p>The types supported by the file format are intended to be as
minimal as possible,
+at a time; this is not the IO chunk. We recommend 8KB for page
sizes.</p></div><div class=td-content style=page-break-before:always><h1
id=pg-6762d78210357c1df172dfddcc6fd307>2 - Extensibility</h1><p>There are many
places in the format for compatible extensions:</p><ul><li>File Version: The
file metadata contains a version.</li><li>Encodings: Encodings are specified by
enum and more can be added in the future.</li><li>Page types: Additional page
types can be added and safely skipped.</li [...]
+In the diagram below, file metadata is described by the
<code>FileMetaData</code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file. Page header metadata
(<code>PageHeader</code> and
+children in the diagram) is stored in-line with the page data, and is
+used in the reading and decoding of said data.</p><p>All thrift structures are
serialized using the TCompactProtocol. The full
+definition of these structures is given in the Parquet
+<a
href=https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift>Thrift
definition</a>.</p><p><img alt="File Layout"
src=/images/FileFormat.gif></p></div><div class=td-content
style=page-break-before:always><h1 id=pg-140f2d6d5609da0af9dcaa92233086b5>4 -
Types</h1><p>The types supported by the file format are intended to be as
minimal as possible,
with a focus on how the types effect on disk storage. For example, 16-bit ints
are not explicitly supported in the storage format since they are covered by
32-bit ints with an efficient encoding. This reduces the complexity of
implementing
diff --git a/output/docs/file-format/index.html
b/output/docs/file-format/index.html
index 8cc7152..719a65f 100644
--- a/output/docs/file-format/index.html
+++ b/output/docs/file-format/index.html
@@ -8,28 +8,28 @@
<a
href="https://github.com/apache/parquet-site/new/production/content/en/docs/File%20Format?filename=change-me.md&value=---%0Atitle%3A+%22Long+Page+Title%22%0AlinkTitle%3A+%22Short+Nav+Title%22%0Aweight%3A+100%0Adescription%3A+%3E-%0A+++++Page+description+for+heading+and+indexes.%0A---%0A%0A%23%23+Heading%0A%0AEdit+this+template+to+create+your+new+page.%0A%0A%2A+Give+it+a+good+name%2C+ending+in+%60.md%60+-+e.g.+%60getting-started.md%60%0A%2A+Edit+the+%22front+matter%22+section+at+th
[...]
<a
href="https://github.com/apache/parquet-site/issues/new?title=File%20Format"
class="td-page-meta--issue td-page-meta__issue" target=_blank rel=noopener><i
class="fa-solid fa-list-check fa-fw"></i> Create documentation issue</a>
<a id=print href=/docs/file-format/_print/><i class="fa-solid fa-print
fa-fw"></i> Print entire section</a></div></aside><main class="col-12 col-md-9
col-xl-8 ps-md-5" role=main><nav aria-label=breadcrumb class=td-breadcrumbs><ol
class=breadcrumb><li class=breadcrumb-item><a
href=/docs/>Documentation</a></li><li class="breadcrumb-item active"
aria-current=page>File Format</li></ol></nav><div class=td-content><h1>File
Format</h1><div class=lead>Documentation about the Parquet File Format. [...]
- <Column 1 Chunk 1 + Column Metadata>
- <Column 2 Chunk 1 + Column Metadata>
+ <Column 1 Chunk 1>
+ <Column 2 Chunk 1>
...
- <Column N Chunk 1 + Column Metadata>
- <Column 1 Chunk 2 + Column Metadata>
- <Column 2 Chunk 2 + Column Metadata>
+ <Column N Chunk 1>
+ <Column 1 Chunk 2>
+ <Column 2 Chunk 2>
...
- <Column N Chunk 2 + Column Metadata>
+ <Column N Chunk 2>
...
- <Column 1 Chunk M + Column Metadata>
- <Column 2 Chunk M + Column Metadata>
+ <Column 1 Chunk M>
+ <Column 2 Chunk M>
...
- <Column N Chunk M + Column Metadata>
+ <Column N Chunk M>
File Metadata
4-byte length in bytes of file metadata (little endian)
4-byte magic number "PAR1"
</code></pre><p>In the above example, there are N columns in this table, split
into M row
-groups. The file metadata contains the locations of all the column metadata
+groups. The file metadata contains the locations of all the column chunk
start locations. More details on what is contained in the metadata can be found
-in the Thrift definition.</p><p>Metadata is written after the data to allow
for single pass writing.</p><p>Readers are expected to first read the file
metadata to find all the column
+in the Thrift definition.</p><p>File metadata is written after the data to
allow for single pass writing.</p><p>Readers are expected to first read the
file metadata to find all the column
chunks they are interested in. The columns chunks should then be read
sequentially.</p><p>The format is explicitly designed to separate the metadata
from the data. This
allows splitting columns into multiple files, as well as having a single
metadata
-file reference multiple parquet files.</p><p><img alt="File Layout"
src=/images/FileLayout.gif></p><div class=section-index><hr
class=panel-line><div class=entry><h5><a
href=/docs/file-format/configurations/>Configurations</a></h5><p></p></div><div
class=entry><h5><a
href=/docs/file-format/extensibility/>Extensibility</a></h5><p></p></div><div
class=entry><h5><a
href=/docs/file-format/metadata/>Metadata</a></h5><p></p></div><div
class=entry><h5><a href=/docs/file-format/types/>Types</a>< [...]
+file reference multiple parquet files.</p><p><img alt="File Layout"
src=/images/FileLayout.gif></p><div class=section-index><hr
class=panel-line><div class=entry><h5><a
href=/docs/file-format/configurations/>Configurations</a></h5><p></p></div><div
class=entry><h5><a
href=/docs/file-format/extensibility/>Extensibility</a></h5><p></p></div><div
class=entry><h5><a
href=/docs/file-format/metadata/>Metadata</a></h5><p></p></div><div
class=entry><h5><a href=/docs/file-format/types/>Types</a>< [...]
2024
<span class=td-footer__authors>Apache Parquet</span></span><span
class=td-footer__all_rights_reserved>All Rights Reserved</span><span
class=ms-2><a href=https://policies.google.com/privacy target=_blank
rel=noopener>Privacy Policy</a></span></div></div></div></footer></div><script
src=/js/main.min.26b35480299b932e285af8358c943de97509b95a0086d091584e7cb9b00c5c7b.js
integrity="sha256-JrNUgCmbky4oWvg1jJQ96XUJuVoAhtCRWE58ubAMXHs="
crossorigin=anonymous></script><script defer src=/js/click-to [...]
\ No newline at end of file
diff --git a/output/docs/file-format/index.xml
b/output/docs/file-format/index.xml
index f0bcb2f..e54dc92 100644
--- a/output/docs/file-format/index.xml
+++ b/output/docs/file-format/index.xml
@@ -19,8 +19,15 @@ at a time; this is not the IO chunk. We recommend 8KB for
page sizes.</p></de
<li>Encodings: Encodings are specified by enum and more can be added in the
future.</li>
<li>Page types: Additional page types can be added and safely
skipped.</li>
</ul></description></item><item><title>Docs:
Metadata</title><link>/docs/file-format/metadata/</link><pubDate>Mon, 01 Jan
0001 00:00:00
+0000</pubDate><guid>/docs/file-format/metadata/</guid><description>
-<p>There are three types of metadata: file metadata, column (chunk)
metadata and page
-header metadata. All thrift structures are serialized using the
TCompactProtocol.</p>
+<p>There are two types of metadata: file metadata, and page header metadata.
+In the diagram below, file metadata is described by the
<code>FileMetaData</code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file. Page header metadata
(<code>PageHeader</code> and
+children in the diagram) is stored in-line with the page data, and is
+used in the reading and decoding of said data.</p>
+<p>All thrift structures are serialized using the TCompactProtocol. The full
+definition of these structures is given in the Parquet
+<a
href="https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift">Thrift
definition</a>.</p>
<p><img alt="File Layout"
src="/images/FileFormat.gif"></p></description></item><item><title>Docs:
Types</title><link>/docs/file-format/types/</link><pubDate>Mon, 01 Jan 0001
00:00:00 +0000</pubDate><guid>/docs/file-format/types/</guid><description>
<p>The types supported by the file format are intended to be as minimal as
possible,
with a focus on how the types effect on disk storage. For example, 16-bit ints
diff --git a/output/docs/file-format/metadata/index.html
b/output/docs/file-format/metadata/index.html
index 22cc8f5..5585109 100644
--- a/output/docs/file-format/metadata/index.html
+++ b/output/docs/file-format/metadata/index.html
@@ -1,9 +1,19 @@
<!doctype html><html itemscope itemtype=http://schema.org/WebPage lang=en
class=no-js><head><meta charset=utf-8><meta name=viewport
content="width=device-width,initial-scale=1,shrink-to-fit=no"><meta name=robots
content="index, follow"><link rel="shortcut icon"
href=/favicons/favicon.ico><link rel=apple-touch-icon
href=/favicons/apple-touch-icon-180x180.png sizes=180x180><link rel=icon
type=image/png href=/favicons/favicon-16x16.png sizes=16x16><link rel=icon
type=image/png href=/favicon [...]
-<meta name=description content="There are three types of metadata: file
metadata, column (chunk) metadata and page header metadata. All thrift
structures are serialized using the TCompactProtocol."><meta
property="og:title" content="Metadata"><meta property="og:description"
content="There are three types of metadata: file metadata, column (chunk)
metadata and page header metadata. All thrift structures are serialized using
the TCompactProtocol."><meta property="og:type" content="article" [...]
+<meta name=description content="There are two types of metadata: file
metadata, and page header metadata. In the diagram below, file metadata is
described by the FileMetaData structure. This file metadata provides offset and
size information useful when navigating the Parquet file. Page header metadata
(PageHeader and children in the diagram) is stored in-line with the page data,
and is used in the reading and decoding of said data.
+All thrift structures are serialized using the TCompactProtocol."><meta
property="og:title" content="Metadata"><meta property="og:description"
content="There are two types of metadata: file metadata, and page header
metadata. In the diagram below, file metadata is described by the FileMetaData
structure. This file metadata provides offset and size information useful when
navigating the Parquet file. Page header metadata (PageHeader and children in
the diagram) is stored in-line with the [...]
+All thrift structures are serialized using the TCompactProtocol."><meta
property="og:type" content="article"><meta property="og:url"
content="/docs/file-format/metadata/"><meta property="article:section"
content="docs"><meta property="article:modified_time"
content="2024-07-07T19:25:32-07:00"><meta property="og:site_name"
content="Apache Parquet"><meta itemprop=name content="Metadata"><meta
itemprop=description content="There are two types of metadata: file metadata,
and page header meta [...]
+All thrift structures are serialized using the TCompactProtocol."><meta
itemprop=dateModified content="2024-07-07T19:25:32-07:00"><meta
itemprop=wordCount content="86"><meta itemprop=keywords content><meta
name=twitter:card content="summary"><meta name=twitter:title
content="Metadata"><meta name=twitter:description content="There are two types
of metadata: file metadata, and page header metadata. In the diagram below,
file metadata is described by the FileMetaData structure. This file me [...]
+All thrift structures are serialized using the TCompactProtocol."><link
rel=preload
href=/scss/main.min.7f095589dcf99af199766009c9a9719e6fdf2b9ce9bcfbeb05e80eb6efeaf32b.css
as=style><link
href=/scss/main.min.7f095589dcf99af199766009c9a9719e6fdf2b9ce9bcfbeb05e80eb6efeaf32b.css
rel=stylesheet integrity><script
src=https://code.jquery.com/jquery-3.6.3.min.js
integrity="sha512-STof4xm1wgkfm7heWqFJVn58Hm3EtS31XFaagaa8VMReCXAkQnJZ+jEy8PCC/iT18dFy95WcExNHFTqLyp72eQ=="
crossorigin=anonymous></sc [...]
<a
href=https://github.com/apache/parquet-site/edit/production/content/en/docs/File%20Format/metadata.md
class="td-page-meta--edit td-page-meta__edit" target=_blank rel=noopener><i
class="fa-solid fa-pen-to-square fa-fw"></i> Edit this page</a>
<a
href="https://github.com/apache/parquet-site/new/production/content/en/docs/File%20Format?filename=change-me.md&value=---%0Atitle%3A+%22Long+Page+Title%22%0AlinkTitle%3A+%22Short+Nav+Title%22%0Aweight%3A+100%0Adescription%3A+%3E-%0A+++++Page+description+for+heading+and+indexes.%0A---%0A%0A%23%23+Heading%0A%0AEdit+this+template+to+create+your+new+page.%0A%0A%2A+Give+it+a+good+name%2C+ending+in+%60.md%60+-+e.g.+%60getting-started.md%60%0A%2A+Edit+the+%22front+matter%22+section+at+th
[...]
<a href="https://github.com/apache/parquet-site/issues/new?title=Metadata"
class="td-page-meta--issue td-page-meta__issue" target=_blank rel=noopener><i
class="fa-solid fa-list-check fa-fw"></i> Create documentation issue</a>
-<a id=print href=/docs/file-format/_print/><i class="fa-solid fa-print
fa-fw"></i> Print entire section</a></div></aside><main class="col-12 col-md-9
col-xl-8 ps-md-5" role=main><nav aria-label=breadcrumb class=td-breadcrumbs><ol
class=breadcrumb><li class=breadcrumb-item><a
href=/docs/>Documentation</a></li><li class=breadcrumb-item><a
href=/docs/file-format/>File Format</a></li><li class="breadcrumb-item active"
aria-current=page>Metadata</li></ol></nav><div class=td-content><h1>Metada [...]
-header metadata. All thrift structures are serialized using the
TCompactProtocol.</p><p><img alt="File Layout"
src=/images/FileFormat.gif></p><div class=td-page-meta__lastmod>Last modified
March 8, 2024: <a
href=https://github.com/apache/parquet-site/commit/b3b81ce3e9f9e6f25b41f463577976628515384a>Update
to new website (b3b81ce)</a></div></div></main></div></div><footer
class="td-footer row d-print-none"><div class=container-fluid><div class="row
mx-md-2"><div class="td-footer__left col- [...]
+<a id=print href=/docs/file-format/_print/><i class="fa-solid fa-print
fa-fw"></i> Print entire section</a></div></aside><main class="col-12 col-md-9
col-xl-8 ps-md-5" role=main><nav aria-label=breadcrumb class=td-breadcrumbs><ol
class=breadcrumb><li class=breadcrumb-item><a
href=/docs/>Documentation</a></li><li class=breadcrumb-item><a
href=/docs/file-format/>File Format</a></li><li class="breadcrumb-item active"
aria-current=page>Metadata</li></ol></nav><div class=td-content><h1>Metada [...]
+In the diagram below, file metadata is described by the
<code>FileMetaData</code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file. Page header metadata
(<code>PageHeader</code> and
+children in the diagram) is stored in-line with the page data, and is
+used in the reading and decoding of said data.</p><p>All thrift structures are
serialized using the TCompactProtocol. The full
+definition of these structures is given in the Parquet
+<a
href=https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift>Thrift
definition</a>.</p><p><img alt="File Layout"
src=/images/FileFormat.gif></p><div class=td-page-meta__lastmod>Last modified
July 7, 2024: <a
href=https://github.com/apache/parquet-site/commit/a407d81a41a90b58ae90a6567a84dd084b5d2947>GH-68:
Match language from parquet-format after merge of PARQUET-2139 (#69)
(a407d81)</a></div></div></main></div></div><footer class="td-footer row
d-print-none [...]
2024
<span class=td-footer__authors>Apache Parquet</span></span><span
class=td-footer__all_rights_reserved>All Rights Reserved</span><span
class=ms-2><a href=https://policies.google.com/privacy target=_blank
rel=noopener>Privacy Policy</a></span></div></div></div></footer></div><script
src=/js/main.min.26b35480299b932e285af8358c943de97509b95a0086d091584e7cb9b00c5c7b.js
integrity="sha256-JrNUgCmbky4oWvg1jJQ96XUJuVoAhtCRWE58ubAMXHs="
crossorigin=anonymous></script><script defer src=/js/click-to [...]
\ No newline at end of file
diff --git a/output/docs/index.xml b/output/docs/index.xml
index 8971aa9..b03b3ff 100644
--- a/output/docs/index.xml
+++ b/output/docs/index.xml
@@ -1089,8 +1089,15 @@ example, strings are stored as byte arrays (binary) with
a UTF8 annotation.
These annotations define how to further decode and interpret the data.
Annotations are stored as <code>LogicalType</code> fields in the file
metadata and are
documented in LogicalTypes.md.</p></description></item><item><title>Docs:
Metadata</title><link>/docs/file-format/metadata/</link><pubDate>Mon, 01 Jan
0001 00:00:00
+0000</pubDate><guid>/docs/file-format/metadata/</guid><description>
-<p>There are three types of metadata: file metadata, column (chunk)
metadata and page
-header metadata. All thrift structures are serialized using the
TCompactProtocol.</p>
+<p>There are two types of metadata: file metadata, and page header metadata.
+In the diagram below, file metadata is described by the
<code>FileMetaData</code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file. Page header metadata
(<code>PageHeader</code> and
+children in the diagram) is stored in-line with the page data, and is
+used in the reading and decoding of said data.</p>
+<p>All thrift structures are serialized using the TCompactProtocol. The full
+definition of these structures is given in the Parquet
+<a
href="https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift">Thrift
definition</a>.</p>
<p><img alt="File Layout"
src="/images/FileFormat.gif"></p></description></item><item><title>Docs:
Nested
Encoding</title><link>/docs/file-format/nestedencoding/</link><pubDate>Mon, 01
Jan 0001 00:00:00
+0000</pubDate><guid>/docs/file-format/nestedencoding/</guid><description>
<p>To encode nested columns, Parquet uses the Dremel encoding with
definition and
repetition levels. Definition levels specify how many optional fields in the
diff --git a/output/sitemap.xml b/output/sitemap.xml
index 4623773..616e81f 100644
--- a/output/sitemap.xml
+++ b/output/sitemap.xml
@@ -1 +1 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/docs/file-format/data-pages/compression/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/file-format/data-pages/encodings/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/file-format/data-pages/encryption/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/
[...]
\ No newline at end of file
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/docs/file-format/data-pages/compression/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/file-format/data-pages/encodings/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/file-format/data-pages/encryption/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/
[...]
\ No newline at end of file
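
Note on the layout touched by this change: the trailer shown in the diff ("File Metadata", then a 4-byte little-endian length, then the 4-byte magic "PAR1") is what lets a reader find the file metadata before reading any column chunk. Below is a minimal sketch of that lookup in Python, using only the trailer fields quoted above. The function name and the sample bytes are hypothetical, and the sketch does not decode the Thrift-serialized FileMetaData itself.

```python
import struct
from typing import Tuple

MAGIC = b"PAR1"  # trailing magic number, as quoted in the layout above


def file_metadata_span(file_bytes: bytes) -> Tuple[int, int]:
    """Return (offset, length) of the serialized file metadata.

    Hypothetical helper: it only inspects the last 8 bytes of the file,
    i.e. the 4-byte little-endian metadata length and the 4-byte magic.
    """
    if len(file_bytes) < 8 or file_bytes[-4:] != MAGIC:
        raise ValueError("trailing PAR1 magic not found")
    # "<I" = little-endian unsigned 32-bit integer
    (length,) = struct.unpack("<I", file_bytes[-8:-4])
    offset = len(file_bytes) - 8 - length
    if offset < 0:
        raise ValueError("metadata length exceeds file size")
    return offset, length


# Synthetic stand-in for a file: fake chunk bytes, fake metadata, trailer.
fake_metadata = b"fake-metadata"
sample = b"chunks" + fake_metadata + struct.pack("<I", len(fake_metadata)) + MAGIC
offset, length = file_metadata_span(sample)
metadata_bytes = sample[offset:offset + length]
```

In a real reader the bytes at that span would then be deserialized with the TCompactProtocol to obtain the column chunk locations.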