This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/parquet-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new b1f0013  deploy: a407d81a41a90b58ae90a6567a84dd084b5d2947
b1f0013 is described below

commit b1f00131a586f9e5f4bfc2a82974b40c1f78dcbd
Author: wgtmac <[email protected]>
AuthorDate: Mon Jul 8 02:26:31 2024 +0000

    deploy: a407d81a41a90b58ae90a6567a84dd084b5d2947
---
 output/docs/_print/index.html               | 32 +++++++++++++++++------------
 output/docs/file-format/_print/index.html   | 32 +++++++++++++++++------------
 output/docs/file-format/index.html          | 24 +++++++++++-----------
 output/docs/file-format/index.xml           | 11 ++++++++--
 output/docs/file-format/metadata/index.html | 16 ++++++++++++---
 output/docs/index.xml                       | 11 ++++++++--
 output/sitemap.xml                          |  2 +-
 7 files changed, 82 insertions(+), 46 deletions(-)

diff --git a/output/docs/_print/index.html b/output/docs/_print/index.html
index e50cc89..aaa6d1a 100644
--- a/output/docs/_print/index.html
+++ b/output/docs/_print/index.html
@@ -12,26 +12,26 @@ an indivisible unit (in terms of compression and encoding). There can
 be multiple page types which are interleaved in a column 
chunk.</p></li></ul><p>Hierarchically, a file consists of one or more row 
groups. A row group
 contains exactly one column chunk per column. Column chunks contain one or
 more pages.</p><h2 id=unit-of-parallelization>Unit of 
parallelization</h2><ul><li>MapReduce - File/Row Group</li><li>IO - Column 
chunk</li><li>Encoding/Compression - Page</li></ul></div><div class=td-content 
style=page-break-before:always><h1 id=pg-52ee54aeff1ffc82031ec74e9a626eba>3 - 
File Format</h1><div class=lead>Documentation about the Parquet File 
Format.</div><p>This file and the thrift definition should be read together to 
understand the format.</p><pre tabindex=0><code>    4-byte [...]
-    &lt;Column 1 Chunk 1 + Column Metadata&gt;
-    &lt;Column 2 Chunk 1 + Column Metadata&gt;
+    &lt;Column 1 Chunk 1&gt;
+    &lt;Column 2 Chunk 1&gt;
     ...
-    &lt;Column N Chunk 1 + Column Metadata&gt;
-    &lt;Column 1 Chunk 2 + Column Metadata&gt;
-    &lt;Column 2 Chunk 2 + Column Metadata&gt;
+    &lt;Column N Chunk 1&gt;
+    &lt;Column 1 Chunk 2&gt;
+    &lt;Column 2 Chunk 2&gt;
     ...
-    &lt;Column N Chunk 2 + Column Metadata&gt;
+    &lt;Column N Chunk 2&gt;
     ...
-    &lt;Column 1 Chunk M + Column Metadata&gt;
-    &lt;Column 2 Chunk M + Column Metadata&gt;
+    &lt;Column 1 Chunk M&gt;
+    &lt;Column 2 Chunk M&gt;
     ...
-    &lt;Column N Chunk M + Column Metadata&gt;
+    &lt;Column N Chunk M&gt;
     File Metadata
     4-byte length in bytes of file metadata (little endian)
     4-byte magic number &#34;PAR1&#34;
 </code></pre><p>In the above example, there are N columns in this table, split 
into M row
-groups. The file metadata contains the locations of all the column metadata
+groups. The file metadata contains the locations of all the column chunk
 start locations. More details on what is contained in the metadata can be found
-in the Thrift definition.</p><p>Metadata is written after the data to allow 
for single pass writing.</p><p>Readers are expected to first read the file 
metadata to find all the column
+in the Thrift definition.</p><p>File metadata is written after the data to 
allow for single pass writing.</p><p>Readers are expected to first read the 
file metadata to find all the column
 chunks they are interested in. The columns chunks should then be read 
sequentially.</p><p>The format is explicitly designed to separate the metadata 
from the data. This
 allows splitting columns into multiple files, as well as having a single 
metadata
 file reference multiple parquet files.</p><p><img alt="File Layout" 
src=/images/FileLayout.gif></p></div><div class=td-content 
style=page-break-before:always><h1 id=pg-de128b03a7277c17c6dec2cef97af32d>3.1 - 
Configurations</h1><h3 id=row-group-size>Row Group Size</h3><p>Larger row 
groups allow for larger column chunks which makes it
@@ -44,8 +44,14 @@ per HDFS file.</p><h3 id=data-page--size>Data Page Size</h3><p>Data pages should
 allow for more fine grained reading (e.g. single row lookup). Larger page sizes
 incur less space overhead (less page headers) and potentially less parsing 
overhead
 (processing headers). Note: for sequential scans, it is not expected to read a 
page
-at a time; this is not the IO chunk. We recommend 8KB for page 
sizes.</p></div><div class=td-content style=page-break-before:always><h1 
id=pg-6762d78210357c1df172dfddcc6fd307>3.2 - Extensibility</h1><p>There are 
many places in the format for compatible extensions:</p><ul><li>File Version: 
The file metadata contains a version.</li><li>Encodings: Encodings are 
specified by enum and more can be added in the future.</li><li>Page types: 
Additional page types can be added and safely skipped.</ [...]
-header metadata. All thrift structures are serialized using the 
TCompactProtocol.</p><p><img alt="File Layout" 
src=/images/FileFormat.gif></p></div><div class=td-content 
style=page-break-before:always><h1 id=pg-140f2d6d5609da0af9dcaa92233086b5>3.4 - 
Types</h1><p>The types supported by the file format are intended to be as 
minimal as possible,
+at a time; this is not the IO chunk. We recommend 8KB for page 
sizes.</p></div><div class=td-content style=page-break-before:always><h1 
id=pg-6762d78210357c1df172dfddcc6fd307>3.2 - Extensibility</h1><p>There are 
many places in the format for compatible extensions:</p><ul><li>File Version: 
The file metadata contains a version.</li><li>Encodings: Encodings are 
specified by enum and more can be added in the future.</li><li>Page types: 
Additional page types can be added and safely skipped.</ [...]
+In the diagram below, file metadata is described by the 
<code>FileMetaData</code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file. Page header metadata 
(<code>PageHeader</code> and
+children in the diagram) is stored in-line with the page data, and is
+used in the reading and decoding of said data.</p><p>All thrift structures are 
serialized using the TCompactProtocol. The full
+definition of these structures is given in the Parquet
+<a 
href=https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift>Thrift
 definition</a>.</p><p><img alt="File Layout" 
src=/images/FileFormat.gif></p></div><div class=td-content 
style=page-break-before:always><h1 id=pg-140f2d6d5609da0af9dcaa92233086b5>3.4 - 
Types</h1><p>The types supported by the file format are intended to be as 
minimal as possible,
 with a focus on how the types effect on disk storage. For example, 16-bit ints
 are not explicitly supported in the storage format since they are covered by
 32-bit ints with an efficient encoding. This reduces the complexity of 
implementing
diff --git a/output/docs/file-format/_print/index.html b/output/docs/file-format/_print/index.html
index 21f1c67..fc8b61f 100644
--- a/output/docs/file-format/_print/index.html
+++ b/output/docs/file-format/_print/index.html
@@ -5,26 +5,26 @@
 "><meta name=twitter:card content="summary"><meta name=twitter:title 
content="File Format"><meta name=twitter:description content="Documentation 
about the Parquet File Format.
 "><link rel=preload 
href=/scss/main.min.7f095589dcf99af199766009c9a9719e6fdf2b9ce9bcfbeb05e80eb6efeaf32b.css
 as=style><link 
href=/scss/main.min.7f095589dcf99af199766009c9a9719e6fdf2b9ce9bcfbeb05e80eb6efeaf32b.css
 rel=stylesheet integrity><script 
src=https://code.jquery.com/jquery-3.6.3.min.js 
integrity="sha512-STof4xm1wgkfm7heWqFJVn58Hm3EtS31XFaagaa8VMReCXAkQnJZ+jEy8PCC/iT18dFy95WcExNHFTqLyp72eQ=="
 crossorigin=anonymous></script><link rel=stylesheet 
href=https://cdn.jsdelivr.net/npm/@doc [...]
 <a href=# onclick="return print(),!1">Click here to print</a>.</p><p><a 
href=/docs/file-format/>Return to the regular view of this 
page</a>.</p></div><h1 class=title>File Format</h1><div 
class=lead>Documentation about the Parquet File Format.</div><ul><li>1: <a 
href=#pg-de128b03a7277c17c6dec2cef97af32d>Configurations</a></li><li>2: <a 
href=#pg-6762d78210357c1df172dfddcc6fd307>Extensibility</a></li><li>3: <a 
href=#pg-e1f24cbf7bc8c4ddd1cc8213993795a7>Metadata</a></li><li>4: <a 
href=#pg-140 [...]
-    &lt;Column 1 Chunk 1 + Column Metadata&gt;
-    &lt;Column 2 Chunk 1 + Column Metadata&gt;
+    &lt;Column 1 Chunk 1&gt;
+    &lt;Column 2 Chunk 1&gt;
     ...
-    &lt;Column N Chunk 1 + Column Metadata&gt;
-    &lt;Column 1 Chunk 2 + Column Metadata&gt;
-    &lt;Column 2 Chunk 2 + Column Metadata&gt;
+    &lt;Column N Chunk 1&gt;
+    &lt;Column 1 Chunk 2&gt;
+    &lt;Column 2 Chunk 2&gt;
     ...
-    &lt;Column N Chunk 2 + Column Metadata&gt;
+    &lt;Column N Chunk 2&gt;
     ...
-    &lt;Column 1 Chunk M + Column Metadata&gt;
-    &lt;Column 2 Chunk M + Column Metadata&gt;
+    &lt;Column 1 Chunk M&gt;
+    &lt;Column 2 Chunk M&gt;
     ...
-    &lt;Column N Chunk M + Column Metadata&gt;
+    &lt;Column N Chunk M&gt;
     File Metadata
     4-byte length in bytes of file metadata (little endian)
     4-byte magic number &#34;PAR1&#34;
 </code></pre><p>In the above example, there are N columns in this table, split 
into M row
-groups. The file metadata contains the locations of all the column metadata
+groups. The file metadata contains the locations of all the column chunk
 start locations. More details on what is contained in the metadata can be found
-in the Thrift definition.</p><p>Metadata is written after the data to allow 
for single pass writing.</p><p>Readers are expected to first read the file 
metadata to find all the column
+in the Thrift definition.</p><p>File metadata is written after the data to 
allow for single pass writing.</p><p>Readers are expected to first read the 
file metadata to find all the column
 chunks they are interested in. The columns chunks should then be read 
sequentially.</p><p>The format is explicitly designed to separate the metadata 
from the data. This
 allows splitting columns into multiple files, as well as having a single 
metadata
 file reference multiple parquet files.</p><p><img alt="File Layout" 
src=/images/FileLayout.gif></p></div></div><div class=td-content 
style=page-break-before:always><h1 id=pg-de128b03a7277c17c6dec2cef97af32d>1 - 
Configurations</h1><h3 id=row-group-size>Row Group Size</h3><p>Larger row 
groups allow for larger column chunks which makes it
@@ -37,8 +37,14 @@ per HDFS file.</p><h3 id=data-page--size>Data Page Size</h3><p>Data pages should
 allow for more fine grained reading (e.g. single row lookup). Larger page sizes
 incur less space overhead (less page headers) and potentially less parsing 
overhead
 (processing headers). Note: for sequential scans, it is not expected to read a 
page
-at a time; this is not the IO chunk. We recommend 8KB for page 
sizes.</p></div><div class=td-content style=page-break-before:always><h1 
id=pg-6762d78210357c1df172dfddcc6fd307>2 - Extensibility</h1><p>There are many 
places in the format for compatible extensions:</p><ul><li>File Version: The 
file metadata contains a version.</li><li>Encodings: Encodings are specified by 
enum and more can be added in the future.</li><li>Page types: Additional page 
types can be added and safely skipped.</li [...]
-header metadata. All thrift structures are serialized using the 
TCompactProtocol.</p><p><img alt="File Layout" 
src=/images/FileFormat.gif></p></div><div class=td-content 
style=page-break-before:always><h1 id=pg-140f2d6d5609da0af9dcaa92233086b5>4 - 
Types</h1><p>The types supported by the file format are intended to be as 
minimal as possible,
+at a time; this is not the IO chunk. We recommend 8KB for page 
sizes.</p></div><div class=td-content style=page-break-before:always><h1 
id=pg-6762d78210357c1df172dfddcc6fd307>2 - Extensibility</h1><p>There are many 
places in the format for compatible extensions:</p><ul><li>File Version: The 
file metadata contains a version.</li><li>Encodings: Encodings are specified by 
enum and more can be added in the future.</li><li>Page types: Additional page 
types can be added and safely skipped.</li [...]
+In the diagram below, file metadata is described by the 
<code>FileMetaData</code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file. Page header metadata 
(<code>PageHeader</code> and
+children in the diagram) is stored in-line with the page data, and is
+used in the reading and decoding of said data.</p><p>All thrift structures are 
serialized using the TCompactProtocol. The full
+definition of these structures is given in the Parquet
+<a 
href=https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift>Thrift
 definition</a>.</p><p><img alt="File Layout" 
src=/images/FileFormat.gif></p></div><div class=td-content 
style=page-break-before:always><h1 id=pg-140f2d6d5609da0af9dcaa92233086b5>4 - 
Types</h1><p>The types supported by the file format are intended to be as 
minimal as possible,
 with a focus on how the types effect on disk storage. For example, 16-bit ints
 are not explicitly supported in the storage format since they are covered by
 32-bit ints with an efficient encoding. This reduces the complexity of 
implementing
diff --git a/output/docs/file-format/index.html b/output/docs/file-format/index.html
index 8cc7152..719a65f 100644
--- a/output/docs/file-format/index.html
+++ b/output/docs/file-format/index.html
@@ -8,28 +8,28 @@
 <a 
href="https://github.com/apache/parquet-site/new/production/content/en/docs/File%20Format?filename=change-me.md&amp;value=---%0Atitle%3A+%22Long+Page+Title%22%0AlinkTitle%3A+%22Short+Nav+Title%22%0Aweight%3A+100%0Adescription%3A+%3E-%0A+++++Page+description+for+heading+and+indexes.%0A---%0A%0A%23%23+Heading%0A%0AEdit+this+template+to+create+your+new+page.%0A%0A%2A+Give+it+a+good+name%2C+ending+in+%60.md%60+-+e.g.+%60getting-started.md%60%0A%2A+Edit+the+%22front+matter%22+section+at+th
 [...]
 <a 
href="https://github.com/apache/parquet-site/issues/new?title=File%20Format" 
class="td-page-meta--issue td-page-meta__issue" target=_blank rel=noopener><i 
class="fa-solid fa-list-check fa-fw"></i> Create documentation issue</a>
 <a id=print href=/docs/file-format/_print/><i class="fa-solid fa-print 
fa-fw"></i> Print entire section</a></div></aside><main class="col-12 col-md-9 
col-xl-8 ps-md-5" role=main><nav aria-label=breadcrumb class=td-breadcrumbs><ol 
class=breadcrumb><li class=breadcrumb-item><a 
href=/docs/>Documentation</a></li><li class="breadcrumb-item active" 
aria-current=page>File Format</li></ol></nav><div class=td-content><h1>File 
Format</h1><div class=lead>Documentation about the Parquet File Format. [...]
-    &lt;Column 1 Chunk 1 + Column Metadata&gt;
-    &lt;Column 2 Chunk 1 + Column Metadata&gt;
+    &lt;Column 1 Chunk 1&gt;
+    &lt;Column 2 Chunk 1&gt;
     ...
-    &lt;Column N Chunk 1 + Column Metadata&gt;
-    &lt;Column 1 Chunk 2 + Column Metadata&gt;
-    &lt;Column 2 Chunk 2 + Column Metadata&gt;
+    &lt;Column N Chunk 1&gt;
+    &lt;Column 1 Chunk 2&gt;
+    &lt;Column 2 Chunk 2&gt;
     ...
-    &lt;Column N Chunk 2 + Column Metadata&gt;
+    &lt;Column N Chunk 2&gt;
     ...
-    &lt;Column 1 Chunk M + Column Metadata&gt;
-    &lt;Column 2 Chunk M + Column Metadata&gt;
+    &lt;Column 1 Chunk M&gt;
+    &lt;Column 2 Chunk M&gt;
     ...
-    &lt;Column N Chunk M + Column Metadata&gt;
+    &lt;Column N Chunk M&gt;
     File Metadata
     4-byte length in bytes of file metadata (little endian)
     4-byte magic number &#34;PAR1&#34;
 </code></pre><p>In the above example, there are N columns in this table, split 
into M row
-groups. The file metadata contains the locations of all the column metadata
+groups. The file metadata contains the locations of all the column chunk
 start locations. More details on what is contained in the metadata can be found
-in the Thrift definition.</p><p>Metadata is written after the data to allow 
for single pass writing.</p><p>Readers are expected to first read the file 
metadata to find all the column
+in the Thrift definition.</p><p>File metadata is written after the data to 
allow for single pass writing.</p><p>Readers are expected to first read the 
file metadata to find all the column
 chunks they are interested in. The columns chunks should then be read 
sequentially.</p><p>The format is explicitly designed to separate the metadata 
from the data. This
 allows splitting columns into multiple files, as well as having a single 
metadata
-file reference multiple parquet files.</p><p><img alt="File Layout" 
src=/images/FileLayout.gif></p><div class=section-index><hr 
class=panel-line><div class=entry><h5><a 
href=/docs/file-format/configurations/>Configurations</a></h5><p></p></div><div 
class=entry><h5><a 
href=/docs/file-format/extensibility/>Extensibility</a></h5><p></p></div><div 
class=entry><h5><a 
href=/docs/file-format/metadata/>Metadata</a></h5><p></p></div><div 
class=entry><h5><a href=/docs/file-format/types/>Types</a>< [...]
+file reference multiple parquet files.</p><p><img alt="File Layout" 
src=/images/FileLayout.gif></p><div class=section-index><hr 
class=panel-line><div class=entry><h5><a 
href=/docs/file-format/configurations/>Configurations</a></h5><p></p></div><div 
class=entry><h5><a 
href=/docs/file-format/extensibility/>Extensibility</a></h5><p></p></div><div 
class=entry><h5><a 
href=/docs/file-format/metadata/>Metadata</a></h5><p></p></div><div 
class=entry><h5><a href=/docs/file-format/types/>Types</a>< [...]
 2024
 <span class=td-footer__authors>Apache Parquet</span></span><span 
class=td-footer__all_rights_reserved>All Rights Reserved</span><span 
class=ms-2><a href=https://policies.google.com/privacy target=_blank 
rel=noopener>Privacy Policy</a></span></div></div></div></footer></div><script 
src=/js/main.min.26b35480299b932e285af8358c943de97509b95a0086d091584e7cb9b00c5c7b.js
 integrity="sha256-JrNUgCmbky4oWvg1jJQ96XUJuVoAhtCRWE58ubAMXHs=" 
crossorigin=anonymous></script><script defer src=/js/click-to [...]
\ No newline at end of file
diff --git a/output/docs/file-format/index.xml b/output/docs/file-format/index.xml
index f0bcb2f..e54dc92 100644
--- a/output/docs/file-format/index.xml
+++ b/output/docs/file-format/index.xml
@@ -19,8 +19,15 @@ at a time; this is not the IO chunk. We recommend 8KB for page sizes.&lt;/p></de
 &lt;li>Encodings: Encodings are specified by enum and more can be added in the 
future.&lt;/li>
 &lt;li>Page types: Additional page types can be added and safely 
skipped.&lt;/li>
 &lt;/ul></description></item><item><title>Docs: 
Metadata</title><link>/docs/file-format/metadata/</link><pubDate>Mon, 01 Jan 
0001 00:00:00 
+0000</pubDate><guid>/docs/file-format/metadata/</guid><description>
-&lt;p>There are three types of metadata: file metadata, column (chunk) 
metadata and page
-header metadata. All thrift structures are serialized using the 
TCompactProtocol.&lt;/p>
+&lt;p>There are two types of metadata: file metadata, and page header metadata.
+In the diagram below, file metadata is described by the 
&lt;code>FileMetaData&lt;/code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file. Page header metadata 
(&lt;code>PageHeader&lt;/code> and
+children in the diagram) is stored in-line with the page data, and is
+used in the reading and decoding of said data.&lt;/p>
+&lt;p>All thrift structures are serialized using the TCompactProtocol. The full
+definition of these structures is given in the Parquet
+&lt;a 
href="https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift">Thrift
 definition&lt;/a>.&lt;/p>
 &lt;p>&lt;img alt="File Layout" 
src="/images/FileFormat.gif">&lt;/p></description></item><item><title>Docs: 
Types</title><link>/docs/file-format/types/</link><pubDate>Mon, 01 Jan 0001 
00:00:00 +0000</pubDate><guid>/docs/file-format/types/</guid><description>
 &lt;p>The types supported by the file format are intended to be as minimal as 
possible,
 with a focus on how the types effect on disk storage. For example, 16-bit ints
diff --git a/output/docs/file-format/metadata/index.html b/output/docs/file-format/metadata/index.html
index 22cc8f5..5585109 100644
--- a/output/docs/file-format/metadata/index.html
+++ b/output/docs/file-format/metadata/index.html
@@ -1,9 +1,19 @@
 <!doctype html><html itemscope itemtype=http://schema.org/WebPage lang=en 
class=no-js><head><meta charset=utf-8><meta name=viewport 
content="width=device-width,initial-scale=1,shrink-to-fit=no"><meta name=robots 
content="index, follow"><link rel="shortcut icon" 
href=/favicons/favicon.ico><link rel=apple-touch-icon 
href=/favicons/apple-touch-icon-180x180.png sizes=180x180><link rel=icon 
type=image/png href=/favicons/favicon-16x16.png sizes=16x16><link rel=icon 
type=image/png href=/favicon [...]
-<meta name=description content="There are three types of metadata: file 
metadata, column (chunk) metadata and page header metadata. All thrift 
structures are serialized using the TCompactProtocol."><meta 
property="og:title" content="Metadata"><meta property="og:description" 
content="There are three types of metadata: file metadata, column (chunk) 
metadata and page header metadata. All thrift structures are serialized using 
the TCompactProtocol."><meta property="og:type" content="article" [...]
+<meta name=description content="There are two types of metadata: file 
metadata, and page header metadata. In the diagram below, file metadata is 
described by the FileMetaData structure. This file metadata provides offset and 
size information useful when navigating the Parquet file. Page header metadata 
(PageHeader and children in the diagram) is stored in-line with the page data, 
and is used in the reading and decoding of said data.
+All thrift structures are serialized using the TCompactProtocol."><meta 
property="og:title" content="Metadata"><meta property="og:description" 
content="There are two types of metadata: file metadata, and page header 
metadata. In the diagram below, file metadata is described by the FileMetaData 
structure. This file metadata provides offset and size information useful when 
navigating the Parquet file. Page header metadata (PageHeader and children in 
the diagram) is stored in-line with the  [...]
+All thrift structures are serialized using the TCompactProtocol."><meta 
property="og:type" content="article"><meta property="og:url" 
content="/docs/file-format/metadata/"><meta property="article:section" 
content="docs"><meta property="article:modified_time" 
content="2024-07-07T19:25:32-07:00"><meta property="og:site_name" 
content="Apache Parquet"><meta itemprop=name content="Metadata"><meta 
itemprop=description content="There are two types of metadata: file metadata, 
and page header meta [...]
+All thrift structures are serialized using the TCompactProtocol."><meta 
itemprop=dateModified content="2024-07-07T19:25:32-07:00"><meta 
itemprop=wordCount content="86"><meta itemprop=keywords content><meta 
name=twitter:card content="summary"><meta name=twitter:title 
content="Metadata"><meta name=twitter:description content="There are two types 
of metadata: file metadata, and page header metadata. In the diagram below, 
file metadata is described by the FileMetaData structure. This file me [...]
+All thrift structures are serialized using the TCompactProtocol."><link 
rel=preload 
href=/scss/main.min.7f095589dcf99af199766009c9a9719e6fdf2b9ce9bcfbeb05e80eb6efeaf32b.css
 as=style><link 
href=/scss/main.min.7f095589dcf99af199766009c9a9719e6fdf2b9ce9bcfbeb05e80eb6efeaf32b.css
 rel=stylesheet integrity><script 
src=https://code.jquery.com/jquery-3.6.3.min.js 
integrity="sha512-STof4xm1wgkfm7heWqFJVn58Hm3EtS31XFaagaa8VMReCXAkQnJZ+jEy8PCC/iT18dFy95WcExNHFTqLyp72eQ=="
 crossorigin=anonymous></sc [...]
 <a 
href=https://github.com/apache/parquet-site/edit/production/content/en/docs/File%20Format/metadata.md
 class="td-page-meta--edit td-page-meta__edit" target=_blank rel=noopener><i 
class="fa-solid fa-pen-to-square fa-fw"></i> Edit this page</a>
 <a 
href="https://github.com/apache/parquet-site/new/production/content/en/docs/File%20Format?filename=change-me.md&amp;value=---%0Atitle%3A+%22Long+Page+Title%22%0AlinkTitle%3A+%22Short+Nav+Title%22%0Aweight%3A+100%0Adescription%3A+%3E-%0A+++++Page+description+for+heading+and+indexes.%0A---%0A%0A%23%23+Heading%0A%0AEdit+this+template+to+create+your+new+page.%0A%0A%2A+Give+it+a+good+name%2C+ending+in+%60.md%60+-+e.g.+%60getting-started.md%60%0A%2A+Edit+the+%22front+matter%22+section+at+th
 [...]
 <a href="https://github.com/apache/parquet-site/issues/new?title=Metadata" 
class="td-page-meta--issue td-page-meta__issue" target=_blank rel=noopener><i 
class="fa-solid fa-list-check fa-fw"></i> Create documentation issue</a>
-<a id=print href=/docs/file-format/_print/><i class="fa-solid fa-print 
fa-fw"></i> Print entire section</a></div></aside><main class="col-12 col-md-9 
col-xl-8 ps-md-5" role=main><nav aria-label=breadcrumb class=td-breadcrumbs><ol 
class=breadcrumb><li class=breadcrumb-item><a 
href=/docs/>Documentation</a></li><li class=breadcrumb-item><a 
href=/docs/file-format/>File Format</a></li><li class="breadcrumb-item active" 
aria-current=page>Metadata</li></ol></nav><div class=td-content><h1>Metada [...]
-header metadata. All thrift structures are serialized using the 
TCompactProtocol.</p><p><img alt="File Layout" 
src=/images/FileFormat.gif></p><div class=td-page-meta__lastmod>Last modified 
March 8, 2024: <a 
href=https://github.com/apache/parquet-site/commit/b3b81ce3e9f9e6f25b41f463577976628515384a>Update
 to new website (b3b81ce)</a></div></div></main></div></div><footer 
class="td-footer row d-print-none"><div class=container-fluid><div class="row 
mx-md-2"><div class="td-footer__left col- [...]
+<a id=print href=/docs/file-format/_print/><i class="fa-solid fa-print 
fa-fw"></i> Print entire section</a></div></aside><main class="col-12 col-md-9 
col-xl-8 ps-md-5" role=main><nav aria-label=breadcrumb class=td-breadcrumbs><ol 
class=breadcrumb><li class=breadcrumb-item><a 
href=/docs/>Documentation</a></li><li class=breadcrumb-item><a 
href=/docs/file-format/>File Format</a></li><li class="breadcrumb-item active" 
aria-current=page>Metadata</li></ol></nav><div class=td-content><h1>Metada [...]
+In the diagram below, file metadata is described by the 
<code>FileMetaData</code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file. Page header metadata 
(<code>PageHeader</code> and
+children in the diagram) is stored in-line with the page data, and is
+used in the reading and decoding of said data.</p><p>All thrift structures are 
serialized using the TCompactProtocol. The full
+definition of these structures is given in the Parquet
+<a 
href=https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift>Thrift
 definition</a>.</p><p><img alt="File Layout" 
src=/images/FileFormat.gif></p><div class=td-page-meta__lastmod>Last modified 
July 7, 2024: <a 
href=https://github.com/apache/parquet-site/commit/a407d81a41a90b58ae90a6567a84dd084b5d2947>GH-68:
 Match language from parquet-format after merge of PARQUET-2139 (#69) 
(a407d81)</a></div></div></main></div></div><footer class="td-footer row 
d-print-none [...]
 2024
 <span class=td-footer__authors>Apache Parquet</span></span><span 
class=td-footer__all_rights_reserved>All Rights Reserved</span><span 
class=ms-2><a href=https://policies.google.com/privacy target=_blank 
rel=noopener>Privacy Policy</a></span></div></div></div></footer></div><script 
src=/js/main.min.26b35480299b932e285af8358c943de97509b95a0086d091584e7cb9b00c5c7b.js
 integrity="sha256-JrNUgCmbky4oWvg1jJQ96XUJuVoAhtCRWE58ubAMXHs=" 
crossorigin=anonymous></script><script defer src=/js/click-to [...]
\ No newline at end of file
diff --git a/output/docs/index.xml b/output/docs/index.xml
index 8971aa9..b03b3ff 100644
--- a/output/docs/index.xml
+++ b/output/docs/index.xml
@@ -1089,8 +1089,15 @@ example, strings are stored as byte arrays (binary) with a UTF8 annotation.
 These annotations define how to further decode and interpret the data.
 Annotations are stored as &lt;code>LogicalType&lt;/code> fields in the file 
metadata and are
 documented in LogicalTypes.md.&lt;/p></description></item><item><title>Docs: 
Metadata</title><link>/docs/file-format/metadata/</link><pubDate>Mon, 01 Jan 
0001 00:00:00 
+0000</pubDate><guid>/docs/file-format/metadata/</guid><description>
-&lt;p>There are three types of metadata: file metadata, column (chunk) 
metadata and page
-header metadata. All thrift structures are serialized using the 
TCompactProtocol.&lt;/p>
+&lt;p>There are two types of metadata: file metadata, and page header metadata.
+In the diagram below, file metadata is described by the 
&lt;code>FileMetaData&lt;/code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file. Page header metadata 
(&lt;code>PageHeader&lt;/code> and
+children in the diagram) is stored in-line with the page data, and is
+used in the reading and decoding of said data.&lt;/p>
+&lt;p>All thrift structures are serialized using the TCompactProtocol. The full
+definition of these structures is given in the Parquet
+&lt;a 
href="https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift">Thrift
 definition&lt;/a>.&lt;/p>
 &lt;p>&lt;img alt="File Layout" 
src="/images/FileFormat.gif">&lt;/p></description></item><item><title>Docs: 
Nested 
Encoding</title><link>/docs/file-format/nestedencoding/</link><pubDate>Mon, 01 
Jan 0001 00:00:00 
+0000</pubDate><guid>/docs/file-format/nestedencoding/</guid><description>
 &lt;p>To encode nested columns, Parquet uses the Dremel encoding with 
definition and
 repetition levels. Definition levels specify how many optional fields in the
diff --git a/output/sitemap.xml b/output/sitemap.xml
index 4623773..616e81f 100644
--- a/output/sitemap.xml
+++ b/output/sitemap.xml
@@ -1 +1 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset 
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" 
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/docs/file-format/data-pages/compression/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/file-format/data-pages/encodings/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/file-format/data-pages/encryption/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/
 [...]
\ No newline at end of file
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset 
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" 
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/docs/file-format/data-pages/compression/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/file-format/data-pages/encodings/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/file-format/data-pages/encryption/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/
 [...]
\ No newline at end of file
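
The trailer layout shown in the diffs above (file metadata, then a 4-byte little-endian length, then the 4-byte magic number "PAR1") suggests that a reader can find the file metadata by working backwards from the end of the file. The following is a minimal sketch of that step in Python, assuming the whole file is already in memory as a byte string. The name `locate_file_metadata` is hypothetical and not part of any Parquet library, and the sketch only locates the serialized metadata; it does not decode the Thrift structure.

```python
import struct

MAGIC = b"PAR1"  # 4-byte magic number that ends the file


def locate_file_metadata(buf: bytes) -> tuple[int, int]:
    """Return (offset, length) of the serialized file metadata in buf.

    Hypothetical helper for illustration only.
    """
    if len(buf) < 8:
        raise ValueError("too short to hold the length field and magic number")
    if buf[-4:] != MAGIC:
        raise ValueError("missing trailing magic number")
    # The 4 bytes before the magic hold the metadata length, little endian.
    (length,) = struct.unpack("<I", buf[-8:-4])
    offset = len(buf) - 8 - length
    if offset < 0:
        raise ValueError("metadata length exceeds file size")
    return offset, length


# Synthetic stand-in for a file: placeholder data, 4 bytes of fake
# metadata, the length field, and the magic number.
sample = b"column-chunk-bytes" + b"META" + struct.pack("<I", 4) + MAGIC
offset, length = locate_file_metadata(sample)
raw_metadata = sample[offset:offset + length]
```

Because the length and magic number sit at fixed positions relative to the end of the file, a reader working on a real file could fetch the last 8 bytes first and then issue one more read for the metadata itself, which fits the single-pass writing described in the changed text.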
