(parquet-site) branch asf-site updated: deploy: 5ab1cc62cee7214f3c1f17e6b8a3a22e721eb6de

github-bot Sat, 08 Mar 2025 11:11:36 -0800

This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/parquet-site.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 7f51ef2  deploy: 5ab1cc62cee7214f3c1f17e6b8a3a22e721eb6de
7f51ef2 is described below

commit 7f51ef27cb8135b3b85c46e429d44ae65967ff5b
Author: gszadovszky <[email protected]>
AuthorDate: Thu Mar 6 07:20:17 2025 +0000

    deploy: 5ab1cc62cee7214f3c1f17e6b8a3a22e721eb6de
---
 output/docs/_print/index.html               |  12 +-
 output/docs/file-format/_print/index.html   |  12 +-
 output/docs/file-format/index.xml           |  17 +--
 output/docs/file-format/metadata/index.html |  30 +++--
 output/docs/index.xml                       |  17 +--
 output/images/FileLayoutBloomFilter2.png    | Bin
 output/images/FileLayoutEncryptionEF.png    | Bin
 output/images/FileLayoutEncryptionPF.png    | Bin
 output/images/FileMetaData.mermaid          | 173 ++++++++++++++++++++++++++++
 output/images/FileMetaData.svg              |   1 +
 output/images/PageHeader.mermaid            |  62 ++++++++++
 output/images/PageHeader.svg                |   1 +
 output/sitemap.xml                          |   2 +-
 13 files changed, 286 insertions(+), 41 deletions(-)

diff --git a/output/docs/_print/index.html b/output/docs/_print/index.html
index 96f3d11..5c208a5 100644
--- a/output/docs/_print/index.html
+++ b/output/docs/_print/index.html
@@ -44,14 +44,12 @@ per HDFS file.</p><h3 id=data-page--size>Data Page 
Size</h3><p>Data pages should
 allow for more fine grained reading (e.g. single row lookup). Larger page sizes
 incur less space overhead (less page headers) and potentially less parsing 
overhead
 (processing headers). Note: for sequential scans, it is not expected to read a 
page
-at a time; this is not the IO chunk. We recommend 8KB for page 
sizes.</p></div><div class=td-content style=page-break-before:always><h1 
id=pg-6762d78210357c1df172dfddcc6fd307>3.2 - Extensibility</h1><p>There are 
many places in the format for compatible extensions:</p><ul><li>File Version: 
The file metadata contains a version.</li><li>Encodings: Encodings are 
specified by enum and more can be added in the future.</li><li>Page types: 
Additional page types can be added and safely skipped.</ [...]
-In the diagram below, file metadata is described by the 
<code>FileMetaData</code>
-structure. This file metadata provides offset and size information useful
-when navigating the Parquet file. Page header metadata 
(<code>PageHeader</code> and
-children in the diagram) is stored in-line with the page data, and is
-used in the reading and decoding of said data.</p><p>All thrift structures are 
serialized using the TCompactProtocol. The full
+at a time; this is not the IO chunk. We recommend 8KB for page 
sizes.</p></div><div class=td-content style=page-break-before:always><h1 
id=pg-6762d78210357c1df172dfddcc6fd307>3.2 - Extensibility</h1><p>There are 
many places in the format for compatible extensions:</p><ul><li>File Version: 
The file metadata contains a version.</li><li>Encodings: Encodings are 
specified by enum and more can be added in the future.</li><li>Page types: 
Additional page types can be added and safely skipped.</ [...]
 definition of these structures is given in the Parquet
-<a 
href=https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift>Thrift
 definition</a>.</p><p><img alt="File Layout" 
src=/images/FileFormat.gif></p></div><div class=td-content 
style=page-break-before:always><h1 id=pg-140f2d6d5609da0af9dcaa92233086b5>3.4 - 
Types</h1><p>The types supported by the file format are intended to be as 
minimal as possible,
+<a 
href=https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift>Thrift
 definition</a>.</p><h2 id=file-metadata>File metadata</h2><p>In the diagram 
below, file metadata is described by the <code>FileMetaData</code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file.</p><p><img alt="Parquet Metadata format" 
src=/images/FileMetaData.svg></p><h2 id=page-header>Page header</h2><p>Page 
header metadata (<code>PageHeader</code> and children in the diagram) is stored
+in-line with the page data, and is used in the reading and decoding of 
data.</p><p><img alt="Parquet PageHeader format" 
src=/images/PageHeader.svg></p></div><div class=td-content 
style=page-break-before:always><h1 id=pg-140f2d6d5609da0af9dcaa92233086b5>3.4 - 
Types</h1><p>The types supported by the file format are intended to be as 
minimal as possible,
 with a focus on how the types effect on disk storage. For example, 16-bit ints
 are not explicitly supported in the storage format since they are covered by
 32-bit ints with an efficient encoding. This reduces the complexity of 
implementing
diff --git a/output/docs/file-format/_print/index.html 
b/output/docs/file-format/_print/index.html
index 5c98367..d4252c0 100644
--- a/output/docs/file-format/_print/index.html
+++ b/output/docs/file-format/_print/index.html
@@ -37,14 +37,12 @@ per HDFS file.</p><h3 id=data-page--size>Data Page 
Size</h3><p>Data pages should
 allow for more fine grained reading (e.g. single row lookup). Larger page sizes
 incur less space overhead (less page headers) and potentially less parsing 
overhead
 (processing headers). Note: for sequential scans, it is not expected to read a 
page
-at a time; this is not the IO chunk. We recommend 8KB for page 
sizes.</p></div><div class=td-content style=page-break-before:always><h1 
id=pg-6762d78210357c1df172dfddcc6fd307>2 - Extensibility</h1><p>There are many 
places in the format for compatible extensions:</p><ul><li>File Version: The 
file metadata contains a version.</li><li>Encodings: Encodings are specified by 
enum and more can be added in the future.</li><li>Page types: Additional page 
types can be added and safely skipped.</li [...]
-In the diagram below, file metadata is described by the 
<code>FileMetaData</code>
-structure. This file metadata provides offset and size information useful
-when navigating the Parquet file. Page header metadata 
(<code>PageHeader</code> and
-children in the diagram) is stored in-line with the page data, and is
-used in the reading and decoding of said data.</p><p>All thrift structures are 
serialized using the TCompactProtocol. The full
+at a time; this is not the IO chunk. We recommend 8KB for page 
sizes.</p></div><div class=td-content style=page-break-before:always><h1 
id=pg-6762d78210357c1df172dfddcc6fd307>2 - Extensibility</h1><p>There are many 
places in the format for compatible extensions:</p><ul><li>File Version: The 
file metadata contains a version.</li><li>Encodings: Encodings are specified by 
enum and more can be added in the future.</li><li>Page types: Additional page 
types can be added and safely skipped.</li [...]
 definition of these structures is given in the Parquet
-<a 
href=https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift>Thrift
 definition</a>.</p><p><img alt="File Layout" 
src=/images/FileFormat.gif></p></div><div class=td-content 
style=page-break-before:always><h1 id=pg-140f2d6d5609da0af9dcaa92233086b5>4 - 
Types</h1><p>The types supported by the file format are intended to be as 
minimal as possible,
+<a 
href=https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift>Thrift
 definition</a>.</p><h2 id=file-metadata>File metadata</h2><p>In the diagram 
below, file metadata is described by the <code>FileMetaData</code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file.</p><p><img alt="Parquet Metadata format" 
src=/images/FileMetaData.svg></p><h2 id=page-header>Page header</h2><p>Page 
header metadata (<code>PageHeader</code> and children in the diagram) is stored
+in-line with the page data, and is used in the reading and decoding of 
data.</p><p><img alt="Parquet PageHeader format" 
src=/images/PageHeader.svg></p></div><div class=td-content 
style=page-break-before:always><h1 id=pg-140f2d6d5609da0af9dcaa92233086b5>4 - 
Types</h1><p>The types supported by the file format are intended to be as 
minimal as possible,
 with a focus on how the types effect on disk storage. For example, 16-bit ints
 are not explicitly supported in the storage format since they are covered by
 32-bit ints with an efficient encoding. This reduces the complexity of 
implementing
diff --git a/output/docs/file-format/index.xml 
b/output/docs/file-format/index.xml
index 50e2209..6e04883 100644
--- a/output/docs/file-format/index.xml
+++ b/output/docs/file-format/index.xml
@@ -19,16 +19,19 @@ at a time; this is not the IO chunk. We recommend 8KB for 
page sizes.&lt;/p></de
 &lt;li>Encodings: Encodings are specified by enum and more can be added in the 
future.&lt;/li>
 &lt;li>Page types: Additional page types can be added and safely 
skipped.&lt;/li>
 &lt;/ul></description></item><item><title>Docs: 
Metadata</title><link>/docs/file-format/metadata/</link><pubDate>Mon, 01 Jan 
0001 00:00:00 
+0000</pubDate><guid>/docs/file-format/metadata/</guid><description>
-&lt;p>There are two types of metadata: file metadata, and page header metadata.
-In the diagram below, file metadata is described by the 
&lt;code>FileMetaData&lt;/code>
-structure. This file metadata provides offset and size information useful
-when navigating the Parquet file. Page header metadata 
(&lt;code>PageHeader&lt;/code> and
-children in the diagram) is stored in-line with the page data, and is
-used in the reading and decoding of said data.&lt;/p>
+&lt;p>There are two types of metadata: file metadata, and page header 
metadata.&lt;/p>
 &lt;p>All thrift structures are serialized using the TCompactProtocol. The full
 definition of these structures is given in the Parquet
 &lt;a 
href="https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift">Thrift
 definition&lt;/a>.&lt;/p>
-&lt;p>&lt;img alt="File Layout" 
src="/images/FileFormat.gif">&lt;/p></description></item><item><title>Docs: 
Types</title><link>/docs/file-format/types/</link><pubDate>Mon, 01 Jan 0001 
00:00:00 +0000</pubDate><guid>/docs/file-format/types/</guid><description>
+&lt;h2 id="file-metadata">File metadata&lt;/h2>
+&lt;p>In the diagram below, file metadata is described by the 
&lt;code>FileMetaData&lt;/code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file.&lt;/p>
+&lt;p>&lt;img alt="Parquet Metadata format" 
src="/images/FileMetaData.svg">&lt;/p>
+&lt;h2 id="page-header">Page header&lt;/h2>
+&lt;p>Page header metadata (&lt;code>PageHeader&lt;/code> and children in the 
diagram) is stored
+in-line with the page data, and is used in the reading and decoding of 
data.&lt;/p>
+&lt;p>&lt;img alt="Parquet PageHeader format" 
src="/images/PageHeader.svg">&lt;/p></description></item><item><title>Docs: 
Types</title><link>/docs/file-format/types/</link><pubDate>Mon, 01 Jan 0001 
00:00:00 +0000</pubDate><guid>/docs/file-format/types/</guid><description>
 &lt;p>The types supported by the file format are intended to be as minimal as 
possible,
 with a focus on how the types effect on disk storage. For example, 16-bit ints
 are not explicitly supported in the storage format since they are covered by
diff --git a/output/docs/file-format/metadata/index.html 
b/output/docs/file-format/metadata/index.html
index fb0fa1f..3bf71da 100644
--- a/output/docs/file-format/metadata/index.html
+++ b/output/docs/file-format/metadata/index.html
@@ -1,19 +1,25 @@
 <!doctype html><html itemscope itemtype=http://schema.org/WebPage lang=en 
class=no-js><head><meta charset=utf-8><meta name=viewport 
content="width=device-width,initial-scale=1,shrink-to-fit=no"><meta name=robots 
content="index, follow"><link rel="shortcut icon" 
href=/favicons/favicon.ico><link rel=apple-touch-icon 
href=/favicons/apple-touch-icon-180x180.png sizes=180x180><link rel=icon 
type=image/png href=/favicons/favicon-16x16.png sizes=16x16><link rel=icon 
type=image/png href=/favicon [...]
-<meta name=description content="There are two types of metadata: file 
metadata, and page header metadata. In the diagram below, file metadata is 
described by the FileMetaData structure. This file metadata provides offset and 
size information useful when navigating the Parquet file. Page header metadata 
(PageHeader and children in the diagram) is stored in-line with the page data, 
and is used in the reading and decoding of said data.
-All thrift structures are serialized using the TCompactProtocol."><meta 
property="og:title" content="Metadata"><meta property="og:description" 
content="There are two types of metadata: file metadata, and page header 
metadata. In the diagram below, file metadata is described by the FileMetaData 
structure. This file metadata provides offset and size information useful when 
navigating the Parquet file. Page header metadata (PageHeader and children in 
the diagram) is stored in-line with the  [...]
-All thrift structures are serialized using the TCompactProtocol."><meta 
property="og:type" content="article"><meta property="og:url" 
content="/docs/file-format/metadata/"><meta property="article:section" 
content="docs"><meta property="article:modified_time" 
content="2024-07-07T19:25:32-07:00"><meta property="og:site_name" 
content="Apache Parquet"><meta itemprop=name content="Metadata"><meta 
itemprop=description content="There are two types of metadata: file metadata, 
and page header meta [...]
-All thrift structures are serialized using the TCompactProtocol."><meta 
itemprop=dateModified content="2024-07-07T19:25:32-07:00"><meta 
itemprop=wordCount content="86"><meta itemprop=keywords content><meta 
name=twitter:card content="summary"><meta name=twitter:title 
content="Metadata"><meta name=twitter:description content="There are two types 
of metadata: file metadata, and page header metadata. In the diagram below, 
file metadata is described by the FileMetaData structure. This file me [...]
-All thrift structures are serialized using the TCompactProtocol."><link 
rel=preload 
href=/scss/main.min.7f095589dcf99af199766009c9a9719e6fdf2b9ce9bcfbeb05e80eb6efeaf32b.css
 as=style><link 
href=/scss/main.min.7f095589dcf99af199766009c9a9719e6fdf2b9ce9bcfbeb05e80eb6efeaf32b.css
 rel=stylesheet integrity><script 
src=https://code.jquery.com/jquery-3.6.3.min.js 
integrity="sha512-STof4xm1wgkfm7heWqFJVn58Hm3EtS31XFaagaa8VMReCXAkQnJZ+jEy8PCC/iT18dFy95WcExNHFTqLyp72eQ=="
 crossorigin=anonymous></sc [...]
+<meta name=description content="There are two types of metadata: file 
metadata, and page header metadata.
+All thrift structures are serialized using the TCompactProtocol. The full 
definition of these structures is given in the Parquet Thrift definition.
+File metadata In the diagram below, file metadata is described by the 
FileMetaData structure. This file metadata provides offset and size information 
useful when navigating the Parquet file.
+Page header Page header metadata (PageHeader and children in the diagram) is 
stored in-line with the page data, and is used in the reading and decoding of 
data."><meta property="og:title" content="Metadata"><meta 
property="og:description" content="There are two types of metadata: file 
metadata, and page header metadata.
+All thrift structures are serialized using the TCompactProtocol. The full 
definition of these structures is given in the Parquet Thrift definition.
+File metadata In the diagram below, file metadata is described by the 
FileMetaData structure. This file metadata provides offset and size information 
useful when navigating the Parquet file.
+Page header Page header metadata (PageHeader and children in the diagram) is 
stored in-line with the page data, and is used in the reading and decoding of 
data."><meta property="og:type" content="article"><meta property="og:url" 
content="/docs/file-format/metadata/"><meta property="article:section" 
content="docs"><meta property="article:modified_time" 
content="2025-03-05T23:19:07-08:00"><meta property="og:site_name" 
content="Apache Parquet"><meta itemprop=name content="Metadata"><meta it [...]
+All thrift structures are serialized using the TCompactProtocol. The full 
definition of these structures is given in the Parquet Thrift definition.
+File metadata In the diagram below, file metadata is described by the 
FileMetaData structure. This file metadata provides offset and size information 
useful when navigating the Parquet file.
+Page header Page header metadata (PageHeader and children in the diagram) is 
stored in-line with the page data, and is used in the reading and decoding of 
data."><meta itemprop=dateModified content="2025-03-05T23:19:07-08:00"><meta 
itemprop=wordCount content="89"><meta itemprop=keywords content><meta 
name=twitter:card content="summary"><meta name=twitter:title 
content="Metadata"><meta name=twitter:description content="There are two types 
of metadata: file metadata, and page header metadata.
+All thrift structures are serialized using the TCompactProtocol. The full 
definition of these structures is given in the Parquet Thrift definition.
+File metadata In the diagram below, file metadata is described by the 
FileMetaData structure. This file metadata provides offset and size information 
useful when navigating the Parquet file.
+Page header Page header metadata (PageHeader and children in the diagram) is 
stored in-line with the page data, and is used in the reading and decoding of 
data."><link rel=preload 
href=/scss/main.min.7f095589dcf99af199766009c9a9719e6fdf2b9ce9bcfbeb05e80eb6efeaf32b.css
 as=style><link 
href=/scss/main.min.7f095589dcf99af199766009c9a9719e6fdf2b9ce9bcfbeb05e80eb6efeaf32b.css
 rel=stylesheet integrity><script 
src=https://code.jquery.com/jquery-3.6.3.min.js 
integrity="sha512-STof4xm1wgkfm7heWqFJ [...]
 <a 
href=https://github.com/apache/parquet-site/edit/production/content/en/docs/File%20Format/metadata.md
 class="td-page-meta--edit td-page-meta__edit" target=_blank rel=noopener><i 
class="fa-solid fa-pen-to-square fa-fw"></i> Edit this page</a>
 <a 
href="https://github.com/apache/parquet-site/new/production/content/en/docs/File%20Format?filename=change-me.md&amp;value=---%0Atitle%3A+%22Long+Page+Title%22%0AlinkTitle%3A+%22Short+Nav+Title%22%0Aweight%3A+100%0Adescription%3A+%3E-%0A+++++Page+description+for+heading+and+indexes.%0A---%0A%0A%23%23+Heading%0A%0AEdit+this+template+to+create+your+new+page.%0A%0A%2A+Give+it+a+good+name%2C+ending+in+%60.md%60+-+e.g.+%60getting-started.md%60%0A%2A+Edit+the+%22front+matter%22+section+at+th
 [...]
 <a href="https://github.com/apache/parquet-site/issues/new?title=Metadata" 
class="td-page-meta--issue td-page-meta__issue" target=_blank rel=noopener><i 
class="fa-solid fa-list-check fa-fw"></i> Create documentation issue</a>
-<a id=print href=/docs/file-format/_print/><i class="fa-solid fa-print 
fa-fw"></i> Print entire section</a></div></aside><main class="col-12 col-md-9 
col-xl-8 ps-md-5" role=main><nav aria-label=breadcrumb class=td-breadcrumbs><ol 
class=breadcrumb><li class=breadcrumb-item><a 
href=/docs/>Documentation</a></li><li class=breadcrumb-item><a 
href=/docs/file-format/>File Format</a></li><li class="breadcrumb-item active" 
aria-current=page>Metadata</li></ol></nav><div class=td-content><h1>Metada [...]
-In the diagram below, file metadata is described by the 
<code>FileMetaData</code>
-structure. This file metadata provides offset and size information useful
-when navigating the Parquet file. Page header metadata 
(<code>PageHeader</code> and
-children in the diagram) is stored in-line with the page data, and is
-used in the reading and decoding of said data.</p><p>All thrift structures are 
serialized using the TCompactProtocol. The full
+<a id=print href=/docs/file-format/_print/><i class="fa-solid fa-print 
fa-fw"></i> Print entire section</a></div><div class=td-toc><nav 
id=TableOfContents><ul><li><a href=#file-metadata>File metadata</a></li><li><a 
href=#page-header>Page header</a></li></ul></nav></div></aside><main 
class="col-12 col-md-9 col-xl-8 ps-md-5" role=main><nav aria-label=breadcrumb 
class=td-breadcrumbs><ol class=breadcrumb><li class=breadcrumb-item><a 
href=/docs/>Documentation</a></li><li class=breadcrumb-item [...]
 definition of these structures is given in the Parquet
-<a 
href=https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift>Thrift
 definition</a>.</p><p><img alt="File Layout" 
src=/images/FileFormat.gif></p><div class=td-page-meta__lastmod>Last modified 
July 7, 2024: <a 
href=https://github.com/apache/parquet-site/commit/a407d81a41a90b58ae90a6567a84dd084b5d2947>GH-68:
 Match language from parquet-format after merge of PARQUET-2139 (#69) 
(a407d81)</a></div></div></main></div></div><footer class="td-footer row 
d-print-none [...]
+<a 
href=https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift>Thrift
 definition</a>.</p><h2 id=file-metadata>File metadata</h2><p>In the diagram 
below, file metadata is described by the <code>FileMetaData</code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file.</p><p><img alt="Parquet Metadata format" 
src=/images/FileMetaData.svg></p><h2 id=page-header>Page header</h2><p>Page 
header metadata (<code>PageHeader</code> and children in the diagram) is stored
+in-line with the page data, and is used in the reading and decoding of 
data.</p><p><img alt="Parquet PageHeader format" 
src=/images/PageHeader.svg></p><div class=td-page-meta__lastmod>Last modified 
March 5, 2025: <a 
href=https://github.com/apache/parquet-site/commit/5ab1cc62cee7214f3c1f17e6b8a3a22e721eb6de>Update
 Metadata Diagrams (#106) (5ab1cc6)</a></div></div></main></div></div><footer 
class="td-footer row d-print-none"><div class=container-fluid><div class="row 
mx-md-2"><div class="t [...]
 2025
 <span class=td-footer__authors>Apache Parquet</span></span><span 
class=td-footer__all_rights_reserved>All Rights Reserved</span><span 
class=ms-2><a href=https://policies.google.com/privacy target=_blank 
rel=noopener>Privacy Policy</a></span></div></div></div></footer></div><script 
src=/js/main.min.26b35480299b932e285af8358c943de97509b95a0086d091584e7cb9b00c5c7b.js
 integrity="sha256-JrNUgCmbky4oWvg1jJQ96XUJuVoAhtCRWE58ubAMXHs=" 
crossorigin=anonymous></script><script defer src=/js/click-to [...]
\ No newline at end of file
diff --git a/output/docs/index.xml b/output/docs/index.xml
index dafeb77..1258c7b 100644
--- a/output/docs/index.xml
+++ b/output/docs/index.xml
@@ -1069,16 +1069,19 @@ example, strings are stored as byte arrays (binary) 
with a UTF8 annotation.
 These annotations define how to further decode and interpret the data.
 Annotations are stored as &lt;code>LogicalType&lt;/code> fields in the file 
metadata and are
 documented in LogicalTypes.md.&lt;/p></description></item><item><title>Docs: 
Metadata</title><link>/docs/file-format/metadata/</link><pubDate>Mon, 01 Jan 
0001 00:00:00 
+0000</pubDate><guid>/docs/file-format/metadata/</guid><description>
-&lt;p>There are two types of metadata: file metadata, and page header metadata.
-In the diagram below, file metadata is described by the 
&lt;code>FileMetaData&lt;/code>
-structure. This file metadata provides offset and size information useful
-when navigating the Parquet file. Page header metadata 
(&lt;code>PageHeader&lt;/code> and
-children in the diagram) is stored in-line with the page data, and is
-used in the reading and decoding of said data.&lt;/p>
+&lt;p>There are two types of metadata: file metadata, and page header 
metadata.&lt;/p>
 &lt;p>All thrift structures are serialized using the TCompactProtocol. The full
 definition of these structures is given in the Parquet
 &lt;a 
href="https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift">Thrift
 definition&lt;/a>.&lt;/p>
-&lt;p>&lt;img alt="File Layout" 
src="/images/FileFormat.gif">&lt;/p></description></item><item><title>Docs: 
Nested 
Encoding</title><link>/docs/file-format/nestedencoding/</link><pubDate>Mon, 01 
Jan 0001 00:00:00 
+0000</pubDate><guid>/docs/file-format/nestedencoding/</guid><description>
+&lt;h2 id="file-metadata">File metadata&lt;/h2>
+&lt;p>In the diagram below, file metadata is described by the 
&lt;code>FileMetaData&lt;/code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file.&lt;/p>
+&lt;p>&lt;img alt="Parquet Metadata format" 
src="/images/FileMetaData.svg">&lt;/p>
+&lt;h2 id="page-header">Page header&lt;/h2>
+&lt;p>Page header metadata (&lt;code>PageHeader&lt;/code> and children in the 
diagram) is stored
+in-line with the page data, and is used in the reading and decoding of 
data.&lt;/p>
+&lt;p>&lt;img alt="Parquet PageHeader format" 
src="/images/PageHeader.svg">&lt;/p></description></item><item><title>Docs: 
Nested 
Encoding</title><link>/docs/file-format/nestedencoding/</link><pubDate>Mon, 01 
Jan 0001 00:00:00 
+0000</pubDate><guid>/docs/file-format/nestedencoding/</guid><description>
 &lt;p>To encode nested columns, Parquet uses the Dremel encoding with 
definition and
 repetition levels. Definition levels specify how many optional fields in the
 path for the column are defined. Repetition levels specify at what repeated 
field
diff --git a/output/images/FileLayoutBloomFilter2.png 
b/output/images/FileLayoutBloomFilter2.png
old mode 100755
new mode 100644
diff --git a/output/images/FileLayoutEncryptionEF.png 
b/output/images/FileLayoutEncryptionEF.png
old mode 100755
new mode 100644
diff --git a/output/images/FileLayoutEncryptionPF.png 
b/output/images/FileLayoutEncryptionPF.png
old mode 100755
new mode 100644
diff --git a/output/images/FileMetaData.mermaid 
b/output/images/FileMetaData.mermaid
new file mode 100644
index 0000000..be96680
--- /dev/null
+++ b/output/images/FileMetaData.mermaid
@@ -0,0 +1,173 @@
+classDiagram
+    FileMetaData --> SchemaElement
+    FileMetaData --> RowGroup
+    
+    RowGroup --> ColumnChunk
+    
+    ColumnChunk --> ColumnMetaData
+    
+    ColumnMetaData --> Statistics
+    ColumnMetaData --> Type
+    ColumnMetaData --> Encoding
+    ColumnMetaData --> CompressionCodec
+    
+    SchemaElement --> LogicalTypes
+    SchemaElement --> Type
+    SchemaElement --> ConvertedType
+    
+    class FileMetaData {
+        int32 version
+        list~SchemaElement~ schema
+        int64 num_rows
+        list~RowGroup~ row_groups
+        list~KeyValue~ key_value_metadata
+        string created_by
+        list~ColumnOrder~ column_orders
+        EncryptionAlgorithm encryption_algorithm
+        binary footer_signing_key_metadata
+    }
+    
+    class SchemaElement {
+        Type type
+        int32 type_length
+        FieldRepetitionType repetition_type
+        string name
+        int32 num_children
+        ConvertedType converted_type
+        int32 scale
+        int32 precision
+        int32 field_id
+        LogicalType logicalType
+    }
+    
+    class Type {
+        BOOLEAN
+        INT32
+        INT64
+        INT96
+        FLOAT
+        DOUBLE
+        BYTE_ARRAY
+        FIXED_LEN_BYTE_ARRAY
+    }
+    
+    class LogicalTypes {
+        StringType
+        MapType
+        ListType
+        EnumType
+        DecimalType
+        DateType
+        TimeType
+        TimestampType
+        IntType
+        NullType
+        JsonType
+        BsonType
+        UUIDType
+        Float16Type
+        VariantType
+        GeometryType
+        GeographyType
+    }
+    
+    class ConvertedType {
+        UTF8
+        MAP
+        MAP_KEY_VALUE
+        LIST
+        ENUM
+        DECIMAL
+        DATE
+        TIME_MILLIS
+        TIME_MICROS
+        TIMESTAMP_MILLIS
+        TIMESTAMP_MICROS
+        UINT_8
+        UINT_16
+        UINT_32
+        UINT_64
+        INT_8
+        INT_16
+        INT_32
+        INT_64
+        JSON
+        BSON
+        INTERVAL
+    }
+    
+    class Encoding {
+        PLAIN
+        PLAIN_DICTIONARY
+        RLE
+        BIT_PACKED
+        DELTA_BINARY_PACKED
+        DELTA_LENGTH_BYTE_ARRAY
+        DELTA_BYTE_ARRAY
+        RLE_DICTIONARY
+        BYTE_STREAM_SPLIT
+    }
+    
+    class CompressionCodec {
+        UNCOMPRESSED
+        SNAPPY
+        GZIP
+        LZO
+        BROTLI
+        LZ4
+        ZSTD
+        LZ4_RAW
+    }
+    
+    class RowGroup {
+        list~ColumnChunk~ columns
+        int64 total_byte_size
+        int64 num_rows
+        list~SortingColumn~ sorting_columns
+        int64 file_offset
+        int64 total_compressed_size
+        int16 ordinal
+    }
+    
+    class ColumnChunk {
+        string file_path
+        int64 file_offset
+        ColumnMetaData meta_data
+        int64 offset_index_offset
+        int32 offset_index_length
+        int64 column_index_offset
+        int32 column_index_length
+        ColumnCryptoMetaData crypto_metadata
+        binary encrypted_column_metadata
+    }
+    
+    class ColumnMetaData {
+        Type type
+        list~Encoding~ encodings
+        list~string~ path_in_schema
+        CompressionCodec codec
+        int64 num_values
+        int64 total_uncompressed_size
+        int64 total_compressed_size
+        list~KeyValue~ key_value_metadata
+        int64 data_page_offset
+        int64 index_page_offset
+        int64 dictionary_page_offset
+        Statistics statistics
+        list~PageEncodingStats~ encoding_stats
+        int64 bloom_filter_offset
+        int32 bloom_filter_length
+        SizeStatistics size_statistics
+        GeospatialStatistics geospatial_statistics
+    }
+    
+    class Statistics {
+        binary max
+        binary min
+        int64 null_count
+        int64 distinct_count
+        binary max_value
+        binary min_value
+        bool is_max_value_exact
+        bool is_min_value_exact
+    }
diff --git a/output/images/FileMetaData.svg b/output/images/FileMetaData.svg
new file mode 100644
index 0000000..1b5ac94
--- /dev/null
+++ b/output/images/FileMetaData.svg
@@ -0,0 +1 @@
+<svg aria-roledescription="class" role="graphics-document document" viewBox="0 
0 1592.3203125 2232" style="max-width: 1592.32px; background-color: white;" 
class="classDiagram" xmlns:xlink="http://www.w3.org/1999/xlink" 
xmlns="http://www.w3.org/2000/svg" width="100%" 
id="my-svg"><style>#my-svg{font-family:"trebuchet 
ms",verdana,arial,sans-serif;font-size:16px;fill:#333;}#my-svg 
.error-icon{fill:#552222;}#my-svg 
.error-text{fill:#552222;stroke:#552222;}#my-svg .edge-thickness-normal{stroke 
[...]
\ No newline at end of file
diff --git a/output/images/PageHeader.mermaid b/output/images/PageHeader.mermaid
new file mode 100644
index 0000000..854eebd
--- /dev/null
+++ b/output/images/PageHeader.mermaid
@@ -0,0 +1,62 @@
+classDiagram
+    PageHeader --> PageType
+    PageHeader --> DictionaryPageHeader
+    PageHeader --> DataPageHeader
+    PageHeader --> DataPageHeaderV2
+
+    DataPageHeader --> Statistics
+    DataPageHeaderV2 --> Statistics
+
+    class PageHeader {
+        PageType type
+        int32 uncompressed_page_size
+        int32 compressed_page_size
+        int32 crc
+        DataPageHeader data_page_header
+        IndexPageHeader index_page_header
+        DictionaryPageHeader dictionary_page_header
+        DataPageHeaderV2 data_page_header_v2
+    }
+
+    class PageType {
+        DATA_PAGE = 0
+        INDEX_PAGE = 1
+        DICTIONARY_PAGE = 2
+        DATA_PAGE_V2 = 3
+    }
+
+    class DataPageHeader {
+        int32 num_values
+        Encoding encoding
+        Encoding definition_level_encoding
+        Encoding repetition_level_encoding
+        Statistics statistics
+    }
+
+    class DictionaryPageHeader {
+        int32 num_values
+        Encoding encoding
+        bool is_sorted
+    }
+
+    class DataPageHeaderV2 {
+        int32 num_values
+        int32 num_nulls
+        int32 num_rows
+        Encoding encoding
+        int32 definition_levels_byte_length
+        int32 repetition_levels_byte_length
+        bool is_compressed
+        Statistics statistics
+    }
+
+    class Statistics {
+        binary max
+        binary min
+        int64 null_count
+        int64 distinct_count
+        binary max_value
+        binary min_value
+        bool is_max_value_exact
+        bool is_min_value_exact
+    }
diff --git a/output/images/PageHeader.svg b/output/images/PageHeader.svg
new file mode 100644
index 0000000..e0dad4c
--- /dev/null
+++ b/output/images/PageHeader.svg
@@ -0,0 +1 @@
+<svg aria-roledescription="class" role="graphics-document document" viewBox="0 
0 1313.5 980" style="max-width: 1313.5px; background-color: white;" 
class="classDiagram" xmlns:xlink="http://www.w3.org/1999/xlink" 
xmlns="http://www.w3.org/2000/svg" width="100%" 
id="my-svg"><style>#my-svg{font-family:"trebuchet 
ms",verdana,arial,sans-serif;font-size:16px;fill:#333;}#my-svg 
.error-icon{fill:#552222;}#my-svg 
.error-text{fill:#552222;stroke:#552222;}#my-svg 
.edge-thickness-normal{stroke-width:1 [...]
\ No newline at end of file
diff --git a/output/sitemap.xml b/output/sitemap.xml
index 3476b19..f4b1192 100644
--- a/output/sitemap.xml
+++ b/output/sitemap.xml
@@ -1 +1 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset 
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" 
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/docs/file-format/data-pages/compression/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/file-format/data-pages/encodings/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/file-format/data-pages/encryption/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/
 [...]
\ No newline at end of file
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset 
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" 
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/docs/file-format/data-pages/compression/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/file-format/data-pages/encodings/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/file-format/data-pages/encryption/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/
 [...]
\ No newline at end of file
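
Note on the updated Metadata page: it states that the file metadata "provides offset and size information useful when navigating the Parquet file". As a rough illustration of where a reader finds that metadata, the sketch below locates the serialized `FileMetaData` in an in-memory buffer, assuming the plaintext-footer layout in which the file ends with a 4-byte little-endian footer length followed by the `PAR1` magic. The function name `locate_footer` is hypothetical and not part of any Parquet library, and the sketch does not decode the Thrift payload itself.

```python
import struct

MAGIC = b"PAR1"  # magic bytes at both ends of a plaintext-footer Parquet file


def locate_footer(data: bytes) -> tuple[int, int]:
    """Return (offset, length) of the serialized FileMetaData inside `data`.

    Hypothetical helper for illustration only; it handles plaintext footers
    and does not deserialize the TCompactProtocol-encoded structure.
    """
    # Smallest possible file: leading magic + footer length + trailing magic.
    if len(data) < 12 or data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("not a plaintext-footer Parquet file")
    # The 4 bytes before the trailing magic hold the footer length (little-endian).
    (length,) = struct.unpack("<I", data[-8:-4])
    offset = len(data) - 8 - length
    if offset < 4:
        raise ValueError("footer length exceeds file size")
    return offset, length
```

The bytes at `data[offset:offset + length]` would then be handed to a Thrift compact-protocol decoder to obtain the `FileMetaData` fields shown in `FileMetaData.mermaid`.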
