This is an automated email from the ASF dual-hosted git repository.
github-bot pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/parquet-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 7f51ef2 deploy: 5ab1cc62cee7214f3c1f17e6b8a3a22e721eb6de
7f51ef2 is described below
commit 7f51ef27cb8135b3b85c46e429d44ae65967ff5b
Author: gszadovszky <[email protected]>
AuthorDate: Thu Mar 6 07:20:17 2025 +0000
deploy: 5ab1cc62cee7214f3c1f17e6b8a3a22e721eb6de
---
output/docs/_print/index.html | 12 +-
output/docs/file-format/_print/index.html | 12 +-
output/docs/file-format/index.xml | 17 +--
output/docs/file-format/metadata/index.html | 30 +++--
output/docs/index.xml | 17 +--
output/images/FileLayoutBloomFilter2.png | Bin
output/images/FileLayoutEncryptionEF.png | Bin
output/images/FileLayoutEncryptionPF.png | Bin
output/images/FileMetaData.mermaid | 173 ++++++++++++++++++++++++++++
output/images/FileMetaData.svg | 1 +
output/images/PageHeader.mermaid | 62 ++++++++++
output/images/PageHeader.svg | 1 +
output/sitemap.xml | 2 +-
13 files changed, 286 insertions(+), 41 deletions(-)
diff --git a/output/docs/_print/index.html b/output/docs/_print/index.html
index 96f3d11..5c208a5 100644
--- a/output/docs/_print/index.html
+++ b/output/docs/_print/index.html
@@ -44,14 +44,12 @@ per HDFS file.</p><h3 id=data-page--size>Data Page
Size</h3><p>Data pages should
allow for more fine grained reading (e.g. single row lookup). Larger page sizes
incur less space overhead (less page headers) and potentially less parsing
overhead
(processing headers). Note: for sequential scans, it is not expected to read a
page
-at a time; this is not the IO chunk. We recommend 8KB for page
sizes.</p></div><div class=td-content style=page-break-before:always><h1
id=pg-6762d78210357c1df172dfddcc6fd307>3.2 - Extensibility</h1><p>There are
many places in the format for compatible extensions:</p><ul><li>File Version:
The file metadata contains a version.</li><li>Encodings: Encodings are
specified by enum and more can be added in the future.</li><li>Page types:
Additional page types can be added and safely skipped.</ [...]
-In the diagram below, file metadata is described by the
<code>FileMetaData</code>
-structure. This file metadata provides offset and size information useful
-when navigating the Parquet file. Page header metadata
(<code>PageHeader</code> and
-children in the diagram) is stored in-line with the page data, and is
-used in the reading and decoding of said data.</p><p>All thrift structures are
serialized using the TCompactProtocol. The full
+at a time; this is not the IO chunk. We recommend 8KB for page
sizes.</p></div><div class=td-content style=page-break-before:always><h1
id=pg-6762d78210357c1df172dfddcc6fd307>3.2 - Extensibility</h1><p>There are
many places in the format for compatible extensions:</p><ul><li>File Version:
The file metadata contains a version.</li><li>Encodings: Encodings are
specified by enum and more can be added in the future.</li><li>Page types:
Additional page types can be added and safely skipped.</ [...]
definition of these structures is given in the Parquet
-<a
href=https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift>Thrift
definition</a>.</p><p><img alt="File Layout"
src=/images/FileFormat.gif></p></div><div class=td-content
style=page-break-before:always><h1 id=pg-140f2d6d5609da0af9dcaa92233086b5>3.4 -
Types</h1><p>The types supported by the file format are intended to be as
minimal as possible,
+<a
href=https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift>Thrift
definition</a>.</p><h2 id=file-metadata>File metadata</h2><p>In the diagram
below, file metadata is described by the <code>FileMetaData</code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file.</p><p><img alt="Parquet Metadata format"
src=/images/FileMetaData.svg></p><h2 id=page-header>Page header</h2><p>Page
header metadata (<code>PageHeader</code> and children in the diagram) is stored
+in-line with the page data, and is used in the reading and decoding of
data.</p><p><img alt="Parquet PageHeader format"
src=/images/PageHeader.svg></p></div><div class=td-content
style=page-break-before:always><h1 id=pg-140f2d6d5609da0af9dcaa92233086b5>3.4 -
Types</h1><p>The types supported by the file format are intended to be as
minimal as possible,
with a focus on how the types effect on disk storage. For example, 16-bit ints
are not explicitly supported in the storage format since they are covered by
32-bit ints with an efficient encoding. This reduces the complexity of
implementing
diff --git a/output/docs/file-format/_print/index.html
b/output/docs/file-format/_print/index.html
index 5c98367..d4252c0 100644
--- a/output/docs/file-format/_print/index.html
+++ b/output/docs/file-format/_print/index.html
@@ -37,14 +37,12 @@ per HDFS file.</p><h3 id=data-page--size>Data Page
Size</h3><p>Data pages should
allow for more fine grained reading (e.g. single row lookup). Larger page sizes
incur less space overhead (less page headers) and potentially less parsing
overhead
(processing headers). Note: for sequential scans, it is not expected to read a
page
-at a time; this is not the IO chunk. We recommend 8KB for page
sizes.</p></div><div class=td-content style=page-break-before:always><h1
id=pg-6762d78210357c1df172dfddcc6fd307>2 - Extensibility</h1><p>There are many
places in the format for compatible extensions:</p><ul><li>File Version: The
file metadata contains a version.</li><li>Encodings: Encodings are specified by
enum and more can be added in the future.</li><li>Page types: Additional page
types can be added and safely skipped.</li [...]
-In the diagram below, file metadata is described by the
<code>FileMetaData</code>
-structure. This file metadata provides offset and size information useful
-when navigating the Parquet file. Page header metadata
(<code>PageHeader</code> and
-children in the diagram) is stored in-line with the page data, and is
-used in the reading and decoding of said data.</p><p>All thrift structures are
serialized using the TCompactProtocol. The full
+at a time; this is not the IO chunk. We recommend 8KB for page
sizes.</p></div><div class=td-content style=page-break-before:always><h1
id=pg-6762d78210357c1df172dfddcc6fd307>2 - Extensibility</h1><p>There are many
places in the format for compatible extensions:</p><ul><li>File Version: The
file metadata contains a version.</li><li>Encodings: Encodings are specified by
enum and more can be added in the future.</li><li>Page types: Additional page
types can be added and safely skipped.</li [...]
definition of these structures is given in the Parquet
-<a
href=https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift>Thrift
definition</a>.</p><p><img alt="File Layout"
src=/images/FileFormat.gif></p></div><div class=td-content
style=page-break-before:always><h1 id=pg-140f2d6d5609da0af9dcaa92233086b5>4 -
Types</h1><p>The types supported by the file format are intended to be as
minimal as possible,
+<a
href=https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift>Thrift
definition</a>.</p><h2 id=file-metadata>File metadata</h2><p>In the diagram
below, file metadata is described by the <code>FileMetaData</code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file.</p><p><img alt="Parquet Metadata format"
src=/images/FileMetaData.svg></p><h2 id=page-header>Page header</h2><p>Page
header metadata (<code>PageHeader</code> and children in the diagram) is stored
+in-line with the page data, and is used in the reading and decoding of
data.</p><p><img alt="Parquet PageHeader format"
src=/images/PageHeader.svg></p></div><div class=td-content
style=page-break-before:always><h1 id=pg-140f2d6d5609da0af9dcaa92233086b5>4 -
Types</h1><p>The types supported by the file format are intended to be as
minimal as possible,
with a focus on how the types effect on disk storage. For example, 16-bit ints
are not explicitly supported in the storage format since they are covered by
32-bit ints with an efficient encoding. This reduces the complexity of
implementing
diff --git a/output/docs/file-format/index.xml
b/output/docs/file-format/index.xml
index 50e2209..6e04883 100644
--- a/output/docs/file-format/index.xml
+++ b/output/docs/file-format/index.xml
@@ -19,16 +19,19 @@ at a time; this is not the IO chunk. We recommend 8KB for
page sizes.</p></de
<li>Encodings: Encodings are specified by enum and more can be added in the
future.</li>
<li>Page types: Additional page types can be added and safely
skipped.</li>
</ul></description></item><item><title>Docs:
Metadata</title><link>/docs/file-format/metadata/</link><pubDate>Mon, 01 Jan
0001 00:00:00
+0000</pubDate><guid>/docs/file-format/metadata/</guid><description>
-<p>There are two types of metadata: file metadata, and page header metadata.
-In the diagram below, file metadata is described by the
<code>FileMetaData</code>
-structure. This file metadata provides offset and size information useful
-when navigating the Parquet file. Page header metadata
(<code>PageHeader</code> and
-children in the diagram) is stored in-line with the page data, and is
-used in the reading and decoding of said data.</p>
+<p>There are two types of metadata: file metadata, and page header
metadata.</p>
<p>All thrift structures are serialized using the TCompactProtocol. The full
definition of these structures is given in the Parquet
<a
href="https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift">Thrift
definition</a>.</p>
-<p><img alt="File Layout"
src="/images/FileFormat.gif"></p></description></item><item><title>Docs:
Types</title><link>/docs/file-format/types/</link><pubDate>Mon, 01 Jan 0001
00:00:00 +0000</pubDate><guid>/docs/file-format/types/</guid><description>
+<h2 id="file-metadata">File metadata</h2>
+<p>In the diagram below, file metadata is described by the
<code>FileMetaData</code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file.</p>
+<p><img alt="Parquet Metadata format"
src="/images/FileMetaData.svg"></p>
+<h2 id="page-header">Page header</h2>
+<p>Page header metadata (<code>PageHeader</code> and children in the
diagram) is stored
+in-line with the page data, and is used in the reading and decoding of
data.</p>
+<p><img alt="Parquet PageHeader format"
src="/images/PageHeader.svg"></p></description></item><item><title>Docs:
Types</title><link>/docs/file-format/types/</link><pubDate>Mon, 01 Jan 0001
00:00:00 +0000</pubDate><guid>/docs/file-format/types/</guid><description>
<p>The types supported by the file format are intended to be as minimal as
possible,
with a focus on how the types effect on disk storage. For example, 16-bit ints
are not explicitly supported in the storage format since they are covered by
diff --git a/output/docs/file-format/metadata/index.html
b/output/docs/file-format/metadata/index.html
index fb0fa1f..3bf71da 100644
--- a/output/docs/file-format/metadata/index.html
+++ b/output/docs/file-format/metadata/index.html
@@ -1,19 +1,25 @@
<!doctype html><html itemscope itemtype=http://schema.org/WebPage lang=en
class=no-js><head><meta charset=utf-8><meta name=viewport
content="width=device-width,initial-scale=1,shrink-to-fit=no"><meta name=robots
content="index, follow"><link rel="shortcut icon"
href=/favicons/favicon.ico><link rel=apple-touch-icon
href=/favicons/apple-touch-icon-180x180.png sizes=180x180><link rel=icon
type=image/png href=/favicons/favicon-16x16.png sizes=16x16><link rel=icon
type=image/png href=/favicon [...]
-<meta name=description content="There are two types of metadata: file
metadata, and page header metadata. In the diagram below, file metadata is
described by the FileMetaData structure. This file metadata provides offset and
size information useful when navigating the Parquet file. Page header metadata
(PageHeader and children in the diagram) is stored in-line with the page data,
and is used in the reading and decoding of said data.
-All thrift structures are serialized using the TCompactProtocol."><meta
property="og:title" content="Metadata"><meta property="og:description"
content="There are two types of metadata: file metadata, and page header
metadata. In the diagram below, file metadata is described by the FileMetaData
structure. This file metadata provides offset and size information useful when
navigating the Parquet file. Page header metadata (PageHeader and children in
the diagram) is stored in-line with the [...]
-All thrift structures are serialized using the TCompactProtocol."><meta
property="og:type" content="article"><meta property="og:url"
content="/docs/file-format/metadata/"><meta property="article:section"
content="docs"><meta property="article:modified_time"
content="2024-07-07T19:25:32-07:00"><meta property="og:site_name"
content="Apache Parquet"><meta itemprop=name content="Metadata"><meta
itemprop=description content="There are two types of metadata: file metadata,
and page header meta [...]
-All thrift structures are serialized using the TCompactProtocol."><meta
itemprop=dateModified content="2024-07-07T19:25:32-07:00"><meta
itemprop=wordCount content="86"><meta itemprop=keywords content><meta
name=twitter:card content="summary"><meta name=twitter:title
content="Metadata"><meta name=twitter:description content="There are two types
of metadata: file metadata, and page header metadata. In the diagram below,
file metadata is described by the FileMetaData structure. This file me [...]
-All thrift structures are serialized using the TCompactProtocol."><link
rel=preload
href=/scss/main.min.7f095589dcf99af199766009c9a9719e6fdf2b9ce9bcfbeb05e80eb6efeaf32b.css
as=style><link
href=/scss/main.min.7f095589dcf99af199766009c9a9719e6fdf2b9ce9bcfbeb05e80eb6efeaf32b.css
rel=stylesheet integrity><script
src=https://code.jquery.com/jquery-3.6.3.min.js
integrity="sha512-STof4xm1wgkfm7heWqFJVn58Hm3EtS31XFaagaa8VMReCXAkQnJZ+jEy8PCC/iT18dFy95WcExNHFTqLyp72eQ=="
crossorigin=anonymous></sc [...]
+<meta name=description content="There are two types of metadata: file
metadata, and page header metadata.
+All thrift structures are serialized using the TCompactProtocol. The full
definition of these structures is given in the Parquet Thrift definition.
+File metadata In the diagram below, file metadata is described by the
FileMetaData structure. This file metadata provides offset and size information
useful when navigating the Parquet file.
+Page header Page header metadata (PageHeader and children in the diagram) is
stored in-line with the page data, and is used in the reading and decoding of
data."><meta property="og:title" content="Metadata"><meta
property="og:description" content="There are two types of metadata: file
metadata, and page header metadata.
+All thrift structures are serialized using the TCompactProtocol. The full
definition of these structures is given in the Parquet Thrift definition.
+File metadata In the diagram below, file metadata is described by the
FileMetaData structure. This file metadata provides offset and size information
useful when navigating the Parquet file.
+Page header Page header metadata (PageHeader and children in the diagram) is
stored in-line with the page data, and is used in the reading and decoding of
data."><meta property="og:type" content="article"><meta property="og:url"
content="/docs/file-format/metadata/"><meta property="article:section"
content="docs"><meta property="article:modified_time"
content="2025-03-05T23:19:07-08:00"><meta property="og:site_name"
content="Apache Parquet"><meta itemprop=name content="Metadata"><meta it [...]
+All thrift structures are serialized using the TCompactProtocol. The full
definition of these structures is given in the Parquet Thrift definition.
+File metadata In the diagram below, file metadata is described by the
FileMetaData structure. This file metadata provides offset and size information
useful when navigating the Parquet file.
+Page header Page header metadata (PageHeader and children in the diagram) is
stored in-line with the page data, and is used in the reading and decoding of
data."><meta itemprop=dateModified content="2025-03-05T23:19:07-08:00"><meta
itemprop=wordCount content="89"><meta itemprop=keywords content><meta
name=twitter:card content="summary"><meta name=twitter:title
content="Metadata"><meta name=twitter:description content="There are two types
of metadata: file metadata, and page header metadata.
+All thrift structures are serialized using the TCompactProtocol. The full
definition of these structures is given in the Parquet Thrift definition.
+File metadata In the diagram below, file metadata is described by the
FileMetaData structure. This file metadata provides offset and size information
useful when navigating the Parquet file.
+Page header Page header metadata (PageHeader and children in the diagram) is
stored in-line with the page data, and is used in the reading and decoding of
data."><link rel=preload
href=/scss/main.min.7f095589dcf99af199766009c9a9719e6fdf2b9ce9bcfbeb05e80eb6efeaf32b.css
as=style><link
href=/scss/main.min.7f095589dcf99af199766009c9a9719e6fdf2b9ce9bcfbeb05e80eb6efeaf32b.css
rel=stylesheet integrity><script
src=https://code.jquery.com/jquery-3.6.3.min.js
integrity="sha512-STof4xm1wgkfm7heWqFJ [...]
<a
href=https://github.com/apache/parquet-site/edit/production/content/en/docs/File%20Format/metadata.md
class="td-page-meta--edit td-page-meta__edit" target=_blank rel=noopener><i
class="fa-solid fa-pen-to-square fa-fw"></i> Edit this page</a>
<a
href="https://github.com/apache/parquet-site/new/production/content/en/docs/File%20Format?filename=change-me.md&value=---%0Atitle%3A+%22Long+Page+Title%22%0AlinkTitle%3A+%22Short+Nav+Title%22%0Aweight%3A+100%0Adescription%3A+%3E-%0A+++++Page+description+for+heading+and+indexes.%0A---%0A%0A%23%23+Heading%0A%0AEdit+this+template+to+create+your+new+page.%0A%0A%2A+Give+it+a+good+name%2C+ending+in+%60.md%60+-+e.g.+%60getting-started.md%60%0A%2A+Edit+the+%22front+matter%22+section+at+th
[...]
<a href="https://github.com/apache/parquet-site/issues/new?title=Metadata"
class="td-page-meta--issue td-page-meta__issue" target=_blank rel=noopener><i
class="fa-solid fa-list-check fa-fw"></i> Create documentation issue</a>
-<a id=print href=/docs/file-format/_print/><i class="fa-solid fa-print
fa-fw"></i> Print entire section</a></div></aside><main class="col-12 col-md-9
col-xl-8 ps-md-5" role=main><nav aria-label=breadcrumb class=td-breadcrumbs><ol
class=breadcrumb><li class=breadcrumb-item><a
href=/docs/>Documentation</a></li><li class=breadcrumb-item><a
href=/docs/file-format/>File Format</a></li><li class="breadcrumb-item active"
aria-current=page>Metadata</li></ol></nav><div class=td-content><h1>Metada [...]
-In the diagram below, file metadata is described by the
<code>FileMetaData</code>
-structure. This file metadata provides offset and size information useful
-when navigating the Parquet file. Page header metadata
(<code>PageHeader</code> and
-children in the diagram) is stored in-line with the page data, and is
-used in the reading and decoding of said data.</p><p>All thrift structures are
serialized using the TCompactProtocol. The full
+<a id=print href=/docs/file-format/_print/><i class="fa-solid fa-print
fa-fw"></i> Print entire section</a></div><div class=td-toc><nav
id=TableOfContents><ul><li><a href=#file-metadata>File metadata</a></li><li><a
href=#page-header>Page header</a></li></ul></nav></div></aside><main
class="col-12 col-md-9 col-xl-8 ps-md-5" role=main><nav aria-label=breadcrumb
class=td-breadcrumbs><ol class=breadcrumb><li class=breadcrumb-item><a
href=/docs/>Documentation</a></li><li class=breadcrumb-item [...]
definition of these structures is given in the Parquet
-<a
href=https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift>Thrift
definition</a>.</p><p><img alt="File Layout"
src=/images/FileFormat.gif></p><div class=td-page-meta__lastmod>Last modified
July 7, 2024: <a
href=https://github.com/apache/parquet-site/commit/a407d81a41a90b58ae90a6567a84dd084b5d2947>GH-68:
Match language from parquet-format after merge of PARQUET-2139 (#69)
(a407d81)</a></div></div></main></div></div><footer class="td-footer row
d-print-none [...]
+<a
href=https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift>Thrift
definition</a>.</p><h2 id=file-metadata>File metadata</h2><p>In the diagram
below, file metadata is described by the <code>FileMetaData</code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file.</p><p><img alt="Parquet Metadata format"
src=/images/FileMetaData.svg></p><h2 id=page-header>Page header</h2><p>Page
header metadata (<code>PageHeader</code> and children in the diagram) is stored
+in-line with the page data, and is used in the reading and decoding of
data.</p><p><img alt="Parquet PageHeader format"
src=/images/PageHeader.svg></p><div class=td-page-meta__lastmod>Last modified
March 5, 2025: <a
href=https://github.com/apache/parquet-site/commit/5ab1cc62cee7214f3c1f17e6b8a3a22e721eb6de>Update
Metadata Diagrams (#106) (5ab1cc6)</a></div></div></main></div></div><footer
class="td-footer row d-print-none"><div class=container-fluid><div class="row
mx-md-2"><div class="t [...]
2025
<span class=td-footer__authors>Apache Parquet</span></span><span
class=td-footer__all_rights_reserved>All Rights Reserved</span><span
class=ms-2><a href=https://policies.google.com/privacy target=_blank
rel=noopener>Privacy Policy</a></span></div></div></div></footer></div><script
src=/js/main.min.26b35480299b932e285af8358c943de97509b95a0086d091584e7cb9b00c5c7b.js
integrity="sha256-JrNUgCmbky4oWvg1jJQ96XUJuVoAhtCRWE58ubAMXHs="
crossorigin=anonymous></script><script defer src=/js/click-to [...]
\ No newline at end of file
diff --git a/output/docs/index.xml b/output/docs/index.xml
index dafeb77..1258c7b 100644
--- a/output/docs/index.xml
+++ b/output/docs/index.xml
@@ -1069,16 +1069,19 @@ example, strings are stored as byte arrays (binary)
with a UTF8 annotation.
These annotations define how to further decode and interpret the data.
Annotations are stored as <code>LogicalType</code> fields in the file
metadata and are
documented in LogicalTypes.md.</p></description></item><item><title>Docs:
Metadata</title><link>/docs/file-format/metadata/</link><pubDate>Mon, 01 Jan
0001 00:00:00
+0000</pubDate><guid>/docs/file-format/metadata/</guid><description>
-<p>There are two types of metadata: file metadata, and page header metadata.
-In the diagram below, file metadata is described by the
<code>FileMetaData</code>
-structure. This file metadata provides offset and size information useful
-when navigating the Parquet file. Page header metadata
(<code>PageHeader</code> and
-children in the diagram) is stored in-line with the page data, and is
-used in the reading and decoding of said data.</p>
+<p>There are two types of metadata: file metadata, and page header
metadata.</p>
<p>All thrift structures are serialized using the TCompactProtocol. The full
definition of these structures is given in the Parquet
<a
href="https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift">Thrift
definition</a>.</p>
-<p><img alt="File Layout"
src="/images/FileFormat.gif"></p></description></item><item><title>Docs:
Nested
Encoding</title><link>/docs/file-format/nestedencoding/</link><pubDate>Mon, 01
Jan 0001 00:00:00
+0000</pubDate><guid>/docs/file-format/nestedencoding/</guid><description>
+<h2 id="file-metadata">File metadata</h2>
+<p>In the diagram below, file metadata is described by the
<code>FileMetaData</code>
+structure. This file metadata provides offset and size information useful
+when navigating the Parquet file.</p>
+<p><img alt="Parquet Metadata format"
src="/images/FileMetaData.svg"></p>
+<h2 id="page-header">Page header</h2>
+<p>Page header metadata (<code>PageHeader</code> and children in the
diagram) is stored
+in-line with the page data, and is used in the reading and decoding of
data.</p>
+<p><img alt="Parquet PageHeader format"
src="/images/PageHeader.svg"></p></description></item><item><title>Docs:
Nested
Encoding</title><link>/docs/file-format/nestedencoding/</link><pubDate>Mon, 01
Jan 0001 00:00:00
+0000</pubDate><guid>/docs/file-format/nestedencoding/</guid><description>
<p>To encode nested columns, Parquet uses the Dremel encoding with
definition and
repetition levels. Definition levels specify how many optional fields in the
path for the column are defined. Repetition levels specify at what repeated
field
diff --git a/output/images/FileLayoutBloomFilter2.png
b/output/images/FileLayoutBloomFilter2.png
old mode 100755
new mode 100644
diff --git a/output/images/FileLayoutEncryptionEF.png
b/output/images/FileLayoutEncryptionEF.png
old mode 100755
new mode 100644
diff --git a/output/images/FileLayoutEncryptionPF.png
b/output/images/FileLayoutEncryptionPF.png
old mode 100755
new mode 100644
diff --git a/output/images/FileMetaData.mermaid
b/output/images/FileMetaData.mermaid
new file mode 100644
index 0000000..be96680
--- /dev/null
+++ b/output/images/FileMetaData.mermaid
@@ -0,0 +1,173 @@
+classDiagram
+ FileMetaData --> SchemaElement
+ FileMetaData --> RowGroup
+
+ RowGroup --> ColumnChunk
+
+ ColumnChunk --> ColumnMetaData
+
+ ColumnMetaData --> Statistics
+ ColumnMetaData --> Type
+ ColumnMetaData --> Encoding
+ ColumnMetaData --> CompressionCodec
+
+ SchemaElement --> LogicalTypes
+ SchemaElement --> Type
+ SchemaElement --> ConvertedType
+
+ class FileMetaData {
+ int32 version
+ list~SchemaElement~ schema
+ int64 num_rows
+ list~RowGroup~ row_groups
+ list~KeyValue~ key_value_metadata
+ string created_by
+ list~ColumnOrder~ column_orders
+ EncryptionAlgorithm encryption_algorithm
+ binary footer_signing_key_metadata
+ }
+
+ class SchemaElement {
+ Type type
+ int32 type_length
+ FieldRepetitionType repetition_type
+ string name
+ int32 num_children
+ ConvertedType converted_type
+ int32 scale
+ int32 precision
+ int32 field_id
+ LogicalType logicalType
+ }
+
+ class Type {
+ BOOLEAN
+ INT32
+ INT64
+ INT96
+ FLOAT
+ DOUBLE
+ BYTE_ARRAY
+ FIXED_LEN_BYTE_ARRAY
+ }
+
+ class LogicalTypes {
+ StringType
+ MapType
+ ListType
+ EnumType
+ DecimalType
+ DateType
+ TimeType
+ TimestampType
+ IntType
+ NullType
+ JsonType
+ BsonType
+ UUIDType
+ Float16Type
+ VariantType
+ GeometryType
+ GeographyType
+ }
+
+ class ConvertedType {
+ UTF8
+ MAP
+ MAP_KEY_VALUE
+ LIST
+ ENUM
+ DECIMAL
+ DATE
+ TIME_MILLIS
+ TIME_MICROS
+ TIMESTAMP_MILLIS
+ TIMESTAMP_MICROS
+ UINT_8
+ UINT_16
+ UINT_32
+ UINT_64
+ INT_8
+ INT_16
+ INT_32
+ INT_64
+ JSON
+ BSON
+ INTERVAL
+ }
+
+ class Encoding {
+ PLAIN
+ PLAIN_DICTIONARY
+ RLE
+ BIT_PACKED
+ DELTA_BINARY_PACKED
+ DELTA_LENGTH_BYTE_ARRAY
+ DELTA_BYTE_ARRAY
+ RLE_DICTIONARY
+ BYTE_STREAM_SPLIT
+ }
+
+ class CompressionCodec {
+ UNCOMPRESSED
+ SNAPPY
+ GZIP
+ LZO
+ BROTLI
+ LZ4
+ ZSTD
+ LZ4_RAW
+ }
+
+ class RowGroup {
+ list~ColumnChunk~ columns
+ int64 total_byte_size
+ int64 num_rows
+ list~SortingColumn~ sorting_columns
+ int64 file_offset
+ int64 total_compressed_size
+ int16 ordinal
+ }
+
+ class ColumnChunk {
+ string file_path
+ int64 file_offset
+ ColumnMetaData meta_data
+ int64 offset_index_offset
+ int32 offset_index_length
+ int64 column_index_offset
+ int32 column_index_length
+ ColumnCryptoMetaData crypto_metadata
+ binary encrypted_column_metadata
+ }
+
+ class ColumnMetaData {
+ Type type
+ list~Encoding~ encodings
+ list~string~ path_in_schema
+ CompressionCodec codec
+ int64 num_values
+ int64 total_uncompressed_size
+ int64 total_compressed_size
+ list~KeyValue~ key_value_metadata
+ int64 data_page_offset
+ int64 index_page_offset
+ int64 dictionary_page_offset
+ Statistics statistics
+ list~PageEncodingStats~ encoding_stats
+ int64 bloom_filter_offset
+ int32 bloom_filter_length
+ SizeStatistics size_statistics
+ GeospatialStatistics geospatial_statistics
+ }
+
+ class Statistics {
+ binary max
+ binary min
+ int64 null_count
+ int64 distinct_count
+ binary max_value
+ binary min_value
+ bool is_max_value_exact
+ bool is_min_value_exact
+ }
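Note on the FileMetaData diagram above: the metadata page says this structure "provides offset and size information useful when navigating the Parquet file". A minimal Python sketch of that idea follows. It uses a few ColumnMetaData field names from the diagram to compute the byte range a reader would fetch for one column chunk. The dataclass and the helper chunk_byte_range are hypothetical illustrations, not part of any Parquet library. The sketch assumes that the dictionary page, when present, precedes the data pages and that total_compressed_size covers every page in the chunk, headers included.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnMetaData:
    """Hypothetical stand-in holding a subset of the fields in the diagram."""

    path_in_schema: list[str]
    total_compressed_size: int
    data_page_offset: int
    dictionary_page_offset: Optional[int] = None


def chunk_byte_range(meta: ColumnMetaData) -> tuple[int, int]:
    """Return (start, end) file offsets of a column chunk, end exclusive.

    Assumption: a dictionary page, if any, is written before the data pages,
    so its offset marks the start of the chunk.
    """
    start = meta.data_page_offset
    if meta.dictionary_page_offset is not None:
        start = meta.dictionary_page_offset
    return start, start + meta.total_compressed_size


if __name__ == "__main__":
    with_dict = ColumnMetaData(["a"], total_compressed_size=100,
                               data_page_offset=50, dictionary_page_offset=4)
    print(chunk_byte_range(with_dict))
```

Some writers are reported to store 0 in dictionary_page_offset when there is no dictionary page. A real reader would likely need to guard against that, which this sketch does not attempt.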
diff --git a/output/images/FileMetaData.svg b/output/images/FileMetaData.svg
new file mode 100644
index 0000000..1b5ac94
--- /dev/null
+++ b/output/images/FileMetaData.svg
@@ -0,0 +1 @@
+<svg aria-roledescription="class" role="graphics-document document" viewBox="0
0 1592.3203125 2232" style="max-width: 1592.32px; background-color: white;"
class="classDiagram" xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns="http://www.w3.org/2000/svg" width="100%"
id="my-svg"><style>#my-svg{font-family:"trebuchet
ms",verdana,arial,sans-serif;font-size:16px;fill:#333;}#my-svg
.error-icon{fill:#552222;}#my-svg
.error-text{fill:#552222;stroke:#552222;}#my-svg .edge-thickness-normal{stroke
[...]
\ No newline at end of file
diff --git a/output/images/PageHeader.mermaid b/output/images/PageHeader.mermaid
new file mode 100644
index 0000000..854eebd
--- /dev/null
+++ b/output/images/PageHeader.mermaid
@@ -0,0 +1,62 @@
+classDiagram
+ PageHeader --> PageType
+ PageHeader --> DictionaryPageHeader
+ PageHeader --> DataPageHeader
+ PageHeader --> DataPageHeaderV2
+
+ DataPageHeader --> Statistics
+ DataPageHeaderV2 --> Statistics
+
+ class PageHeader {
+ PageType type
+ int32 uncompressed_page_size
+ int32 compressed_page_size
+ int32 crc
+ DataPageHeader data_page_header
+ IndexPageHeader index_page_header
+ DictionaryPageHeader dictionary_page_header
+ DataPageHeaderV2 data_page_header_v2
+ }
+
+ class PageType {
+ DATA_PAGE = 0
+ INDEX_PAGE = 1
+ DICTIONARY_PAGE = 2
+ DATA_PAGE_V2 = 3
+ }
+
+ class DataPageHeader {
+ int32 num_values
+ Encoding encoding
+ Encoding definition_level_encoding
+ Encoding repetition_level_encoding
+ Statistics statistics
+ }
+
+ class DictionaryPageHeader {
+ int32 num_values
+ Encoding encoding
+ bool is_sorted
+ }
+
+ class DataPageHeaderV2 {
+ int32 num_values
+ int32 num_nulls
+ int32 num_rows
+ Encoding encoding
+ int32 definition_levels_byte_length
+ int32 repetition_levels_byte_length
+ bool is_compressed
+ Statistics statistics
+ }
+
+ class Statistics {
+ binary max
+ binary min
+ int64 null_count
+ int64 distinct_count
+ binary max_value
+ binary min_value
+ bool is_max_value_exact
+ bool is_min_value_exact
+ }
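Note on the PageHeader diagram above: the metadata page says page headers are "stored in-line with the page data". The Python sketch below illustrates what that implies for a reader stepping through a column chunk. Each header is followed by compressed_page_size bytes of page data, and the next header starts right after. The PageType values and sub-header field names are copied from the diagram. The helper walk_pages and its header_size input are hypothetical, since the serialized header length is only known after Thrift decoding, which is not shown here.

```python
from enum import IntEnum
from typing import Iterable, Iterator, Tuple


class PageType(IntEnum):
    # Values as listed in the PageHeader diagram.
    DATA_PAGE = 0
    INDEX_PAGE = 1
    DICTIONARY_PAGE = 2
    DATA_PAGE_V2 = 3


# PageHeader field that carries the type-specific header for each page type.
SUB_HEADER_FIELD = {
    PageType.DATA_PAGE: "data_page_header",
    PageType.INDEX_PAGE: "index_page_header",
    PageType.DICTIONARY_PAGE: "dictionary_page_header",
    PageType.DATA_PAGE_V2: "data_page_header_v2",
}


def walk_pages(
    first_header_offset: int,
    pages: Iterable[Tuple[int, int]],
) -> Iterator[Tuple[int, int, int]]:
    """Yield (header_start, data_start, data_end) for consecutive pages.

    `pages` holds (header_size, compressed_page_size) pairs. header_size is a
    hypothetical input standing in for the decoded length of each header.
    """
    offset = first_header_offset
    for header_size, compressed_page_size in pages:
        data_start = offset + header_size
        data_end = data_start + compressed_page_size
        yield offset, data_start, data_end
        offset = data_end  # the next header begins where this page's data ends


if __name__ == "__main__":
    for span in walk_pages(4, [(10, 100), (12, 50)]):
        print(span)
    print(SUB_HEADER_FIELD[PageType(3)])
```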
diff --git a/output/images/PageHeader.svg b/output/images/PageHeader.svg
new file mode 100644
index 0000000..e0dad4c
--- /dev/null
+++ b/output/images/PageHeader.svg
@@ -0,0 +1 @@
+<svg aria-roledescription="class" role="graphics-document document" viewBox="0
0 1313.5 980" style="max-width: 1313.5px; background-color: white;"
class="classDiagram" xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns="http://www.w3.org/2000/svg" width="100%"
id="my-svg"><style>#my-svg{font-family:"trebuchet
ms",verdana,arial,sans-serif;font-size:16px;fill:#333;}#my-svg
.error-icon{fill:#552222;}#my-svg
.error-text{fill:#552222;stroke:#552222;}#my-svg
.edge-thickness-normal{stroke-width:1 [...]
\ No newline at end of file
diff --git a/output/sitemap.xml b/output/sitemap.xml
index 3476b19..f4b1192 100644
--- a/output/sitemap.xml
+++ b/output/sitemap.xml
@@ -1 +1 @@
-<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/docs/file-format/data-pages/compression/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/file-format/data-pages/encodings/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/file-format/data-pages/encryption/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/
[...]
\ No newline at end of file
+<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/docs/file-format/data-pages/compression/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/file-format/data-pages/encodings/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/file-format/data-pages/encryption/</loc><lastmod>2024-03-11T22:11:10+01:00</lastmod></url><url><loc>/docs/
[...]
\ No newline at end of file