This is an automated email from the ASF dual-hosted git repository.
blue pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/asf-site by this push:
new a885685 Deployed 25eaebacb with MkDocs version: 1.2.1
a885685 is described below
commit a8856855d4cd5cdf430bab323fe4c84fba89d9b4
Author: Ryan Blue <[email protected]>
AuthorDate: Sun Jul 11 17:03:48 2021 -0700
Deployed 25eaebacb with MkDocs version: 1.2.1
---
aws/index.html | 2 +-
benchmarks/index.html | 592 ++++++++++++++++++++++++++++++++++
community/index.html | 17 +-
evolution/index.html | 4 +-
flink/index.html | 2 +-
sitemap.xml | 75 +++--
sitemap.xml.gz | Bin 467 -> 473 bytes
spark-structured-streaming/index.html | 4 +-
spec/index.html | 166 +++++++++-
9 files changed, 804 insertions(+), 58 deletions(-)
diff --git a/aws/index.html b/aws/index.html
index af53ddc..d4ed104 100644
--- a/aws/index.html
+++ b/aws/index.html
@@ -568,7 +568,7 @@ an Iceberg table is stored as a <a
href="https://docs.aws.amazon.com/glue/latest
and every Iceberg table version is stored as a <a
href="https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog-tables.html#aws-glue-api-catalog-tables-TableVersion">Glue
TableVersion</a>.
You can start using Glue catalog by specifying the <code>catalog-impl</code>
as <code>org.apache.iceberg.aws.glue.GlueCatalog</code>,
just like what is shown in the <a href="#enabling-aws-integration">enabling
AWS integration</a> section above.
-More details about loading the catalog can be found in individual engine
pages, such as <a href="../spark/#loading-a-custom-catalog">Spark</a> and <a
href="../flink/#creating-catalogs-and-using-catalogs">Flink</a>.</p>
+More details about loading the catalog can be found in individual engine
pages, such as <a
href="../spark-configuration/#loading-a-custom-catalog">Spark</a> and <a
href="../flink/#creating-catalogs-and-using-catalogs">Flink</a>.</p>
<h3 id="glue-catalog-id">Glue Catalog ID<a class="headerlink"
href="#glue-catalog-id" title="Permanent link">¶</a></h3>
<p>There is a unique Glue metastore in each AWS account and each AWS region.
By default, <code>GlueCatalog</code> chooses the Glue metastore to use based
on the user’s default AWS client credential and region setup.
diff --git a/benchmarks/index.html b/benchmarks/index.html
new file mode 100644
index 0000000..9a49cf4
--- /dev/null
+++ b/benchmarks/index.html
@@ -0,0 +1,592 @@
+<!DOCTYPE html>
+<html lang="en">
+
+<head>
+ <meta charset="utf-8">
+ <meta http-equiv="X-UA-Compatible" content="IE=edge">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <meta name="description" content="A table format for large, slow-moving
tabular data">
+
+ <link rel="canonical" href="https://iceberg.apache.org/benchmarks/">
+ <link rel="shortcut icon" href="../img/favicon.ico">
+
+
+ <title>Benchmarks - Apache Iceberg</title>
+
+
+ <link rel="stylesheet"
href="https://use.fontawesome.com/releases/v5.12.0/css/all.css">
+ <link rel="stylesheet"
href="https://use.fontawesome.com/releases/v5.12.0/css/v4-shims.css">
+ <link rel="stylesheet"
href="//cdn.jsdelivr.net/npm/[email protected]/build/web/hack.min.css">
+ <link href='//rsms.me/inter/inter.css' rel='stylesheet' type='text/css'>
+ <link
href='//fonts.googleapis.com/css?family=Open+Sans:300italic,400italic,700italic,400,300,600,700&subset=latin-ext,latin'
rel='stylesheet' type='text/css'>
+ <link href="../css/bootstrap-custom.min.css" rel="stylesheet">
+ <link href="../css/base.min.css" rel="stylesheet">
+ <link href="../css/cinder.min.css" rel="stylesheet">
+
+
+
+ <link rel="stylesheet"
href="//cdn.jsdelivr.net/gh/highlightjs/[email protected]/build/styles/github.min.css">
+
+
+ <link href="../css/extra.css" rel="stylesheet">
+
+ <!-- HTML5 shim and Respond.js IE8 support of HTML5 elements and media
queries -->
+ <!--[if lt IE 9]>
+ <script
src="https://cdn.jsdelivr.net/npm/[email protected]/dist/html5shiv.min.js"></script>
+ <script
src="https://cdn.jsdelivr.net/npm/[email protected]/dest/respond.min.js"></script>
+ <![endif]-->
+
+
+
+
+</head>
+
+<body>
+
+ <div class="navbar navbar-default navbar-fixed-top" role="navigation">
+ <div class="container">
+
+ <!-- Collapsed navigation -->
+ <div class="navbar-header">
+ <!-- Expander button -->
+ <button type="button" class="navbar-toggle" data-toggle="collapse"
data-target=".navbar-collapse">
+ <span class="sr-only">Toggle navigation</span>
+ <span class="icon-bar"></span>
+ <span class="icon-bar"></span>
+ <span class="icon-bar"></span>
+ </button>
+
+
+ <!-- Main title -->
+
+
+ <a class="navbar-brand" href="..">Apache Iceberg</a>
+
+ </div>
+
+ <!-- Expanded navigation -->
+ <div class="navbar-collapse collapse">
+ <!-- Main navigation -->
+ <ul class="nav navbar-nav">
+
+
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle"
data-toggle="dropdown">Project <b class="caret"></b></a>
+ <ul class="dropdown-menu">
+
+
+<li >
+ <a href="..">About</a>
+</li>
+
+
+
+<li >
+ <a href="../community/">Community</a>
+</li>
+
+
+
+<li >
+ <a href="../releases/">Releases</a>
+</li>
+
+
+
+<li >
+ <a href="../blogs/">Blogs</a>
+</li>
+
+
+
+<li >
+ <a href="../trademarks/">Trademarks</a>
+</li>
+
+
+
+<li >
+ <a href="../how-to-release/">How to Release</a>
+</li>
+
+
+ </ul>
+ </li>
+
+
+
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle"
data-toggle="dropdown">Tables <b class="caret"></b></a>
+ <ul class="dropdown-menu">
+
+
+<li >
+ <a href="../configuration/">Configuration</a>
+</li>
+
+
+
+<li >
+ <a href="../schemas/">Schemas</a>
+</li>
+
+
+
+<li >
+ <a href="../partitioning/">Partitioning</a>
+</li>
+
+
+
+<li >
+ <a href="../evolution/">Table evolution</a>
+</li>
+
+
+
+<li >
+ <a href="../maintenance/">Maintenance</a>
+</li>
+
+
+
+<li >
+ <a href="../performance/">Performance</a>
+</li>
+
+
+
+<li >
+ <a href="../reliability/">Reliability</a>
+</li>
+
+
+ </ul>
+ </li>
+
+
+
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle"
data-toggle="dropdown">Spark <b class="caret"></b></a>
+ <ul class="dropdown-menu">
+
+
+<li >
+ <a href="../getting-started/">Getting Started</a>
+</li>
+
+
+
+<li >
+ <a href="../spark-configuration/">Configuration</a>
+</li>
+
+
+
+<li >
+ <a href="../spark-ddl/">DDL</a>
+</li>
+
+
+
+<li >
+ <a href="../spark-queries/">Queries</a>
+</li>
+
+
+
+<li >
+ <a href="../spark-writes/">Writes</a>
+</li>
+
+
+
+<li >
+ <a href="../spark-procedures/">Maintenance Procedures</a>
+</li>
+
+
+
+<li >
+ <a href="../spark-structured-streaming/">Structured Streaming</a>
+</li>
+
+
+
+<li >
+ <a href="../spark-queries/#time-travel">Time Travel</a>
+</li>
+
+
+ </ul>
+ </li>
+
+
+
+ <li >
+ <a
href="https://trino.io/docs/current/connector/iceberg.html">Trino</a>
+ </li>
+
+
+
+ <li >
+ <a href="../flink/">Flink</a>
+ </li>
+
+
+
+ <li >
+ <a href="../hive/">Hive</a>
+ </li>
+
+
+
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle"
data-toggle="dropdown">Integrations <b class="caret"></b></a>
+ <ul class="dropdown-menu">
+
+
+<li >
+ <a href="../aws/">AWS</a>
+</li>
+
+
+
+<li >
+ <a href="../nessie/">Nessie</a>
+</li>
+
+
+ </ul>
+ </li>
+
+
+
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle"
data-toggle="dropdown">API <b class="caret"></b></a>
+ <ul class="dropdown-menu">
+
+
+<li >
+ <a href="/javadoc/">Javadoc</a>
+</li>
+
+
+
+<li >
+ <a href="../api/">Java API intro</a>
+</li>
+
+
+
+<li >
+ <a href="../java-api-quickstart/">Java Quickstart</a>
+</li>
+
+
+
+<li >
+ <a href="../custom-catalog/">Java Custom Catalog</a>
+</li>
+
+
+
+<li >
+ <a href="../python-quickstart/">Python Quickstart</a>
+</li>
+
+
+
+<li >
+ <a href="../python-api-intro/">Python API Intro</a>
+</li>
+
+
+
+<li >
+ <a href="../python-feature-support/">Python Feature Support</a>
+</li>
+
+
+ </ul>
+ </li>
+
+
+
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle"
data-toggle="dropdown">Format <b class="caret"></b></a>
+ <ul class="dropdown-menu">
+
+
+<li >
+ <a href="../terms/">Definitions</a>
+</li>
+
+
+
+<li >
+ <a href="../spec/">Spec</a>
+</li>
+
+
+ </ul>
+ </li>
+
+
+
+ <li >
+ <a href="https://github.com/apache/iceberg">GitHub</a>
+ </li>
+
+
+
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle"
data-toggle="dropdown">ASF <b class="caret"></b></a>
+ <ul class="dropdown-menu">
+
+
+<li >
+ <a href="https://www.apache.org/licenses/">License</a>
+</li>
+
+
+
+<li >
+ <a href="https://www.apache.org/security/">Security</a>
+</li>
+
+
+
+<li >
+ <a href="https://www.apache.org/foundation/thanks.html">Sponsors</a>
+</li>
+
+
+
+<li >
+ <a href="https://www.apache.org/foundation/sponsorship.html">Donate</a>
+</li>
+
+
+
+<li >
+ <a href="https://www.apache.org/events/current-event.html">Events</a>
+</li>
+
+
+ </ul>
+ </li>
+
+
+ </ul>
+
+ <ul class="nav navbar-nav navbar-right">
+ </ul>
+ </div>
+ </div>
+</div>
+
+ <div class="container">
+
+
+ <div class="col-md-3"><div class="bs-sidebar hidden-print affix well"
role="complementary">
+ <ul class="nav bs-sidenav">
+ <li class="first-level active"><a
href="#available-benchmarks-and-how-to-run-them">Available Benchmarks and how
to run them</a></li>
+ <li class="second-level"><a
href="#icebergsourcenestedlistparquetdatawritebenchmark">IcebergSourceNestedListParquetDataWriteBenchmark</a></li>
+
+ <li class="second-level"><a
href="#sparkparquetreadersnesteddatabenchmark">SparkParquetReadersNestedDataBenchmark</a></li>
+
+ <li class="second-level"><a
href="#sparkparquetwritersflatdatabenchmark">SparkParquetWritersFlatDataBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourceflatorcdatareadbenchmark">IcebergSourceFlatORCDataReadBenchmark</a></li>
+
+ <li class="second-level"><a
href="#sparkparquetreadersflatdatabenchmark">SparkParquetReadersFlatDataBenchmark</a></li>
+
+ <li class="second-level"><a
href="#vectorizedreaddictionaryencodedflatparquetdatabenchmark">VectorizedReadDictionaryEncodedFlatParquetDataBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourcenestedlistorcdatawritebenchmark">IcebergSourceNestedListORCDataWriteBenchmark</a></li>
+
+ <li class="second-level"><a
href="#vectorizedreadflatparquetdatabenchmark">VectorizedReadFlatParquetDataBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourceflatparquetdatawritebenchmark">IcebergSourceFlatParquetDataWriteBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourcenestedavrodatareadbenchmark">IcebergSourceNestedAvroDataReadBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourceflatavrodatareadbenchmark">IcebergSourceFlatAvroDataReadBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourcenestedparquetdatawritebenchmark">IcebergSourceNestedParquetDataWriteBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourcenestedparquetdatareadbenchmark">IcebergSourceNestedParquetDataReadBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourcenestedorcdatareadbenchmark">IcebergSourceNestedORCDataReadBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourceflatparquetdatareadbenchmark">IcebergSourceFlatParquetDataReadBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourceflatparquetdatafilterbenchmark">IcebergSourceFlatParquetDataFilterBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourcenestedparquetdatafilterbenchmark">IcebergSourceNestedParquetDataFilterBenchmark</a></li>
+
+ <li class="second-level"><a
href="#sparkparquetwritersnesteddatabenchmark">SparkParquetWritersNestedDataBenchmark</a></li>
+
+ </ul>
+</div></div>
+ <div class="col-md-9" role="main">
+
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one
+ - or more contributor license agreements. See the NOTICE file
+ - distributed with this work for additional information
+ - regarding copyright ownership. The ASF licenses this file
+ - to you under the Apache License, Version 2.0 (the
+ - "License"); you may not use this file except in compliance
+ - with the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing,
+ - software distributed under the License is distributed on an
+ - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ - KIND, either express or implied. See the License for the
+ - specific language governing permissions and limitations
+ - under the License.
+ -->
+
+<h2 id="available-benchmarks-and-how-to-run-them">Available Benchmarks and how
to run them<a class="headerlink"
href="#available-benchmarks-and-how-to-run-them" title="Permanent
link">¶</a></h2>
+<p>Benchmarks are located under <code>&lt;project-name&gt;/jmh</code>. It is generally preferable to run only the benchmarks of interest rather than running all available benchmarks.
+Also note that JMH benchmarks run within the same JVM as the
system-under-test, so results might vary between runs.</p>
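Every invocation on this page follows the same Gradle pattern; as a minimal sketch (the module and class names below are illustrative substitutions, not additional benchmarks), the pieces can be assembled like so:

```shell
# Sketch of the JMH invocation pattern used throughout this page.
# SPARK_MODULE and BENCHMARK are illustrative placeholders; substitute the
# Spark module (iceberg-spark2 or iceberg-spark3) and the benchmark class.
SPARK_MODULE="iceberg-spark3"
BENCHMARK="IcebergSourceFlatParquetDataReadBenchmark"
OUTPUT="benchmark/iceberg-source-flat-parquet-data-read-benchmark-result.txt"

# Assemble the command; run it from the repository root.
CMD="./gradlew :${SPARK_MODULE}:jmh -PjmhIncludeRegex=${BENCHMARK} -PjmhOutputPath=${OUTPUT}"
echo "${CMD}"
```

The `-PjmhIncludeRegex` property selects which benchmark classes JMH runs, and `-PjmhOutputPath` chooses where the result file is written.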
+<h3
id="icebergsourcenestedlistparquetdatawritebenchmark">IcebergSourceNestedListParquetDataWriteBenchmark<a
class="headerlink" href="#icebergsourcenestedlistparquetdatawritebenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of writing nested Parquet data
using Iceberg and the built-in file source in Spark. To run this benchmark for
either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceNestedListParquetDataWriteBenchmark
-PjmhOutputPath=benchmark/iceberg-source-nested-list-parquet-data-write-benchmark-result.txt</code></p>
+<h3
id="sparkparquetreadersnesteddatabenchmark">SparkParquetReadersNestedDataBenchmark<a
class="headerlink" href="#sparkparquetreadersnesteddatabenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of reading nested Parquet data
using Iceberg and Spark Parquet readers. To run this benchmark for either
spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=SparkParquetReadersNestedDataBenchmark
-PjmhOutputPath=benchmark/spark-parquet-readers-nested-data-benchmark-result.txt</code></p>
+<h3
id="sparkparquetwritersflatdatabenchmark">SparkParquetWritersFlatDataBenchmark<a
class="headerlink" href="#sparkparquetwritersflatdatabenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of writing Parquet data with a
flat schema using Iceberg and Spark Parquet writers. To run this benchmark for
either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=SparkParquetWritersFlatDataBenchmark
-PjmhOutputPath=benchmark/spark-parquet-writers-flat-data-benchmark-result.txt</code></p>
+<h3
id="icebergsourceflatorcdatareadbenchmark">IcebergSourceFlatORCDataReadBenchmark<a
class="headerlink" href="#icebergsourceflatorcdatareadbenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of reading ORC data with a flat
schema using Iceberg and the built-in file source in Spark. To run this
benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceFlatORCDataReadBenchmark
-PjmhOutputPath=benchmark/iceberg-source-flat-orc-data-read-benchmark-result.txt</code></p>
+<h3
id="sparkparquetreadersflatdatabenchmark">SparkParquetReadersFlatDataBenchmark<a
class="headerlink" href="#sparkparquetreadersflatdatabenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of reading Parquet data with a
flat schema using Iceberg and Spark Parquet readers. To run this benchmark for
either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=SparkParquetReadersFlatDataBenchmark
-PjmhOutputPath=benchmark/spark-parquet-readers-flat-data-benchmark-result.txt</code></p>
+<h3
id="vectorizedreaddictionaryencodedflatparquetdatabenchmark">VectorizedReadDictionaryEncodedFlatParquetDataBenchmark<a
class="headerlink"
href="#vectorizedreaddictionaryencodedflatparquetdatabenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark to compare the performance of reading dictionary-encoded Parquet data with a flat schema using the vectorized Iceberg read path and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=VectorizedReadDictionaryEncodedFlatParquetDataBenchmark
-PjmhOutputPath=benchmark/vectorized-read-dict-encoded-flat-parquet-data-result.txt</code></p>
+<h3
id="icebergsourcenestedlistorcdatawritebenchmark">IcebergSourceNestedListORCDataWriteBenchmark<a
class="headerlink" href="#icebergsourcenestedlistorcdatawritebenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of writing nested ORC data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceNestedListORCDataWriteBenchmark
-PjmhOutputPath=benchmark/iceberg-source-nested-list-orc-data-write-benchmark-result.txt</code></p>
+<h3
id="vectorizedreadflatparquetdatabenchmark">VectorizedReadFlatParquetDataBenchmark<a
class="headerlink" href="#vectorizedreadflatparquetdatabenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark to compare the performance of reading Parquet data with a flat schema using the vectorized Iceberg read path and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=VectorizedReadFlatParquetDataBenchmark
-PjmhOutputPath=benchmark/vectorized-read-flat-parquet-data-result.txt</code></p>
+<h3
id="icebergsourceflatparquetdatawritebenchmark">IcebergSourceFlatParquetDataWriteBenchmark<a
class="headerlink" href="#icebergsourceflatparquetdatawritebenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of writing Parquet data with a
flat schema using Iceberg and the built-in file source in Spark. To run this
benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceFlatParquetDataWriteBenchmark
-PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-write-benchmark-result.txt</code></p>
+<h3
id="icebergsourcenestedavrodatareadbenchmark">IcebergSourceNestedAvroDataReadBenchmark<a
class="headerlink" href="#icebergsourcenestedavrodatareadbenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of reading nested Avro data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceNestedAvroDataReadBenchmark
-PjmhOutputPath=benchmark/iceberg-source-nested-avro-data-read-benchmark-result.txt</code></p>
+<h3
id="icebergsourceflatavrodatareadbenchmark">IcebergSourceFlatAvroDataReadBenchmark<a
class="headerlink" href="#icebergsourceflatavrodatareadbenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of reading Avro data with a flat
schema using Iceberg and the built-in file source in Spark. To run this
benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceFlatAvroDataReadBenchmark
-PjmhOutputPath=benchmark/iceberg-source-flat-avro-data-read-benchmark-result.txt</code></p>
+<h3
id="icebergsourcenestedparquetdatawritebenchmark">IcebergSourceNestedParquetDataWriteBenchmark<a
class="headerlink" href="#icebergsourcenestedparquetdatawritebenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of writing nested Parquet data
using Iceberg and the built-in file source in Spark. To run this benchmark for
either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceNestedParquetDataWriteBenchmark
-PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-write-benchmark-result.txt</code></p>
+<h3
id="icebergsourcenestedparquetdatareadbenchmark">IcebergSourceNestedParquetDataReadBenchmark<a
class="headerlink" href="#icebergsourcenestedparquetdatareadbenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of reading nested Parquet data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceNestedParquetDataReadBenchmark
-PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-read-benchmark-result.txt</code></p>
+<h3
id="icebergsourcenestedorcdatareadbenchmark">IcebergSourceNestedORCDataReadBenchmark<a
class="headerlink" href="#icebergsourcenestedorcdatareadbenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of reading nested ORC data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceNestedORCDataReadBenchmark
-PjmhOutputPath=benchmark/iceberg-source-nested-orc-data-read-benchmark-result.txt</code></p>
+<h3
id="icebergsourceflatparquetdatareadbenchmark">IcebergSourceFlatParquetDataReadBenchmark<a
class="headerlink" href="#icebergsourceflatparquetdatareadbenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of reading Parquet data with a
flat schema using Iceberg and the built-in file source in Spark. To run this
benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceFlatParquetDataReadBenchmark
-PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-read-benchmark-result.txt</code></p>
+<h3
id="icebergsourceflatparquetdatafilterbenchmark">IcebergSourceFlatParquetDataFilterBenchmark<a
class="headerlink" href="#icebergsourceflatparquetdatafilterbenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the file skipping capabilities in the Spark data
source for Iceberg. This class uses a dataset with a flat schema, where the
records are clustered according to the
+column used in the filter predicate. The performance is compared to the
built-in file source in Spark. To run this benchmark for either spark-2 or
spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceFlatParquetDataFilterBenchmark
-PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-filter-benchmark-result.txt</code></p>
+<h3
id="icebergsourcenestedparquetdatafilterbenchmark">IcebergSourceNestedParquetDataFilterBenchmark<a
class="headerlink" href="#icebergsourcenestedparquetdatafilterbenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the file skipping capabilities in the Spark data
source for Iceberg. This class uses a dataset with nested data, where the
records are clustered according to the
+column used in the filter predicate. The performance is compared to the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataFilterBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-filter-benchmark-result.txt</code></p>
+<h3
id="sparkparquetwritersnesteddatabenchmark">SparkParquetWritersNestedDataBenchmark<a
class="headerlink" href="#sparkparquetwritersnesteddatabenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of writing nested Parquet data using Iceberg and Spark Parquet writers. To run this benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetWritersNestedDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-writers-nested-data-benchmark-result.txt</code></p></div>
+
+
+ </div>
+
+
+ <footer class="col-md-12 text-center">
+
+
+ <hr>
+ <p>
+ <small>Copyright 2018-2021 <a href='https://www.apache.org/'>The
Apache Software Foundation</a><br />Apache Iceberg, Iceberg, Apache, the Apache
feather logo, and the Apache Iceberg project logo are either registered<br
/>trademarks or trademarks of The Apache Software Foundation in the United
States and other countries.</small><br>
+
+ <small>Documentation built with <a
href="http://www.mkdocs.org/">MkDocs</a>.</small>
+ </p>
+
+
+
+
+ </footer>
+
+ <script
src="//ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script>
+ <script src="../js/bootstrap-3.0.3.min.js"></script>
+
+
+ <script
src="//cdn.jsdelivr.net/gh/highlightjs/[email protected]/build/highlight.min.js"></script>
+
+ <script>hljs.initHighlightingOnLoad();</script>
+
+
+ <script>var base_url = ".."</script>
+
+ <script src="../js/base.js"></script>
+
+ <div class="modal" id="mkdocs_keyboard_modal" tabindex="-1" role="dialog"
aria-labelledby="keyboardModalLabel" aria-hidden="true">
+ <div class="modal-dialog">
+ <div class="modal-content">
+ <div class="modal-header">
+ <h4 class="modal-title" id="keyboardModalLabel">Keyboard
Shortcuts</h4>
+ <button type="button" class="close" data-dismiss="modal"><span
aria-hidden="true">×</span><span class="sr-only">Close</span></button>
+ </div>
+ <div class="modal-body">
+ <table class="table">
+ <thead>
+ <tr>
+ <th style="width: 20%;">Keys</th>
+ <th>Action</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td class="help shortcut"><kbd>?</kbd></td>
+ <td>Open this help</td>
+ </tr>
+ <tr>
+ <td class="next shortcut"><kbd>n</kbd></td>
+ <td>Next page</td>
+ </tr>
+ <tr>
+ <td class="prev shortcut"><kbd>p</kbd></td>
+ <td>Previous page</td>
+ </tr>
+ <tr>
+ <td class="search shortcut"><kbd>s</kbd></td>
+ <td>Search</td>
+ </tr>
+ </tbody>
+ </table>
+ </div>
+ <div class="modal-footer">
+ </div>
+ </div>
+ </div>
+</div>
+ </body>
+
+</html>
diff --git a/community/index.html b/community/index.html
index 2fa3e88..d69138e 100644
--- a/community/index.html
+++ b/community/index.html
@@ -410,6 +410,11 @@
<li class="second-level"><a href="#mailing-lists">Mailing
Lists</a></li>
+ <li class="second-level"><a
href="#setting-up-ide-and-code-style">Setting up IDE and Code Style</a></li>
+
+ <li class="third-level"><a
href="#configuring-code-formatter-for-intellij-idea">Configuring Code Formatter
for IntelliJ IDEA</a></li>
+ <li class="second-level"><a href="#running-benchmarks">Running
Benchmarks</a></li>
+
</ul>
</div></div>
<div class="col-md-9" role="main">
@@ -480,7 +485,17 @@ let us know by sending an email to <a
href="mailto&
<li><a
href="https://lists.apache.org/[email protected]">Archive</a></li>
</ul>
</li>
-</ul></div>
+</ul>
+<h2 id="setting-up-ide-and-code-style">Setting up IDE and Code Style<a
class="headerlink" href="#setting-up-ide-and-code-style" title="Permanent
link">¶</a></h2>
+<h3 id="configuring-code-formatter-for-intellij-idea">Configuring Code
Formatter for IntelliJ IDEA<a class="headerlink"
href="#configuring-code-formatter-for-intellij-idea" title="Permanent
link">¶</a></h3>
+<p>In the <strong>Settings/Preferences</strong> dialog go to <strong>Editor
> Code Style > Java</strong>. Click on the gear wheel and select
<strong>Import Scheme</strong> to import IntelliJ IDEA XML code style settings.
+Point to <a
href="../../.baseline/idea/intellij-java-palantir-style.xml">intellij-java-palantir-style.xml</a>
and hit <strong>OK</strong> (you might need to enable <strong>Show Hidden
Files and Directories</strong> in the dialog). The code itself can then be
formatted via <strong>Code > Reformat Code</strong>.</p>
+<p>See also the IntelliJ <a
href="https://www.jetbrains.com/help/idea/copying-code-style-settings.html">Code
Style docs</a> and <a
href="https://www.jetbrains.com/help/idea/reformat-and-rearrange-code.html">Reformat
Code docs</a> for additional details.</p>
+<h2 id="running-benchmarks">Running Benchmarks<a class="headerlink"
href="#running-benchmarks" title="Permanent link">¶</a></h2>
+<p>Some PRs/changesets might require running benchmarks to determine whether they affect the baseline performance. Currently there is
+no “push a single button to get a performance comparison” solution available, so one has to run JMH performance tests on their local machine and
+post the results on the PR.</p>
+<p>See <a href="../benchmarks/">Benchmarks</a> for a summary of available
benchmarks and how to run them.</p></div>
</div>
diff --git a/evolution/index.html b/evolution/index.html
index d1a58a4..40ac6fe 100644
--- a/evolution/index.html
+++ b/evolution/index.html
@@ -472,7 +472,7 @@ sampleTable.updateSpec()
.removeField("category")
.commit();
</code></pre>
-<p>Spark supports updating partition spec through its <code>ALTER TABLE</code>
SQL statement, see more details in <a
href="../spark/#alter-table-add-partition-field">Spark SQL</a>.</p>
+<p>Spark supports updating partition spec through its <code>ALTER TABLE</code>
SQL statement, see more details in <a
href="../spark-ddl/#alter-table-add-partition-field">Spark SQL</a>.</p>
<h2 id="sort-order-evolution">Sort order evolution<a class="headerlink"
href="#sort-order-evolution" title="Permanent link">¶</a></h2>
<p>Similar to partition spec, Iceberg sort order can also be updated in an
existing table.
When you evolve a sort order, the old data written with an earlier order
remains unchanged.
@@ -487,7 +487,7 @@ sampleTable.replaceSortOrder()
.dec("category", NullOrder.NULL_FIRST)
.commit();
</code></pre>
-<p>Spark supports updating sort order through its <code>ALTER TABLE</code> SQL
statement, see more details in <a
href="../spark/#alter-table-write-ordered-by">Spark SQL</a>.</p></div>
+<p>Spark supports updating sort order through its <code>ALTER TABLE</code> SQL
statement, see more details in <a
href="../spark-ddl/#alter-table-write-ordered-by">Spark SQL</a>.</p></div>
</div>
diff --git a/flink/index.html b/flink/index.html
index fac1598..a1a4a34 100644
--- a/flink/index.html
+++ b/flink/index.html
@@ -808,7 +808,7 @@ For an unpartitioned iceberg table, its data will be
completely overwritten by <
<h3 id="batch-read">Batch Read<a class="headerlink" href="#batch-read"
title="Permanent link">¶</a></h3>
<p>This example will read all records from iceberg table and then print to the
stdout console in flink batch job:</p>
<pre><code class="language-java">StreamExecutionEnvironment env =
StreamExecutionEnvironment.createLocalEnvironment();
-TableLoader tableLoader =
TableLoader.fromHadooptable("hdfs://nn:8020/warehouse/path");
+TableLoader tableLoader =
TableLoader.fromHadoopTable("hdfs://nn:8020/warehouse/path");
DataStream&lt;RowData&gt; batch = FlinkSource.forRowData()
.env(env)
.tableLoader(tableLoader)
diff --git a/sitemap.xml b/sitemap.xml
index e5bfae3..8706633 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,177 +2,182 @@
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://iceberg.apache.org/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/api/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/aws/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
+ <changefreq>daily</changefreq>
+ </url>
+ <url>
+ <loc>https://iceberg.apache.org/benchmarks/</loc>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/blogs/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/community/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/configuration/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/custom-catalog/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/evolution/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/flink/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/getting-started/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/hive/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/how-to-release/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/java-api-quickstart/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/maintenance/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/nessie/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/partitioning/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/performance/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/python-api-intro/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/python-feature-support/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/python-quickstart/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/releases/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/reliability/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/schemas/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/snapshots/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/spark-configuration/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/spark-ddl/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/spark-procedures/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/spark-queries/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/spark-structured-streaming/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/spark-writes/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/spec/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/terms/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/trademarks/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/trino/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/why-iceberg/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
</urlset>
\ No newline at end of file
diff --git a/sitemap.xml.gz b/sitemap.xml.gz
index f94fae8..d8a4130 100644
Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ
diff --git a/spark-structured-streaming/index.html b/spark-structured-streaming/index.html
index 3aedc93..5105c21 100644
--- a/spark-structured-streaming/index.html
+++ b/spark-structured-streaming/index.html
@@ -476,12 +476,12 @@ data.writeStream
<li><code>append</code>: appends the rows of every micro-batch to the table</li>
<li><code>complete</code>: replaces the table contents every micro-batch</li>
</ul>
-<p>The table should be created in prior to start the streaming query. Refer <a href="/spark/#create-table">SQL create table</a>
+<p>The table should be created in prior to start the streaming query. Refer <a href="/spark-ddl/#create-table">SQL create table</a>
on Spark page to see how to create the Iceberg table.</p>
<h3 id="writing-against-partitioned-table">Writing against partitioned table<a class="headerlink" href="#writing-against-partitioned-table" title="Permanent link">¶</a></h3>
<p>Iceberg requires the data to be sorted according to the partition spec per task (Spark partition) in prior to write
against partitioned table. For batch queries you’re encouraged to do explicit sort to fulfill the requirement
-(see <a href="/spark/#writing-against-partitioned-table">here</a>), but the approach would bring additional latency as
+(see <a href="/spark-writes/#writing-to-partitioned-tables">here</a>), but the approach would bring additional latency as
repartition and sort are considered as heavy operations for streaming workload. To avoid additional latency, you can
enable fanout writer to eliminate the requirement.</p>
<pre><code class="language-scala">val tableIdentifier: String = ...
diff --git a/spec/index.html b/spec/index.html
index 5a29796..23f1b21 100644
--- a/spec/index.html
+++ b/spec/index.html
@@ -417,6 +417,7 @@
<li class="second-level"><a href="#specification">Specification</a></li>
<li class="third-level"><a href="#terms">Terms</a></li>
+ <li class="third-level"><a href="#writer-requirements">Writer requirements</a></li>
<li class="third-level"><a href="#schemas-and-data-types">Schemas and Data Types</a></li>
<li class="third-level"><a href="#partitioning">Partitioning</a></li>
<li class="third-level"><a href="#sorting">Sorting</a></li>
@@ -524,6 +525,83 @@
<li><strong>Data file</strong> – A file that contains rows of a table.</li>
<li><strong>Delete file</strong> – A file that encodes rows of a table that are deleted by position or data values.</li>
</ul>
+<h4 id="writer-requirements">Writer requirements<a class="headerlink" href="#writer-requirements" title="Permanent link">¶</a></h4>
+<p>Some tables in this spec have columns that specify requirements for v1 and v2 tables. These requirements are intended for writers when adding metadata files to a table with the given version.</p>
+<table>
+<thead>
+<tr>
+<th>Requirement</th>
+<th>Write behavior</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>(blank)</td>
+<td>The field should be omitted</td>
+</tr>
+<tr>
+<td><em>optional</em></td>
+<td>The field can be written</td>
+</tr>
+<tr>
+<td><em>required</em></td>
+<td>The field must be written</td>
+</tr>
+</tbody>
+</table>
+<p>Readers should be more permissive because v1 metadata files are allowed in v2 tables so that tables can be upgraded to v2 without rewriting the metadata tree. For manifest list and manifest files, this table shows the expected v2 read behavior:</p>
+<table>
+<thead>
+<tr>
+<th>v1</th>
+<th>v2</th>
+<th>v2 read behavior</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td></td>
+<td><em>optional</em></td>
+<td>Read the field as <em>optional</em></td>
+</tr>
+<tr>
+<td></td>
+<td><em>required</em></td>
+<td>Read the field as <em>optional</em>; it may be missing in v1 files</td>
+</tr>
+<tr>
+<td><em>optional</em></td>
+<td></td>
+<td>Ignore the field</td>
+</tr>
+<tr>
+<td><em>optional</em></td>
+<td><em>optional</em></td>
+<td>Read the field as <em>optional</em></td>
+</tr>
+<tr>
+<td><em>optional</em></td>
+<td><em>required</em></td>
+<td>Read the field as <em>optional</em>; it may be missing in v1 files</td>
+</tr>
+<tr>
+<td><em>required</em></td>
+<td></td>
+<td>Ignore the field</td>
+</tr>
+<tr>
+<td><em>required</em></td>
+<td><em>optional</em></td>
+<td>Read the field as <em>optional</em></td>
+</tr>
+<tr>
+<td><em>required</em></td>
+<td><em>required</em></td>
+<td>Fill in a default or throw an exception if the field is missing</td>
+</tr>
+</tbody>
+</table>
+<p>Readers may be more strict for metadata JSON files because the JSON files are not reused and will always match the table version. Required v2 fields that were not present in v1 or optional in v1 may be handled as required fields. For example, a v2 table that is missing <code>last-sequence-number</code> can throw an exception.</p>
<h3 id="schemas-and-data-types">Schemas and Data Types<a class="headerlink" href="#schemas-and-data-types" title="Permanent link">¶</a></h3>
<p>A table’s <strong>schema</strong> is a list of named columns. All data types are either primitives or nested types, which are maps, lists, or structs. A table schema is also a struct type.</p>
<p>For the representations of these types in Avro, ORC, and Parquet file formats, see Appendix A.</p>
@@ -2619,27 +2697,83 @@ Hash results are not dependent on decimal scale, which is part of the type, not
<h3 id="version-2">Version 2<a class="headerlink" href="#version-2" title="Permanent link">¶</a></h3>
<p>Writing v1 metadata:</p>
<ul>
-<li>Table metadata field <code>last-sequence-number</code> should not be written.</li>
-<li>Snapshot field <code>sequence-number</code> should not be written.</li>
+<li>Table metadata field <code>last-sequence-number</code> should not be written</li>
+<li>Snapshot field <code>sequence-number</code> should not be written</li>
+<li>Manifest list field <code>sequence-number</code> should not be written</li>
+<li>Manifest list field <code>min-sequence-number</code> should not be written</li>
+<li>Manifest list field <code>content</code> must be 0 (data) or omitted</li>
+<li>Manifest entry field <code>sequence_number</code> should not be written</li>
+<li>Data file field <code>content</code> must be 0 (data) or omitted</li>
</ul>
-<p>Reading v1 metadata:</p>
+<p>Reading v1 metadata for v2:</p>
<ul>
-<li>Table metadata field <code>last-sequence-number</code> must default to 0.</li>
-<li>Snapshot field <code>sequence-number</code> must default to 0.</li>
+<li>Table metadata field <code>last-sequence-number</code> must default to 0</li>
+<li>Snapshot field <code>sequence-number</code> must default to 0</li>
+<li>Manifest list field <code>sequence-number</code> must default to 0</li>
+<li>Manifest list field <code>min-sequence-number</code> must default to 0</li>
+<li>Manifest list field <code>content</code> must default to 0 (data)</li>
+<li>Manifest entry field <code>sequence_number</code> must default to 0</li>
+<li>Data file field <code>content</code> must default to 0 (data)</li>
</ul>
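The defaulting rules above all substitute 0 for an absent field when v1 metadata is read under v2 rules. A minimal sketch of that behavior (the class and method names are hypothetical, not from the Iceberg codebase; real readers work on parsed Avro/JSON structs rather than a flat map):

```java
import java.util.Map;

// Illustrative only: applying the v1-to-v2 defaults listed above.
class V1Defaults {
    // Sequence-number fields and both content fields default to 0
    // when they are missing from v1 metadata read by a v2 reader.
    static long readWithDefault(Map<String, Long> fields, String name) {
        return fields.getOrDefault(name, 0L);
    }
}
```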
<p>Writing v2 metadata:</p>
<ul>
-<li>Table metadata added required field <code>last-sequence-number</code>.</li>
-<li>Table metadata now requires field <code>table-uuid</code>.</li>
-<li>Table metadata now requires field <code>partition-specs</code>.</li>
-<li>Table metadata now requires field <code>default-spec-id</code>.</li>
-<li>Table metadata now requires field <code>last-partition-id</code>.</li>
-<li>Table metadata field <code>partition-spec</code> is no longer required and may be omitted.</li>
-<li>Snapshot added required field <code>sequence-number</code>.</li>
-<li>Snapshot now requires field <code>manifest-list</code>.</li>
-<li>Snapshot field <code>manifests</code> is no longer allowed.</li>
-<li>Table metadata now requires field <code>sort-orders</code>.</li>
-<li>Table metadata now requires field <code>default-sort-order-id</code>.</li>
+<li>Table metadata JSON:<ul>
+<li><code>last-sequence-number</code> was added and is required; default to 0 when reading v1 metadata</li>
+<li><code>table-uuid</code> is now required</li>
+<li><code>current-schema-id</code> is now required</li>
+<li><code>schemas</code> is now required</li>
+<li><code>partition-specs</code> is now required</li>
+<li><code>default-spec-id</code> is now required</li>
+<li><code>last-partition-id</code> is now required</li>
+<li><code>sort-orders</code> is now required</li>
+<li><code>default-sort-order-id</code> is now required</li>
+<li><code>schema</code> is no longer required and should be omitted; use <code>schemas</code> and <code>current-schema-id</code> instead</li>
+<li><code>partition-spec</code> is no longer required and should be omitted; use <code>partition-specs</code> and <code>default-spec-id</code> instead</li>
+</ul>
+</li>
+<li>Snapshot JSON:<ul>
+<li><code>sequence-number</code> was added and is required; default to 0 when reading v1 metadata</li>
+<li><code>manifest-list</code> is now required</li>
+<li><code>manifests</code> is no longer required and should be omitted; always use <code>manifest-list</code> instead</li>
+</ul>
+</li>
+<li>Manifest list <code>manifest_file</code>:<ul>
+<li><code>content</code> was added and is required; 0=data, 1=deletes; default to 0 when reading v1 manifest lists</li>
+<li><code>sequence_number</code> was added and is required</li>
+<li><code>min_sequence_number</code> was added and is required</li>
+<li><code>added_files_count</code> is now required</li>
+<li><code>existing_files_count</code> is now required</li>
+<li><code>deleted_files_count</code> is now required</li>
+<li><code>added_rows_count</code> is now required</li>
+<li><code>existing_rows_count</code> is now required</li>
+<li><code>deleted_rows_count</code> is now required</li>
+</ul>
+</li>
+<li>Manifest list <code>field_summary</code>:<ul>
+<li><code>contains_nan</code> is now required</li>
+</ul>
+</li>
+<li>Manifest key-value metadata:<ul>
+<li><code>schema-id</code> is now required</li>
+<li><code>partition-spec-id</code> is now required</li>
+<li><code>format-version</code> is now required</li>
+<li><code>content</code> was added and is required (must be “data” or “deletes”)</li>
+</ul>
+</li>
+<li>Manifest <code>manifest_entry</code>:<ul>
+<li><code>snapshot_id</code> is now optional to support inheritance</li>
+<li><code>sequence_number</code> was added and is optional, to support inheritance</li>
+</ul>
+</li>
+<li>Manifest <code>data_file</code>:<ul>
+<li><code>content</code> was added and is required; 0=data, 1=position deletes, 2=equality deletes; default to 0 when reading v1 manifests</li>
+<li><code>equality_ids</code> was added, to be used for equality deletes only</li>
+<li><code>block_size_in_bytes</code> was removed (breaks v1 reader compatibility)</li>
+<li><code>file_ordinal</code> was removed</li>
+<li><code>sort_columns</code> was removed</li>
+<li><code>distinct_counts</code> was removed</li>
+</ul>
+</li>
</ul>
<p>Note that these requirements apply when writing data to a v2 table. Tables that are upgraded from v1 may contain metadata that does not follow these requirements. Implementations should remain backward-compatible with v1 metadata requirements.</p></div>