This is an automated email from the ASF dual-hosted git repository.
blue pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/iceberg.git
The following commit(s) were added to refs/heads/asf-site by this push:
new a885685 Deployed 25eaebacb with MkDocs version: 1.2.1
a885685 is described below
commit a8856855d4cd5cdf430bab323fe4c84fba89d9b4
Author: Ryan Blue <[email protected]>
AuthorDate: Sun Jul 11 17:03:48 2021 -0700
Deployed 25eaebacb with MkDocs version: 1.2.1
---
aws/index.html | 2 +-
benchmarks/index.html | 592 ++++++++++++++++++++++++++++++++++
community/index.html | 17 +-
evolution/index.html | 4 +-
flink/index.html | 2 +-
sitemap.xml | 75 +++--
sitemap.xml.gz | Bin 467 -> 473 bytes
spark-structured-streaming/index.html | 4 +-
spec/index.html | 166 +++++++++-
9 files changed, 804 insertions(+), 58 deletions(-)
diff --git a/aws/index.html b/aws/index.html
index af53ddc..d4ed104 100644
--- a/aws/index.html
+++ b/aws/index.html
@@ -568,7 +568,7 @@ an Iceberg table is stored as a <a
href="https://docs.aws.amazon.com/glue/latest
and every Iceberg table version is stored as a <a
href="https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog-tables.html#aws-glue-api-catalog-tables-TableVersion">Glue
TableVersion</a>.
You can start using Glue catalog by specifying the <code>catalog-impl</code>
as <code>org.apache.iceberg.aws.glue.GlueCatalog</code>,
just like what is shown in the <a href="#enabling-aws-integration">enabling
AWS integration</a> section above.
-More details about loading the catalog can be found in individual engine
pages, such as <a href="../spark/#loading-a-custom-catalog">Spark</a> and <a
href="../flink/#creating-catalogs-and-using-catalogs">Flink</a>.</p>
+More details about loading the catalog can be found in individual engine
pages, such as <a
href="../spark-configuration/#loading-a-custom-catalog">Spark</a> and <a
href="../flink/#creating-catalogs-and-using-catalogs">Flink</a>.</p>
<h3 id="glue-catalog-id">Glue Catalog ID<a class="headerlink"
href="#glue-catalog-id" title="Permanent link">¶</a></h3>
<p>There is a unique Glue metastore in each AWS account and each AWS region.
By default, <code>GlueCatalog</code> chooses the Glue metastore to use based
on the user’s default AWS client credential and region setup.
diff --git a/benchmarks/index.html b/benchmarks/index.html
new file mode 100644
index 0000000..9a49cf4
--- /dev/null
+++ b/benchmarks/index.html
@@ -0,0 +1,592 @@
+<!DOCTYPE html>
+<html lang="en">
+
+<head>
+ <meta charset="utf-8">
+ <meta http-equiv="X-UA-Compatible" content="IE=edge">
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
+ <meta name="description" content="A table format for large, slow-moving
tabular data">
+
+ <link rel="canonical" href="https://iceberg.apache.org/benchmarks/">
+ <link rel="shortcut icon" href="../img/favicon.ico">
+
+
+ <title>Benchmarks - Apache Iceberg</title>
+
+
+ <link rel="stylesheet"
href="https://use.fontawesome.com/releases/v5.12.0/css/all.css">
+ <link rel="stylesheet"
href="https://use.fontawesome.com/releases/v5.12.0/css/v4-shims.css">
+ <link rel="stylesheet"
href="//cdn.jsdelivr.net/npm/[email protected]/build/web/hack.min.css">
+ <link href='//rsms.me/inter/inter.css' rel='stylesheet' type='text/css'>
+ <link
href='//fonts.googleapis.com/css?family=Open+Sans:300italic,400italic,700italic,400,300,600,700&subset=latin-ext,latin'
rel='stylesheet' type='text/css'>
+ <link href="../css/bootstrap-custom.min.css" rel="stylesheet">
+ <link href="../css/base.min.css" rel="stylesheet">
+ <link href="../css/cinder.min.css" rel="stylesheet">
+
+
+
+ <link rel="stylesheet"
href="//cdn.jsdelivr.net/gh/highlightjs/[email protected]/build/styles/github.min.css">
+
+
+ <link href="../css/extra.css" rel="stylesheet">
+
+ <!-- HTML5 shim and Respond.js IE8 support of HTML5 elements and media
queries -->
+ <!--[if lt IE 9]>
+ <script
src="https://cdn.jsdelivr.net/npm/[email protected]/dist/html5shiv.min.js"></script>
+ <script
src="https://cdn.jsdelivr.net/npm/[email protected]/dest/respond.min.js"></script>
+ <![endif]-->
+
+
+
+
+</head>
+
+<body>
+
+ <div class="navbar navbar-default navbar-fixed-top" role="navigation">
+ <div class="container">
+
+ <!-- Collapsed navigation -->
+ <div class="navbar-header">
+ <!-- Expander button -->
+ <button type="button" class="navbar-toggle" data-toggle="collapse"
data-target=".navbar-collapse">
+ <span class="sr-only">Toggle navigation</span>
+ <span class="icon-bar"></span>
+ <span class="icon-bar"></span>
+ <span class="icon-bar"></span>
+ </button>
+
+
+ <!-- Main title -->
+
+
+ <a class="navbar-brand" href="..">Apache Iceberg</a>
+
+ </div>
+
+ <!-- Expanded navigation -->
+ <div class="navbar-collapse collapse">
+ <!-- Main navigation -->
+ <ul class="nav navbar-nav">
+
+
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle"
data-toggle="dropdown">Project <b class="caret"></b></a>
+ <ul class="dropdown-menu">
+
+
+<li >
+ <a href="..">About</a>
+</li>
+
+
+
+<li >
+ <a href="../community/">Community</a>
+</li>
+
+
+
+<li >
+ <a href="../releases/">Releases</a>
+</li>
+
+
+
+<li >
+ <a href="../blogs/">Blogs</a>
+</li>
+
+
+
+<li >
+ <a href="../trademarks/">Trademarks</a>
+</li>
+
+
+
+<li >
+ <a href="../how-to-release/">How to Release</a>
+</li>
+
+
+ </ul>
+ </li>
+
+
+
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle"
data-toggle="dropdown">Tables <b class="caret"></b></a>
+ <ul class="dropdown-menu">
+
+
+<li >
+ <a href="../configuration/">Configuration</a>
+</li>
+
+
+
+<li >
+ <a href="../schemas/">Schemas</a>
+</li>
+
+
+
+<li >
+ <a href="../partitioning/">Partitioning</a>
+</li>
+
+
+
+<li >
+ <a href="../evolution/">Table evolution</a>
+</li>
+
+
+
+<li >
+ <a href="../maintenance/">Maintenance</a>
+</li>
+
+
+
+<li >
+ <a href="../performance/">Performance</a>
+</li>
+
+
+
+<li >
+ <a href="../reliability/">Reliability</a>
+</li>
+
+
+ </ul>
+ </li>
+
+
+
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle"
data-toggle="dropdown">Spark <b class="caret"></b></a>
+ <ul class="dropdown-menu">
+
+
+<li >
+ <a href="../getting-started/">Getting Started</a>
+</li>
+
+
+
+<li >
+ <a href="../spark-configuration/">Configuration</a>
+</li>
+
+
+
+<li >
+ <a href="../spark-ddl/">DDL</a>
+</li>
+
+
+
+<li >
+ <a href="../spark-queries/">Queries</a>
+</li>
+
+
+
+<li >
+ <a href="../spark-writes/">Writes</a>
+</li>
+
+
+
+<li >
+ <a href="../spark-procedures/">Maintenance Procedures</a>
+</li>
+
+
+
+<li >
+ <a href="../spark-structured-streaming/">Structured Streaming</a>
+</li>
+
+
+
+<li >
+ <a href="../spark-queries/#time-travel">Time Travel</a>
+</li>
+
+
+ </ul>
+ </li>
+
+
+
+ <li >
+ <a
href="https://trino.io/docs/current/connector/iceberg.html">Trino</a>
+ </li>
+
+
+
+ <li >
+ <a href="../flink/">Flink</a>
+ </li>
+
+
+
+ <li >
+ <a href="../hive/">Hive</a>
+ </li>
+
+
+
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle"
data-toggle="dropdown">Integrations <b class="caret"></b></a>
+ <ul class="dropdown-menu">
+
+
+<li >
+ <a href="../aws/">AWS</a>
+</li>
+
+
+
+<li >
+ <a href="../nessie/">Nessie</a>
+</li>
+
+
+ </ul>
+ </li>
+
+
+
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle"
data-toggle="dropdown">API <b class="caret"></b></a>
+ <ul class="dropdown-menu">
+
+
+<li >
+ <a href="/javadoc/">Javadoc</a>
+</li>
+
+
+
+<li >
+ <a href="../api/">Java API intro</a>
+</li>
+
+
+
+<li >
+ <a href="../java-api-quickstart/">Java Quickstart</a>
+</li>
+
+
+
+<li >
+ <a href="../custom-catalog/">Java Custom Catalog</a>
+</li>
+
+
+
+<li >
+ <a href="../python-quickstart/">Python Quickstart</a>
+</li>
+
+
+
+<li >
+ <a href="../python-api-intro/">Python API Intro</a>
+</li>
+
+
+
+<li >
+ <a href="../python-feature-support/">Python Feature Support</a>
+</li>
+
+
+ </ul>
+ </li>
+
+
+
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle"
data-toggle="dropdown">Format <b class="caret"></b></a>
+ <ul class="dropdown-menu">
+
+
+<li >
+ <a href="../terms/">Definitions</a>
+</li>
+
+
+
+<li >
+ <a href="../spec/">Spec</a>
+</li>
+
+
+ </ul>
+ </li>
+
+
+
+ <li >
+ <a href="https://github.com/apache/iceberg">GitHub</a>
+ </li>
+
+
+
+ <li class="dropdown">
+ <a href="#" class="dropdown-toggle"
data-toggle="dropdown">ASF <b class="caret"></b></a>
+ <ul class="dropdown-menu">
+
+
+<li >
+ <a href="https://www.apache.org/licenses/">License</a>
+</li>
+
+
+
+<li >
+ <a href="https://www.apache.org/security/">Security</a>
+</li>
+
+
+
+<li >
+ <a href="https://www.apache.org/foundation/thanks.html">Sponsors</a>
+</li>
+
+
+
+<li >
+ <a href="https://www.apache.org/foundation/sponsorship.html">Donate</a>
+</li>
+
+
+
+<li >
+ <a href="https://www.apache.org/events/current-event.html">Events</a>
+</li>
+
+
+ </ul>
+ </li>
+
+
+ </ul>
+
+ <ul class="nav navbar-nav navbar-right">
+ </ul>
+ </div>
+ </div>
+</div>
+
+ <div class="container">
+
+
+ <div class="col-md-3"><div class="bs-sidebar hidden-print affix well"
role="complementary">
+ <ul class="nav bs-sidenav">
+ <li class="first-level active"><a
href="#available-benchmarks-and-how-to-run-them">Available Benchmarks and how
to run them</a></li>
+ <li class="second-level"><a
href="#icebergsourcenestedlistparquetdatawritebenchmark">IcebergSourceNestedListParquetDataWriteBenchmark</a></li>
+
+ <li class="second-level"><a
href="#sparkparquetreadersnesteddatabenchmark">SparkParquetReadersNestedDataBenchmark</a></li>
+
+ <li class="second-level"><a
href="#sparkparquetwritersflatdatabenchmark">SparkParquetWritersFlatDataBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourceflatorcdatareadbenchmark">IcebergSourceFlatORCDataReadBenchmark</a></li>
+
+ <li class="second-level"><a
href="#sparkparquetreadersflatdatabenchmark">SparkParquetReadersFlatDataBenchmark</a></li>
+
+ <li class="second-level"><a
href="#vectorizedreaddictionaryencodedflatparquetdatabenchmark">VectorizedReadDictionaryEncodedFlatParquetDataBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourcenestedlistorcdatawritebenchmark">IcebergSourceNestedListORCDataWriteBenchmark</a></li>
+
+ <li class="second-level"><a
href="#vectorizedreadflatparquetdatabenchmark">VectorizedReadFlatParquetDataBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourceflatparquetdatawritebenchmark">IcebergSourceFlatParquetDataWriteBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourcenestedavrodatareadbenchmark">IcebergSourceNestedAvroDataReadBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourceflatavrodatareadbenchmark">IcebergSourceFlatAvroDataReadBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourcenestedparquetdatawritebenchmark">IcebergSourceNestedParquetDataWriteBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourcenestedparquetdatareadbenchmark">IcebergSourceNestedParquetDataReadBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourcenestedorcdatareadbenchmark">IcebergSourceNestedORCDataReadBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourceflatparquetdatareadbenchmark">IcebergSourceFlatParquetDataReadBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourceflatparquetdatafilterbenchmark">IcebergSourceFlatParquetDataFilterBenchmark</a></li>
+
+ <li class="second-level"><a
href="#icebergsourcenestedparquetdatafilterbenchmark">IcebergSourceNestedParquetDataFilterBenchmark</a></li>
+
+ <li class="second-level"><a
href="#sparkparquetwritersnesteddatabenchmark">SparkParquetWritersNestedDataBenchmark</a></li>
+
+ </ul>
+</div></div>
+ <div class="col-md-9" role="main">
+
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one
+ - or more contributor license agreements. See the NOTICE file
+ - distributed with this work for additional information
+ - regarding copyright ownership. The ASF licenses this file
+ - to you under the Apache License, Version 2.0 (the
+ - "License"); you may not use this file except in compliance
+ - with the License. You may obtain a copy of the License at
+ -
+ - http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing,
+ - software distributed under the License is distributed on an
+ - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ - KIND, either express or implied. See the License for the
+ - specific language governing permissions and limitations
+ - under the License.
+ -->
+
+<h2 id="available-benchmarks-and-how-to-run-them">Available Benchmarks and how
to run them<a class="headerlink"
href="#available-benchmarks-and-how-to-run-them" title="Permanent
link">¶</a></h2>
+<p>Benchmarks are located under <code>&lt;project-name&gt;/jmh</code>. It is generally preferable to run only the benchmarks of interest rather than running all available benchmarks.
+Also note that JMH benchmarks run within the same JVM as the
system-under-test, so results might vary between runs.</p>
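Every invocation on this page follows the same Gradle pattern; as a minimal sketch (the module and class names below are illustrative substitutions, not additional benchmarks), the pieces can be assembled like so:

```shell
# Sketch of the JMH invocation pattern used throughout this page.
# SPARK_MODULE and BENCHMARK are illustrative placeholders; substitute the
# Spark module (iceberg-spark2 or iceberg-spark3) and the benchmark class.
SPARK_MODULE="iceberg-spark3"
BENCHMARK="IcebergSourceFlatParquetDataReadBenchmark"
OUTPUT="benchmark/iceberg-source-flat-parquet-data-read-benchmark-result.txt"

# Assemble the command; run it from the repository root.
CMD="./gradlew :${SPARK_MODULE}:jmh -PjmhIncludeRegex=${BENCHMARK} -PjmhOutputPath=${OUTPUT}"
echo "${CMD}"
```

The `-PjmhIncludeRegex` property selects which benchmark classes JMH runs, and `-PjmhOutputPath` chooses where the result file is written.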
+<h3
id="icebergsourcenestedlistparquetdatawritebenchmark">IcebergSourceNestedListParquetDataWriteBenchmark<a
class="headerlink" href="#icebergsourcenestedlistparquetdatawritebenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of writing nested Parquet data
using Iceberg and the built-in file source in Spark. To run this benchmark for
either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceNestedListParquetDataWriteBenchmark
-PjmhOutputPath=benchmark/iceberg-source-nested-list-parquet-data-write-benchmark-result.txt</code></p>
+<h3
id="sparkparquetreadersnesteddatabenchmark">SparkParquetReadersNestedDataBenchmark<a
class="headerlink" href="#sparkparquetreadersnesteddatabenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of reading nested Parquet data
using Iceberg and Spark Parquet readers. To run this benchmark for either
spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=SparkParquetReadersNestedDataBenchmark
-PjmhOutputPath=benchmark/spark-parquet-readers-nested-data-benchmark-result.txt</code></p>
+<h3
id="sparkparquetwritersflatdatabenchmark">SparkParquetWritersFlatDataBenchmark<a
class="headerlink" href="#sparkparquetwritersflatdatabenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of writing Parquet data with a
flat schema using Iceberg and Spark Parquet writers. To run this benchmark for
either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=SparkParquetWritersFlatDataBenchmark
-PjmhOutputPath=benchmark/spark-parquet-writers-flat-data-benchmark-result.txt</code></p>
+<h3
id="icebergsourceflatorcdatareadbenchmark">IcebergSourceFlatORCDataReadBenchmark<a
class="headerlink" href="#icebergsourceflatorcdatareadbenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of reading ORC data with a flat
schema using Iceberg and the built-in file source in Spark. To run this
benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceFlatORCDataReadBenchmark
-PjmhOutputPath=benchmark/iceberg-source-flat-orc-data-read-benchmark-result.txt</code></p>
+<h3
id="sparkparquetreadersflatdatabenchmark">SparkParquetReadersFlatDataBenchmark<a
class="headerlink" href="#sparkparquetreadersflatdatabenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of reading Parquet data with a
flat schema using Iceberg and Spark Parquet readers. To run this benchmark for
either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=SparkParquetReadersFlatDataBenchmark
-PjmhOutputPath=benchmark/spark-parquet-readers-flat-data-benchmark-result.txt</code></p>
+<h3
id="vectorizedreaddictionaryencodedflatparquetdatabenchmark">VectorizedReadDictionaryEncodedFlatParquetDataBenchmark<a
class="headerlink"
href="#vectorizedreaddictionaryencodedflatparquetdatabenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark to compare the performance of reading dictionary-encoded Parquet data with a flat schema using the vectorized Iceberg read path and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=VectorizedReadDictionaryEncodedFlatParquetDataBenchmark
-PjmhOutputPath=benchmark/vectorized-read-dict-encoded-flat-parquet-data-result.txt</code></p>
+<h3
id="icebergsourcenestedlistorcdatawritebenchmark">IcebergSourceNestedListORCDataWriteBenchmark<a
class="headerlink" href="#icebergsourcenestedlistorcdatawritebenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of writing nested ORC data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceNestedListORCDataWriteBenchmark
-PjmhOutputPath=benchmark/iceberg-source-nested-list-orc-data-write-benchmark-result.txt</code></p>
+<h3
id="vectorizedreadflatparquetdatabenchmark">VectorizedReadFlatParquetDataBenchmark<a
class="headerlink" href="#vectorizedreadflatparquetdatabenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark to compare the performance of reading Parquet data with a flat schema using the vectorized Iceberg read path and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=VectorizedReadFlatParquetDataBenchmark
-PjmhOutputPath=benchmark/vectorized-read-flat-parquet-data-result.txt</code></p>
+<h3
id="icebergsourceflatparquetdatawritebenchmark">IcebergSourceFlatParquetDataWriteBenchmark<a
class="headerlink" href="#icebergsourceflatparquetdatawritebenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of writing Parquet data with a
flat schema using Iceberg and the built-in file source in Spark. To run this
benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceFlatParquetDataWriteBenchmark
-PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-write-benchmark-result.txt</code></p>
+<h3
id="icebergsourcenestedavrodatareadbenchmark">IcebergSourceNestedAvroDataReadBenchmark<a
class="headerlink" href="#icebergsourcenestedavrodatareadbenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of reading nested Avro data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceNestedAvroDataReadBenchmark
-PjmhOutputPath=benchmark/iceberg-source-nested-avro-data-read-benchmark-result.txt</code></p>
+<h3
id="icebergsourceflatavrodatareadbenchmark">IcebergSourceFlatAvroDataReadBenchmark<a
class="headerlink" href="#icebergsourceflatavrodatareadbenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of reading Avro data with a flat
schema using Iceberg and the built-in file source in Spark. To run this
benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceFlatAvroDataReadBenchmark
-PjmhOutputPath=benchmark/iceberg-source-flat-avro-data-read-benchmark-result.txt</code></p>
+<h3
id="icebergsourcenestedparquetdatawritebenchmark">IcebergSourceNestedParquetDataWriteBenchmark<a
class="headerlink" href="#icebergsourcenestedparquetdatawritebenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of writing nested Parquet data
using Iceberg and the built-in file source in Spark. To run this benchmark for
either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceNestedParquetDataWriteBenchmark
-PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-write-benchmark-result.txt</code></p>
+<h3
id="icebergsourcenestedparquetdatareadbenchmark">IcebergSourceNestedParquetDataReadBenchmark<a
class="headerlink" href="#icebergsourcenestedparquetdatareadbenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of reading nested Parquet data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceNestedParquetDataReadBenchmark
-PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-read-benchmark-result.txt</code></p>
+<h3
id="icebergsourcenestedorcdatareadbenchmark">IcebergSourceNestedORCDataReadBenchmark<a
class="headerlink" href="#icebergsourcenestedorcdatareadbenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of reading nested ORC data using Iceberg and the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceNestedORCDataReadBenchmark
-PjmhOutputPath=benchmark/iceberg-source-nested-orc-data-read-benchmark-result.txt</code></p>
+<h3
id="icebergsourceflatparquetdatareadbenchmark">IcebergSourceFlatParquetDataReadBenchmark<a
class="headerlink" href="#icebergsourceflatparquetdatareadbenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of reading Parquet data with a
flat schema using Iceberg and the built-in file source in Spark. To run this
benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceFlatParquetDataReadBenchmark
-PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-read-benchmark-result.txt</code></p>
+<h3
id="icebergsourceflatparquetdatafilterbenchmark">IcebergSourceFlatParquetDataFilterBenchmark<a
class="headerlink" href="#icebergsourceflatparquetdatafilterbenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the file skipping capabilities in the Spark data
source for Iceberg. This class uses a dataset with a flat schema, where the
records are clustered according to the
+column used in the filter predicate. The performance is compared to the
built-in file source in Spark. To run this benchmark for either spark-2 or
spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh
-PjmhIncludeRegex=IcebergSourceFlatParquetDataFilterBenchmark
-PjmhOutputPath=benchmark/iceberg-source-flat-parquet-data-filter-benchmark-result.txt</code></p>
+<h3
id="icebergsourcenestedparquetdatafilterbenchmark">IcebergSourceNestedParquetDataFilterBenchmark<a
class="headerlink" href="#icebergsourcenestedparquetdatafilterbenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the file skipping capabilities in the Spark data
source for Iceberg. This class uses a dataset with nested data, where the
records are clustered according to the
+column used in the filter predicate. The performance is compared to the built-in file source in Spark. To run this benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=IcebergSourceNestedParquetDataFilterBenchmark -PjmhOutputPath=benchmark/iceberg-source-nested-parquet-data-filter-benchmark-result.txt</code></p>
+<h3
id="sparkparquetwritersnesteddatabenchmark">SparkParquetWritersNestedDataBenchmark<a
class="headerlink" href="#sparkparquetwritersnesteddatabenchmark"
title="Permanent link">¶</a></h3>
+<p>A benchmark that evaluates the performance of writing nested Parquet data using Iceberg and Spark Parquet writers. To run this benchmark for either spark-2 or spark-3:</p>
+<p><code>./gradlew :iceberg-spark[2|3]:jmh -PjmhIncludeRegex=SparkParquetWritersNestedDataBenchmark -PjmhOutputPath=benchmark/spark-parquet-writers-nested-data-benchmark-result.txt</code></p></div>
+
+
+ </div>
+
+
+ <footer class="col-md-12 text-center">
+
+
+ <hr>
+ <p>
+ <small>Copyright 2018-2021 <a href='https://www.apache.org/'>The
Apache Software Foundation</a><br />Apache Iceberg, Iceberg, Apache, the Apache
feather logo, and the Apache Iceberg project logo are either registered<br
/>trademarks or trademarks of The Apache Software Foundation in the United
States and other countries.</small><br>
+
+ <small>Documentation built with <a
href="http://www.mkdocs.org/">MkDocs</a>.</small>
+ </p>
+
+
+
+
+ </footer>
+
+ <script
src="//ajax.googleapis.com/ajax/libs/jquery/1.12.4/jquery.min.js"></script>
+ <script src="../js/bootstrap-3.0.3.min.js"></script>
+
+
+ <script
src="//cdn.jsdelivr.net/gh/highlightjs/[email protected]/build/highlight.min.js"></script>
+
+ <script>hljs.initHighlightingOnLoad();</script>
+
+
+ <script>var base_url = ".."</script>
+
+ <script src="../js/base.js"></script>
+
+ <div class="modal" id="mkdocs_keyboard_modal" tabindex="-1" role="dialog"
aria-labelledby="keyboardModalLabel" aria-hidden="true">
+ <div class="modal-dialog">
+ <div class="modal-content">
+ <div class="modal-header">
+ <h4 class="modal-title" id="keyboardModalLabel">Keyboard
Shortcuts</h4>
+ <button type="button" class="close" data-dismiss="modal"><span
aria-hidden="true">×</span><span class="sr-only">Close</span></button>
+ </div>
+ <div class="modal-body">
+ <table class="table">
+ <thead>
+ <tr>
+ <th style="width: 20%;">Keys</th>
+ <th>Action</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td class="help shortcut"><kbd>?</kbd></td>
+ <td>Open this help</td>
+ </tr>
+ <tr>
+ <td class="next shortcut"><kbd>n</kbd></td>
+ <td>Next page</td>
+ </tr>
+ <tr>
+ <td class="prev shortcut"><kbd>p</kbd></td>
+ <td>Previous page</td>
+ </tr>
+ <tr>
+ <td class="search shortcut"><kbd>s</kbd></td>
+ <td>Search</td>
+ </tr>
+ </tbody>
+ </table>
+ </div>
+ <div class="modal-footer">
+ </div>
+ </div>
+ </div>
+</div>
+ </body>
+
+</html>
diff --git a/community/index.html b/community/index.html
index 2fa3e88..d69138e 100644
--- a/community/index.html
+++ b/community/index.html
@@ -410,6 +410,11 @@
<li class="second-level"><a href="#mailing-lists">Mailing
Lists</a></li>
+ <li class="second-level"><a
href="#setting-up-ide-and-code-style">Setting up IDE and Code Style</a></li>
+
+ <li class="third-level"><a
href="#configuring-code-formatter-for-intellij-idea">Configuring Code Formatter
for IntelliJ IDEA</a></li>
+ <li class="second-level"><a href="#running-benchmarks">Running
Benchmarks</a></li>
+
</ul>
</div></div>
<div class="col-md-9" role="main">
@@ -480,7 +485,17 @@ let us know by sending an email to <a
href="mailto&
<li><a
href="https://lists.apache.org/[email protected]">Archive</a></li>
</ul>
</li>
-</ul></div>
+</ul>
+<h2 id="setting-up-ide-and-code-style">Setting up IDE and Code Style<a
class="headerlink" href="#setting-up-ide-and-code-style" title="Permanent
link">¶</a></h2>
+<h3 id="configuring-code-formatter-for-intellij-idea">Configuring Code
Formatter for IntelliJ IDEA<a class="headerlink"
href="#configuring-code-formatter-for-intellij-idea" title="Permanent
link">¶</a></h3>
+<p>In the <strong>Settings/Preferences</strong> dialog go to <strong>Editor
> Code Style > Java</strong>. Click on the gear wheel and select
<strong>Import Scheme</strong> to import IntelliJ IDEA XML code style settings.
+Point to <a
href="../../.baseline/idea/intellij-java-palantir-style.xml">intellij-java-palantir-style.xml</a>
and hit <strong>OK</strong> (you might need to enable <strong>Show Hidden
Files and Directories</strong> in the dialog). The code itself can then be
formatted via <strong>Code > Reformat Code</strong>.</p>
+<p>See also the IntelliJ <a
href="https://www.jetbrains.com/help/idea/copying-code-style-settings.html">Code
Style docs</a> and <a
href="https://www.jetbrains.com/help/idea/reformat-and-rearrange-code.html">Reformat
Code docs</a> for additional details.</p>
+<h2 id="running-benchmarks">Running Benchmarks<a class="headerlink"
href="#running-benchmarks" title="Permanent link">¶</a></h2>
+<p>Some PRs/changesets might require running benchmarks to determine whether they affect the baseline performance. Currently there is
+no “push a single button to get a performance comparison” solution available, so one has to run JMH performance tests on their local machine and
+post the results on the PR.</p>
+<p>See <a href="../benchmarks/">Benchmarks</a> for a summary of available
benchmarks and how to run them.</p></div>
</div>
diff --git a/evolution/index.html b/evolution/index.html
index d1a58a4..40ac6fe 100644
--- a/evolution/index.html
+++ b/evolution/index.html
@@ -472,7 +472,7 @@ sampleTable.updateSpec()
.removeField("category")
.commit();
</code></pre>
-<p>Spark supports updating partition spec through its <code>ALTER TABLE</code>
SQL statement, see more details in <a
href="../spark/#alter-table-add-partition-field">Spark SQL</a>.</p>
+<p>Spark supports updating partition spec through its <code>ALTER TABLE</code>
SQL statement, see more details in <a
href="../spark-ddl/#alter-table-add-partition-field">Spark SQL</a>.</p>
<h2 id="sort-order-evolution">Sort order evolution<a class="headerlink"
href="#sort-order-evolution" title="Permanent link">¶</a></h2>
<p>Similar to partition spec, Iceberg sort order can also be updated in an
existing table.
When you evolve a sort order, the old data written with an earlier order
remains unchanged.
@@ -487,7 +487,7 @@ sampleTable.replaceSortOrder()
.dec("category", NullOrder.NULL_FIRST)
.commit();
</code></pre>
-<p>Spark supports updating sort order through its <code>ALTER TABLE</code> SQL
statement, see more details in <a
href="../spark/#alter-table-write-ordered-by">Spark SQL</a>.</p></div>
+<p>Spark supports updating sort order through its <code>ALTER TABLE</code> SQL
statement, see more details in <a
href="../spark-ddl/#alter-table-write-ordered-by">Spark SQL</a>.</p></div>
</div>
diff --git a/flink/index.html b/flink/index.html
index fac1598..a1a4a34 100644
--- a/flink/index.html
+++ b/flink/index.html
@@ -808,7 +808,7 @@ For an unpartitioned iceberg table, its data will be
completely overwritten by <
<h3 id="batch-read">Batch Read<a class="headerlink" href="#batch-read"
title="Permanent link">¶</a></h3>
<p>This example will read all records from iceberg table and then print to the
stdout console in flink batch job:</p>
<pre><code class="language-java">StreamExecutionEnvironment env =
StreamExecutionEnvironment.createLocalEnvironment();
-TableLoader tableLoader =
TableLoader.fromHadooptable("hdfs://nn:8020/warehouse/path");
+TableLoader tableLoader =
TableLoader.fromHadoopTable("hdfs://nn:8020/warehouse/path");
DataStream&lt;RowData&gt; batch = FlinkSource.forRowData()
.env(env)
.tableLoader(tableLoader)
diff --git a/sitemap.xml b/sitemap.xml
index e5bfae3..8706633 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,177 +2,182 @@
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://iceberg.apache.org/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/api/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/aws/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
+ <changefreq>daily</changefreq>
+ </url>
+ <url>
+ <loc>https://iceberg.apache.org/benchmarks/</loc>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/blogs/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/community/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/configuration/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/custom-catalog/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/evolution/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/flink/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/getting-started/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/hive/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/how-to-release/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/java-api-quickstart/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/maintenance/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/nessie/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/partitioning/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/performance/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/python-api-intro/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/python-feature-support/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/python-quickstart/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/releases/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/reliability/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/schemas/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/snapshots/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/spark-configuration/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/spark-ddl/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/spark-procedures/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/spark-queries/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/spark-structured-streaming/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/spark-writes/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/spec/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/terms/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/trademarks/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/trino/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
<url>
<loc>https://iceberg.apache.org/why-iceberg/</loc>
- <lastmod>2021-06-29</lastmod>
+ <lastmod>2021-07-12</lastmod>
<changefreq>daily</changefreq>
</url>
</urlset>
\ No newline at end of file
diff --git a/sitemap.xml.gz b/sitemap.xml.gz
index f94fae8..d8a4130 100644
Binary files a/sitemap.xml.gz and b/sitemap.xml.gz differ
diff --git a/spark-structured-streaming/index.html b/spark-structured-streaming/index.html
index 3aedc93..5105c21 100644
--- a/spark-structured-streaming/index.html
+++ b/spark-structured-streaming/index.html
@@ -476,12 +476,12 @@ data.writeStream
<li><code>append</code>: appends the rows of every micro-batch to the table</li>
<li><code>complete</code>: replaces the table contents every micro-batch</li>
</ul>
-<p>The table should be created in prior to start the streaming query. Refer <a href="/spark/#create-table">SQL create table</a>
+<p>The table should be created in prior to start the streaming query. Refer <a href="/spark-ddl/#create-table">SQL create table</a>
on Spark page to see how to create the Iceberg table.</p>
<h3 id="writing-against-partitioned-table">Writing against partitioned table<a class="headerlink" href="#writing-against-partitioned-table" title="Permanent link">¶</a></h3>
<p>Iceberg requires the data to be sorted according to the partition spec per task (Spark partition) in prior to write
against partitioned table. For batch queries you’re encouraged to do explicit sort to fulfill the requirement
-(see <a href="/spark/#writing-against-partitioned-table">here</a>), but the approach would bring additional latency as
+(see <a href="/spark-writes/#writing-to-partitioned-tables">here</a>), but the approach would bring additional latency as
repartition and sort are considered as heavy operations for streaming workload. To avoid additional latency, you can
enable fanout writer to eliminate the requirement.</p>
<pre><code class="language-scala">val tableIdentifier: String = ...
diff --git a/spec/index.html b/spec/index.html
index 5a29796..23f1b21 100644
--- a/spec/index.html
+++ b/spec/index.html
@@ -417,6 +417,7 @@
<li class="second-level"><a href="#specification">Specification</a></li>
<li class="third-level"><a href="#terms">Terms</a></li>
+ <li class="third-level"><a href="#writer-requirements">Writer requirements</a></li>
<li class="third-level"><a href="#schemas-and-data-types">Schemas and Data Types</a></li>
<li class="third-level"><a href="#partitioning">Partitioning</a></li>
<li class="third-level"><a href="#sorting">Sorting</a></li>
@@ -524,6 +525,83 @@
<li><strong>Data file</strong> – A file that contains rows of a table.</li>
<li><strong>Delete file</strong> – A file that encodes rows of a table that are deleted by position or data values.</li>
</ul>
+<h4 id="writer-requirements">Writer requirements<a class="headerlink" href="#writer-requirements" title="Permanent link">¶</a></h4>
+<p>Some tables in this spec have columns that specify requirements for v1 and v2 tables. These requirements are intended for writers when adding metadata files to a table with the given version.</p>
+<table>
+<thead>
+<tr>
+<th>Requirement</th>
+<th>Write behavior</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>(blank)</td>
+<td>The field should be omitted</td>
+</tr>
+<tr>
+<td><em>optional</em></td>
+<td>The field can be written</td>
+</tr>
+<tr>
+<td><em>required</em></td>
+<td>The field must be written</td>
+</tr>
+</tbody>
+</table>
+<p>Readers should be more permissive because v1 metadata files are allowed in v2 tables so that tables can be upgraded to v2 without rewriting the metadata tree. For manifest list and manifest files, this table shows the expected v2 read behavior:</p>
+<table>
+<thead>
+<tr>
+<th>v1</th>
+<th>v2</th>
+<th>v2 read behavior</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td></td>
+<td><em>optional</em></td>
+<td>Read the field as <em>optional</em></td>
+</tr>
+<tr>
+<td></td>
+<td><em>required</em></td>
+<td>Read the field as <em>optional</em>; it may be missing in v1 files</td>
+</tr>
+<tr>
+<td><em>optional</em></td>
+<td></td>
+<td>Ignore the field</td>
+</tr>
+<tr>
+<td><em>optional</em></td>
+<td><em>optional</em></td>
+<td>Read the field as <em>optional</em></td>
+</tr>
+<tr>
+<td><em>optional</em></td>
+<td><em>required</em></td>
+<td>Read the field as <em>optional</em>; it may be missing in v1 files</td>
+</tr>
+<tr>
+<td><em>required</em></td>
+<td></td>
+<td>Ignore the field</td>
+</tr>
+<tr>
+<td><em>required</em></td>
+<td><em>optional</em></td>
+<td>Read the field as <em>optional</em></td>
+</tr>
+<tr>
+<td><em>required</em></td>
+<td><em>required</em></td>
+<td>Fill in a default or throw an exception if the field is missing</td>
+</tr>
+</tbody>
+</table>
+<p>Readers may be more strict for metadata JSON files because the JSON files are not reused and will always match the table version. Required v2 fields that were not present in v1 or optional in v1 may be handled as required fields. For example, a v2 table that is missing <code>last-sequence-number</code> can throw an exception.</p>
<h3 id="schemas-and-data-types">Schemas and Data Types<a class="headerlink" href="#schemas-and-data-types" title="Permanent link">¶</a></h3>
<p>A table’s <strong>schema</strong> is a list of named columns. All data types are either primitives or nested types, which are maps, lists, or structs. A table schema is also a struct type.</p>
<p>For the representations of these types in Avro, ORC, and Parquet file formats, see Appendix A.</p>
@@ -2619,27 +2697,83 @@ Hash results are not dependent on decimal scale, which is part of the type, not
<h3 id="version-2">Version 2<a class="headerlink" href="#version-2" title="Permanent link">¶</a></h3>
<p>Writing v1 metadata:</p>
<ul>
-<li>Table metadata field <code>last-sequence-number</code> should not be written.</li>
-<li>Snapshot field <code>sequence-number</code> should not be written.</li>
+<li>Table metadata field <code>last-sequence-number</code> should not be written</li>
+<li>Snapshot field <code>sequence-number</code> should not be written</li>
+<li>Manifest list field <code>sequence-number</code> should not be written</li>
+<li>Manifest list field <code>min-sequence-number</code> should not be written</li>
+<li>Manifest list field <code>content</code> must be 0 (data) or omitted</li>
+<li>Manifest entry field <code>sequence_number</code> should not be written</li>
+<li>Data file field <code>content</code> must be 0 (data) or omitted</li>
</ul>
-<p>Reading v1 metadata:</p>
+<p>Reading v1 metadata for v2:</p>
<ul>
-<li>Table metadata field <code>last-sequence-number</code> must default to 0.</li>
-<li>Snapshot field <code>sequence-number</code> must default to 0.</li>
+<li>Table metadata field <code>last-sequence-number</code> must default to 0</li>
+<li>Snapshot field <code>sequence-number</code> must default to 0</li>
+<li>Manifest list field <code>sequence-number</code> must default to 0</li>
+<li>Manifest list field <code>min-sequence-number</code> must default to 0</li>
+<li>Manifest list field <code>content</code> must default to 0 (data)</li>
+<li>Manifest entry field <code>sequence_number</code> must default to 0</li>
+<li>Data file field <code>content</code> must default to 0 (data)</li>
</ul>
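The defaulting rules above all substitute 0 for an absent field when v1 metadata is read under v2 rules. A minimal sketch of that behavior (the class and method names are hypothetical, not from the Iceberg codebase; real readers work on parsed Avro/JSON structs rather than a flat map):

```java
import java.util.Map;

// Illustrative only: applying the v1-to-v2 defaults listed above.
class V1Defaults {
    // Sequence-number fields and both content fields default to 0
    // when they are missing from v1 metadata read by a v2 reader.
    static long readWithDefault(Map<String, Long> fields, String name) {
        return fields.getOrDefault(name, 0L);
    }
}
```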
<p>Writing v2 metadata:</p>
<ul>
-<li>Table metadata added required field <code>last-sequence-number</code>.</li>
-<li>Table metadata now requires field <code>table-uuid</code>.</li>
-<li>Table metadata now requires field <code>partition-specs</code>.</li>
-<li>Table metadata now requires field <code>default-spec-id</code>.</li>
-<li>Table metadata now requires field <code>last-partition-id</code>.</li>
-<li>Table metadata field <code>partition-spec</code> is no longer required and may be omitted.</li>
-<li>Snapshot added required field <code>sequence-number</code>.</li>
-<li>Snapshot now requires field <code>manifest-list</code>.</li>
-<li>Snapshot field <code>manifests</code> is no longer allowed.</li>
-<li>Table metadata now requires field <code>sort-orders</code>.</li>
-<li>Table metadata now requires field <code>default-sort-order-id</code>.</li>
+<li>Table metadata JSON:<ul>
+<li><code>last-sequence-number</code> was added and is required; default to 0 when reading v1 metadata</li>
+<li><code>table-uuid</code> is now required</li>
+<li><code>current-schema-id</code> is now required</li>
+<li><code>schemas</code> is now required</li>
+<li><code>partition-specs</code> is now required</li>
+<li><code>default-spec-id</code> is now required</li>
+<li><code>last-partition-id</code> is now required</li>
+<li><code>sort-orders</code> is now required</li>
+<li><code>default-sort-order-id</code> is now required</li>
+<li><code>schema</code> is no longer required and should be omitted; use <code>schemas</code> and <code>current-schema-id</code> instead</li>
+<li><code>partition-spec</code> is no longer required and should be omitted; use <code>partition-specs</code> and <code>default-spec-id</code> instead</li>
+</ul>
+</li>
+<li>Snapshot JSON:<ul>
+<li><code>sequence-number</code> was added and is required; default to 0 when reading v1 metadata</li>
+<li><code>manifest-list</code> is now required</li>
+<li><code>manifests</code> is no longer required and should be omitted; always use <code>manifest-list</code> instead</li>
+</ul>
+</li>
+<li>Manifest list <code>manifest_file</code>:<ul>
+<li><code>content</code> was added and is required; 0=data, 1=deletes; default to 0 when reading v1 manifest lists</li>
+<li><code>sequence_number</code> was added and is required</li>
+<li><code>min_sequence_number</code> was added and is required</li>
+<li><code>added_files_count</code> is now required</li>
+<li><code>existing_files_count</code> is now required</li>
+<li><code>deleted_files_count</code> is now required</li>
+<li><code>added_rows_count</code> is now required</li>
+<li><code>existing_rows_count</code> is now required</li>
+<li><code>deleted_rows_count</code> is now required</li>
+</ul>
+</li>
+<li>Manifest list <code>field_summary</code>:<ul>
+<li><code>contains_nan</code> is now required</li>
+</ul>
+</li>
+<li>Manifest key-value metadata:<ul>
+<li><code>schema-id</code> is now required</li>
+<li><code>partition-spec-id</code> is now required</li>
+<li><code>format-version</code> is now required</li>
+<li><code>content</code> was added and is required (must be “data” or “deletes”)</li>
+</ul>
+</li>
+<li>Manifest <code>manifest_entry</code>:<ul>
+<li><code>snapshot_id</code> is now optional to support inheritance</li>
+<li><code>sequence_number</code> was added and is optional, to support inheritance</li>
+</ul>
+</li>
+<li>Manifest <code>data_file</code>:<ul>
+<li><code>content</code> was added and is required; 0=data, 1=position deletes, 2=equality deletes; default to 0 when reading v1 manifests</li>
+<li><code>equality_ids</code> was added, to be used for equality deletes only</li>
+<li><code>block_size_in_bytes</code> was removed (breaks v1 reader compatibility)</li>
+<li><code>file_ordinal</code> was removed</li>
+<li><code>sort_columns</code> was removed</li>
+<li><code>distinct_counts</code> was removed</li>
+</ul>
+</li>
</ul>
<p>Note that these requirements apply when writing data to a v2 table. Tables that are upgraded from v1 may contain metadata that does not follow these requirements. Implementations should remain backward-compatible with v1 metadata requirements.</p></div>