openinx commented on a change in pull request #27:
URL: https://github.com/apache/iceberg-docs/pull/27#discussion_r801236608
##########
File path: landing-page/content/common/releases/release-notes.md
##########
@@ -62,10 +66,86 @@ To add a dependency on Iceberg in Maven, add the following to your `pom.xml`:
 </dependencies>
 ```
 
-## 0.12.1 Release Notes
+## 0.13.0 Release Notes
+
+Apache Iceberg 0.13.0 was released on February 4th, 2022.
+
+**High-level features:**
+
+* **Core**
+  * Catalog caching now supports cache expiration through the catalog property `cache.expiration-interval-ms` [[\#3543](https://github.com/apache/iceberg/pull/3543)]
+  * Catalog now supports registration of an Iceberg table from a given metadata file location [[\#3851](https://github.com/apache/iceberg/pull/3851)]
+  * Hadoop catalog now supports atomic commits using a lock manager [[\#3663](https://github.com/apache/iceberg/pull/3663)]
+* **Vendor Integrations**
+  * Google Cloud Storage (GCS) `FileIO` support is added with optimized read and write using GCS streaming transfer [[\#3711](https://github.com/apache/iceberg/pull/3711)]
+  * Aliyun Object Storage Service (OSS) `FileIO` support is added [[\#3553](https://github.com/apache/iceberg/pull/3553)]
+  * Any S3-compatible storage (e.g. MinIO) can now be accessed through AWS `S3FileIO` with custom endpoint and credential configurations [[\#3656](https://github.com/apache/iceberg/pull/3656)] [[\#3658](https://github.com/apache/iceberg/pull/3658)]
+  * AWS `S3FileIO` now supports server-side checksum validation [[\#3813](https://github.com/apache/iceberg/pull/3813)]
+  * AWS `GlueCatalog` now displays more table information, including location, description [[\#3467](https://github.com/apache/iceberg/pull/3467)] and columns [[\#3888](https://github.com/apache/iceberg/pull/3888)]
+  * `ResolvingFileIO` is added to support using multiple `FileIO`s to access different storage providers based on the file scheme [[\#3593](https://github.com/apache/iceberg/pull/3593)]
+* **File Formats**
+  * Reading legacy Parquet files (e.g. produced by `ParquetHiveSerDe` or Spark with `spark.sql.parquet.writeLegacyFormat=true`) is now fully supported to facilitate Hive to Iceberg table migration [[\#3723](https://github.com/apache/iceberg/pull/3723)]
+  * ORC merge-on-read file write support is added [[\#3248](https://github.com/apache/iceberg/pull/3248)] [[\#3250](https://github.com/apache/iceberg/pull/3250)] [[\#3366](https://github.com/apache/iceberg/pull/3366)]
+* **Spark**
+  * Spark 3.2 support is added [[\#3335](https://github.com/apache/iceberg/pull/3335)] with merge-on-read `DELETE` [[\#3970](https://github.com/apache/iceberg/pull/3970)]
+  * `RewriteDataFiles` action now supports sort-based table optimization [[\#2829](https://github.com/apache/iceberg/pull/2829)] and merge-on-read delete compaction [[\#3454](https://github.com/apache/iceberg/pull/3454)]. The corresponding Spark call procedure `rewrite_data_files` is also added [[\#3375](https://github.com/apache/iceberg/pull/3375)]
+  * Time travel queries now use the snapshot schema instead of the table's latest schema [[\#3722](https://github.com/apache/iceberg/pull/3722)]
+  * Spark vectorized reads now support row-level deletes [[\#3557](https://github.com/apache/iceberg/pull/3557)] [[\#3287](https://github.com/apache/iceberg/pull/3287)]
+  * The `add_files` procedure now skips duplicated files by default (this can be turned off with the `check_duplicate_files` flag) [[\#2779](https://github.com/apache/iceberg/issues/2779)], skips folders without files [[\#3455](https://github.com/apache/iceberg/issues/3455)] and partitions with `null` values [[\#3778](https://github.com/apache/iceberg/issues/3778)] instead of throwing an exception, and supports partition pruning for faster table import [[\#3745](https://github.com/apache/iceberg/issues/3745)]
+* **Flink**
+  * Flink 1.13 and 1.14 support is added [[\#3116](https://github.com/apache/iceberg/pull/3116)] [[\#3434](https://github.com/apache/iceberg/pull/3434)]
+  * Flink connector support is added [[\#2666](https://github.com/apache/iceberg/pull/2666)]
+  * An upsert write option is added [[\#2863](https://github.com/apache/iceberg/pull/2863)]
+* **Hive**
+  * Table listing in the Hive catalog can now skip non-Iceberg tables by disabling the flag `list-all-tables` [[\#3908](https://github.com/apache/iceberg/pull/3908)]
+  * Hive tables imported to Iceberg can now be read by `IcebergInputFormat` [[\#3312](https://github.com/apache/iceberg/pull/3312)]
+
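As an illustration of the new Core catalog API in the list above, here is a minimal Java sketch of `registerTable` (\#3851). The metastore URI, warehouse location, and metadata file path are hypothetical placeholders, not values from the release notes:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.CatalogProperties;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hive.HiveCatalog;

public class RegisterTableExample {
  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put(CatalogProperties.URI, "thrift://metastore:9083");              // hypothetical metastore
    props.put(CatalogProperties.WAREHOUSE_LOCATION, "s3://bucket/warehouse"); // hypothetical warehouse

    HiveCatalog catalog = new HiveCatalog();
    catalog.setConf(new Configuration());
    catalog.initialize("hive", props);

    // New in 0.13.0 (#3851): attach an existing Iceberg table to this catalog
    // by pointing at its current metadata JSON file.
    Table table = catalog.registerTable(
        TableIdentifier.of("db", "events"),
        "s3://bucket/warehouse/db/events/metadata/00003-example.metadata.json");
    System.out.println("Registered " + table.name());
  }
}
```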
+**Important bug fixes:**
+
+* **Core**
+  * The root path for new Iceberg data files is configured through `write.data.path` going forward. `write.folder-storage.path` and `write.object-storage.path` are deprecated [[\#3094](https://github.com/apache/iceberg/pull/3094)]
+  * Catalog commit status is `UNKNOWN` instead of `FAILURE` when the new metadata location cannot be found in the snapshot history [[\#3717](https://github.com/apache/iceberg/pull/3717)]
+  * Dropping a table now also deletes old metadata files instead of leaving them stranded [[\#3622](https://github.com/apache/iceberg/pull/3622)]
+  * The `history` and `snapshots` metadata tables can now query tables with no current snapshot instead of returning empty results [[\#3812](https://github.com/apache/iceberg/pull/3812)]
+* **Vendor Integrations**
+  * Using cloud service integrations such as AWS `GlueCatalog` and `S3FileIO` no longer fails when Hadoop dependencies are missing from the execution environment [[\#3590](https://github.com/apache/iceberg/pull/3590)]
+  * AWS clients are now auto-closed when the `FileIO` or `Catalog` is closed. There is no need to close the AWS clients separately [[\#2878](https://github.com/apache/iceberg/pull/2878)]
+* **File Formats**
+  * A Parquet file writing issue is fixed for string data with over 16 unparseable characters (e.g. high/low surrogates) [[\#3760](https://github.com/apache/iceberg/pull/3760)]
+  * ORC vectorized reads are now configured using `read.orc.vectorization.batch-size` instead of `read.parquet.vectorization.batch-size` [[\#3133](https://github.com/apache/iceberg/pull/3133)]
+* **Spark**
+  * For Spark >= 3.1, `REFRESH TABLE` can now be used with the Spark session catalog instead of throwing an exception [[\#3072](https://github.com/apache/iceberg/pull/3072)]
+  * Insert overwrite mode now skips partitions with 0 records instead of throwing an exception [[\#2895](https://github.com/apache/iceberg/issues/2895)]
+  * Spark snapshot expiration now supports custom `FileIO` implementations instead of just `HadoopFileIO` [[\#3089](https://github.com/apache/iceberg/pull/3089)]
+  * `REPLACE TABLE AS SELECT` now works with tables whose columns have a changed partition transform. Each old partition field of the same column is converted to a void transform with a different name [[\#3421](https://github.com/apache/iceberg/issues/3421)]
+  * Spark SQL statements containing binary or fixed literals are now parsed correctly instead of throwing an exception [[\#3728](https://github.com/apache/iceberg/pull/3728)]
+* **Flink**

Review comment:
   I think we missed a critical bug fix here: https://github.com/apache/iceberg/pull/3540
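The Spark procedures quoted in the release notes above can be exercised as follows; a hedged Java sketch in which the catalog name, warehouse path, and table names are hypothetical, and the cache property assumes catalog caching is enabled:

```java
import org.apache.spark.sql.SparkSession;

public class IcebergProceduresExample {
  public static void main(String[] args) {
    // Local session for illustration only.
    SparkSession spark = SparkSession.builder()
        .master("local[*]")
        .appName("iceberg-0.13-procedures")
        .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.demo.type", "hadoop")
        .config("spark.sql.catalog.demo.warehouse", "file:///tmp/warehouse")
        // Catalog cache expiration (#3543), applied when caching is enabled.
        .config("spark.sql.catalog.demo.cache.expiration-interval-ms", "60000")
        .getOrCreate();

    // Sort-based compaction through the new rewrite_data_files procedure
    // (#3375), using the sort strategy added in #2829.
    spark.sql("CALL demo.system.rewrite_data_files("
        + "table => 'db.events', strategy => 'sort', sort_order => 'ts DESC')");

    // add_files skips duplicate files by default; the check can be disabled
    // with the check_duplicate_files flag.
    spark.sql("CALL demo.system.add_files("
        + "table => 'db.events', source_table => 'db.legacy_events', "
        + "check_duplicate_files => false)");

    spark.stop();
  }
}
```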
##########
File path: landing-page/content/common/releases/release-notes.md
##########
@@ -25,14 +25,18 @@ url: releases
 
 The latest version of Iceberg is [{{% icebergVersion %}}](https://github.com/apache/iceberg/releases/tag/apache-iceberg-{{% icebergVersion %}}).
 
 * [{{% icebergVersion %}} source tar.gz](https://www.apache.org/dyn/closer.cgi/iceberg/apache-iceberg-{{% icebergVersion %}}/apache-iceberg-{{% icebergVersion %}}.tar.gz) -- [signature](https://downloads.apache.org/iceberg/apache-iceberg-{{% icebergVersion %}}/apache-iceberg-{{% icebergVersion %}}.tar.gz.asc) -- [sha512](https://downloads.apache.org/iceberg/apache-iceberg-{{% icebergVersion %}}/apache-iceberg-{{% icebergVersion %}}.tar.gz.sha512)
+* [{{% icebergVersion %}} Spark 3.2 runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-3.2_2.12/{{% icebergVersion %}}/iceberg-spark-runtime-3.2_2.12-{{% icebergVersion %}}.jar)
+* [{{% icebergVersion %}} Spark 3.1 runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime-3.1_2.12/{{% icebergVersion %}}/iceberg-spark-runtime-3.1_2.12-{{% icebergVersion %}}.jar)
 * [{{% icebergVersion %}} Spark 3.0 runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark3-runtime/{{% icebergVersion %}}/iceberg-spark3-runtime-{{% icebergVersion %}}.jar)
 * [{{% icebergVersion %}} Spark 2.4 runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-spark-runtime/{{% icebergVersion %}}/iceberg-spark-runtime-{{% icebergVersion %}}.jar)
-* [{{% icebergVersion %}} Flink runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime/{{% icebergVersion %}}/iceberg-flink-runtime-{{% icebergVersion %}}.jar)
+* [{{% icebergVersion %}} Flink 1.14 runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime-1.14/{{% icebergVersion %}}/iceberg-flink-runtime-1.14-{{% icebergVersion %}}.jar)
+* [{{% icebergVersion %}} Flink 1.13 runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime-1.13/{{% icebergVersion %}}/iceberg-flink-runtime-1.13-{{% icebergVersion %}}.jar)
+* [{{% icebergVersion %}} Flink 1.12 runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-flink-runtime-1.12/{{% icebergVersion %}}/iceberg-flink-runtime-1.12-{{% icebergVersion %}}.jar)
 * [{{% icebergVersion %}} Hive runtime Jar](https://search.maven.org/remotecontent?filepath=org/apache/iceberg/iceberg-hive-runtime/{{% icebergVersion %}}/iceberg-hive-runtime-{{% icebergVersion %}}.jar)
 
-To use Iceberg in Spark, download the runtime JAR and add it to the jars folder of your Spark install. Use iceberg-spark3-runtime for Spark 3, and iceberg-spark-runtime for Spark 2.4.
+To use Iceberg in Spark or Flink, download the runtime JAR for your engine version and add it to the jars folder of your installation.
 
-To use Iceberg in Hive, download the iceberg-hive-runtime JAR and add it to Hive using `ADD JAR`.
+To use Iceberg in Hive, download the Hive runtime JAR and add it to Hive using `ADD JAR`.

Review comment:
   Should we add a sentence to show users that both Hive 2 and Hive 3 use the same Hive runtime jar? I see that both Spark and Flink provide version-specific runtime jars, while Hive provides only a single shared jar.
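Following up on the `ADD JAR` instructions and the review question above, a small Java/JDBC sketch of the flow; the HiveServer2 URL and jar path are hypothetical, and, as the comment notes, a single shared `iceberg-hive-runtime` jar serves both Hive 2 and Hive 3:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveAddJarExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical HiveServer2 endpoint; requires the Hive JDBC driver
    // on the classpath.
    String url = "jdbc:hive2://hiveserver:10000/default";
    try (Connection conn = DriverManager.getConnection(url);
         Statement stmt = conn.createStatement()) {
      // Make the Iceberg classes available to this Hive session; the same
      // jar path would be used on a Hive 2 or Hive 3 cluster.
      stmt.execute("ADD JAR /opt/jars/iceberg-hive-runtime-0.13.0.jar");
    }
  }
}
```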
##########
File path: landing-page/content/common/project/multi-engine-support.md
##########
@@ -0,0 +1,93 @@
+---
+title: "Multi-Engine Support"
+bookHidden: true
+url: multi-engine-support
+---
+<!--
+ - Licensed to the Apache Software Foundation (ASF) under one or more
+ - contributor license agreements.  See the NOTICE file distributed with
+ - this work for additional information regarding copyright ownership.
+ - The ASF licenses this file to You under the Apache License, Version 2.0
+ - (the "License"); you may not use this file except in compliance with
+ - the License.  You may obtain a copy of the License at
+ -
+ -   http://www.apache.org/licenses/LICENSE-2.0
+ -
+ - Unless required by applicable law or agreed to in writing, software
+ - distributed under the License is distributed on an "AS IS" BASIS,
+ - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ - See the License for the specific language governing permissions and
+ - limitations under the License.
+ -->
+
+# Multi-Engine Support
+
+Multi-engine support is a core tenet of Apache Iceberg.
+The community continuously improves Iceberg core library components to enable integrations with different compute engines that power analytics, business intelligence, machine learning, etc.
+Support for [Apache Spark](../../../docs/spark-configuration), [Apache Flink](../../../docs/flink) and [Apache Hive](../../../docs/hive) is provided inside the Iceberg main repository.
+
+## Multi-Version Support
+
+Engines maintained within the Iceberg repository have multi-version support.
+This means that each new engine version that introduces a backwards-incompatible upgrade has its own dedicated integration codebase and release artifacts.
+For example, the code for the Iceberg Spark 3.1 integration is under `/spark/v3.1`, and the code for the Iceberg Spark 3.2 integration is under `/spark/v3.2`.
+Different artifacts (`iceberg-spark-3.1_2.12` and `iceberg-spark-3.2_2.12`) are released for users to consume.

Review comment:
   If Spark 2.4 and 3.0 also followed the 3.1 and 3.2 naming approach, then this sentence would be correct. That's why I raised this question before: https://github.com/apache/iceberg-docs/pull/27/files#r800297155
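To show how the version-specific artifacts above are consumed, a hedged Flink Java sketch of the new upsert write option (\#2863); the catalog, warehouse path, and table names are hypothetical, and it assumes the `iceberg-flink-runtime` jar matching your Flink version (1.14, 1.13, or 1.12) is on the classpath:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class FlinkUpsertExample {
  public static void main(String[] args) {
    // Requires the version-matched runtime artifact, e.g.
    // iceberg-flink-runtime-1.14 for Flink 1.14.
    TableEnvironment env = TableEnvironment.create(
        EnvironmentSettings.newInstance().inStreamingMode().build());

    // Hypothetical Hadoop-based catalog and warehouse path.
    env.executeSql("CREATE CATALOG demo WITH ("
        + "'type'='iceberg', 'catalog-type'='hadoop', "
        + "'warehouse'='file:///tmp/warehouse')");
    env.executeSql("CREATE DATABASE IF NOT EXISTS demo.db");

    // Upsert writes (#2863) need a format v2 table with a primary key and
    // write.upsert.enabled set.
    env.executeSql("CREATE TABLE IF NOT EXISTS demo.db.events ("
        + "id BIGINT, data STRING, PRIMARY KEY (id) NOT ENFORCED) "
        + "WITH ('format-version'='2', 'write.upsert.enabled'='true')");

    // The second row with the same key upserts the first.
    env.executeSql("INSERT INTO demo.db.events VALUES (1, 'a'), (1, 'b')");
  }
}
```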
