jackye1995 commented on a change in pull request #27:
URL: https://github.com/apache/iceberg-docs/pull/27#discussion_r801239627
##########
File path: landing-page/content/common/releases/release-notes.md
##########

@@ -62,10 +66,86 @@ To add a dependency on Iceberg in Maven, add the following to your `pom.xml`:
 </dependencies>
 ```
 
-## 0.12.1 Release Notes
+## 0.13.0 Release Notes
+
+Apache Iceberg 0.13.0 was released on February 4th, 2022.
+
+**High-level features:**
+
+* **Core**
+  * Catalog caching now supports cache expiration through the catalog property `cache.expiration-interval-ms` [[\#3543](https://github.com/apache/iceberg/pull/3543)]
+  * Catalogs now support registering an Iceberg table from a given metadata file location [[\#3851](https://github.com/apache/iceberg/pull/3851)]
+  * Hadoop catalog now supports atomic commits using a lock manager [[\#3663](https://github.com/apache/iceberg/pull/3663)]
+* **Vendor Integrations**
+  * Google Cloud Storage (GCS) `FileIO` support is added, with optimized reads and writes using GCS streaming transfer [[\#3711](https://github.com/apache/iceberg/pull/3711)]
+  * Aliyun Object Storage Service (OSS) `FileIO` support is added [[\#3553](https://github.com/apache/iceberg/pull/3553)]
+  * Any S3-compatible storage (e.g. MinIO) can now be accessed through AWS `S3FileIO` with custom endpoint and credential configurations [[\#3656](https://github.com/apache/iceberg/pull/3656)] [[\#3658](https://github.com/apache/iceberg/pull/3658)]
+  * AWS `S3FileIO` now supports server-side checksum validation [[\#3813](https://github.com/apache/iceberg/pull/3813)]
+  * AWS `GlueCatalog` now displays more table information, including table location and description [[\#3467](https://github.com/apache/iceberg/pull/3467)] and used columns [[\#3888](https://github.com/apache/iceberg/pull/3888)]
+  * `ResolvingFileIO` is added to support using multiple `FileIO`s to access different storage providers based on file scheme [[\#3593](https://github.com/apache/iceberg/pull/3593)]
+* **File Formats**
+  * Reading legacy Parquet files (e.g. produced by `ParquetHiveSerDe` or Spark with `spark.sql.parquet.writeLegacyFormat=true`) is now fully supported to facilitate Hive-to-Iceberg table migration [[\#3723](https://github.com/apache/iceberg/pull/3723)]
+  * ORC merge-on-read file write support is added [[\#3248](https://github.com/apache/iceberg/pull/3248)] [[\#3250](https://github.com/apache/iceberg/pull/3250)] [[\#3366](https://github.com/apache/iceberg/pull/3366)]
+* **Spark**
+  * Spark 3.2 support is added [[\#3335](https://github.com/apache/iceberg/pull/3335)] with merge-on-read `DELETE` [[\#3970](https://github.com/apache/iceberg/pull/3970)]
+  * The `RewriteDataFiles` action now supports sort-based table optimization [[\#2829](https://github.com/apache/iceberg/pull/2829)] and merge-on-read delete compaction [[\#3454](https://github.com/apache/iceberg/pull/3454)]. The corresponding Spark call procedure `rewrite_data_files` is also added [[\#3375](https://github.com/apache/iceberg/pull/3375)]
+  * Time travel queries now use the snapshot schema instead of the table's latest schema [[\#3722](https://github.com/apache/iceberg/pull/3722)]
+  * Spark vectorized reads now support row-level deletes [[\#3557](https://github.com/apache/iceberg/pull/3557)] [[\#3287](https://github.com/apache/iceberg/pull/3287)]
+  * The `add_files` procedure now skips duplicated files by default (can be turned off with the `check_duplicate_files` flag) [[\#2779](https://github.com/apache/iceberg/issues/2779)], skips folders without files [[\#3455](https://github.com/apache/iceberg/issues/3455)] and partitions with `null` values [[\#3778](https://github.com/apache/iceberg/issues/3778)] instead of throwing an exception, and supports partition pruning for faster table import [[\#3745](https://github.com/apache/iceberg/issues/3745)]
+* **Flink**
+  * Flink 1.13 and 1.14 support is added [[\#3116](https://github.com/apache/iceberg/pull/3116)] [[\#3434](https://github.com/apache/iceberg/pull/3434)]
+  * Flink connector support is added [[\#2666](https://github.com/apache/iceberg/pull/2666)]
+  * An upsert write option is added [[\#2863](https://github.com/apache/iceberg/pull/2863)]
+* **Hive**
+  * Table listing in the Hive catalog can now skip non-Iceberg tables by disabling the flag `list-all-tables` [[\#3908](https://github.com/apache/iceberg/pull/3908)]
+  * Hive tables imported to Iceberg can now be read by `IcebergInputFormat` [[\#3312](https://github.com/apache/iceberg/pull/3312)]
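A minimal sketch of the catalog table-registration API listed under **Core** above ([\#3851](https://github.com/apache/iceberg/pull/3851)), assuming a `HiveCatalog`; the metastore URI, warehouse, table name, and metadata file path below are hypothetical placeholders:

```java
import java.util.Map;

import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hive.HiveCatalog;

public class RegisterTableSketch {
  public static void main(String[] args) {
    // Assumes a HiveCatalog; the URI and warehouse below are hypothetical.
    HiveCatalog catalog = new HiveCatalog();
    catalog.initialize("hive", Map.of(
        "uri", "thrift://metastore:9083",
        "warehouse", "s3://bucket/warehouse"));

    // Register an existing Iceberg table from a known metadata file location.
    Table table = catalog.registerTable(
        TableIdentifier.of("db", "tbl"),
        "s3://bucket/warehouse/db/tbl/metadata/00001-example.metadata.json");

    System.out.println("Registered: " + table.name());
  }
}
```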
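And a rough sketch of the new `rewrite_data_files` call procedure from the **Spark** list above ([\#3375](https://github.com/apache/iceberg/pull/3375)) using the sort strategy from [\#2829](https://github.com/apache/iceberg/pull/2829), assuming the Iceberg runtime jar is on the classpath; the catalog name, table name, and sort order are hypothetical:

```java
import org.apache.spark.sql.SparkSession;

public class RewriteDataFilesSketch {
  public static void main(String[] args) {
    // Assumes `my_catalog` is backed by a Hive metastore; names are hypothetical.
    SparkSession spark = SparkSession.builder()
        .appName("rewrite-data-files-sketch")
        .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
        .config("spark.sql.catalog.my_catalog", "org.apache.iceberg.spark.SparkCatalog")
        .config("spark.sql.catalog.my_catalog.type", "hive")
        .getOrCreate();

    // Sort-based compaction of db.tbl via the call procedure.
    spark.sql(
        "CALL my_catalog.system.rewrite_data_files("
            + "table => 'db.tbl', "
            + "strategy => 'sort', "
            + "sort_order => 'id ASC NULLS LAST')");

    spark.stop();
  }
}
```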
+
+**Important bug fixes:**
+
+* **Core**
+  * Iceberg's new data file root path is configured through `write.data.path` going forward. `write.folder-storage.path` and `write.object-storage.path` are deprecated [[\#3094](https://github.com/apache/iceberg/pull/3094)]
+  * Catalog commit status is `UNKNOWN` instead of `FAILURE` when the new metadata location cannot be found in snapshot history [[\#3717](https://github.com/apache/iceberg/pull/3717)]
+  * Dropping a table now also deletes old metadata files instead of leaving them stranded [[\#3622](https://github.com/apache/iceberg/pull/3622)]
+  * `history` and `snapshots` metadata tables can now query tables with no current snapshot instead of returning empty [[\#3812](https://github.com/apache/iceberg/pull/3812)]
+* **Vendor Integrations**
+  * Using cloud service integrations such as AWS `GlueCatalog` and `S3FileIO` no longer fails when Hadoop dependencies are missing from the execution environment [[\#3590](https://github.com/apache/iceberg/pull/3590)]
+  * AWS clients are now auto-closed when `FileIO` or `Catalog` is closed. There is no need to close the AWS clients separately [[\#2878](https://github.com/apache/iceberg/pull/2878)]
+* **File Formats**
+  * A Parquet file writing issue is fixed for string data with over 16 unparseable chars (e.g. high/low surrogates) [[\#3760](https://github.com/apache/iceberg/pull/3760)]
+  * ORC vectorized reads are now configured using `read.orc.vectorization.batch-size` instead of `read.parquet.vectorization.batch-size` [[\#3133](https://github.com/apache/iceberg/pull/3133)]
+* **Spark**
+  * For Spark >= 3.1, `REFRESH TABLE` can now be used with the Spark session catalog instead of throwing an exception [[\#3072](https://github.com/apache/iceberg/pull/3072)]
+  * Insert overwrite mode now skips partitions with 0 records instead of throwing an exception [[\#2895](https://github.com/apache/iceberg/issues/2895)]
+  * Spark snapshot expiration now supports custom `FileIO` instead of just `HadoopFileIO` [[\#3089](https://github.com/apache/iceberg/pull/3089)]
+  * `REPLACE TABLE AS SELECT` now works with tables whose columns have changed partition transforms. Each old partition field of the same column is converted to a void transform with a different name [[\#3421](https://github.com/apache/iceberg/issues/3421)]
+  * Spark SQL statements containing binary or fixed literals can now be parsed correctly instead of throwing an exception [[\#3728](https://github.com/apache/iceberg/pull/3728)]
+* **Flink**

Review comment:
   Yes, agreed. I removed it thinking it might be too much detail to mention; let me add it back.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
