This is an automated email from the ASF dual-hosted git repository.
sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new f698fb7 [HUDI-2382] Add Hudi 0.9.0 release page with highlights (#3547)
f698fb7 is described below
commit f698fb7e3de08e9d5c6c80610cf62f630d8d855c
Author: Udit Mehrotra <[email protected]>
AuthorDate: Tue Aug 31 06:23:58 2021 -0700
[HUDI-2382] Add Hudi 0.9.0 release page with highlights (#3547)
Co-authored-by: Sivabalan Narayanan <[email protected]>
Co-authored-by: Vinoth Chandar <[email protected]>
---
website/docusaurus.config.js | 6 +-
website/releases/download.md | 4 ++
website/releases/older-releases.md | 2 +-
website/releases/release-0.5.3.md | 2 +-
website/releases/release-0.6.0.md | 2 +-
website/releases/release-0.7.0.md | 2 +-
website/releases/release-0.8.0.md | 2 +-
website/releases/release-0.9.0.md | 132 +++++++++++++++++++++++++++++++++++++
website/src/pages/index.js | 2 +-
9 files changed, 145 insertions(+), 9 deletions(-)
diff --git a/website/docusaurus.config.js b/website/docusaurus.config.js
index 80f3785..72267fe 100644
--- a/website/docusaurus.config.js
+++ b/website/docusaurus.config.js
@@ -73,11 +73,11 @@ module.exports = {
},
{
from: ['/docs/releases', '/docs/next/releases'],
- to: '/releases/release-0.8.0',
+ to: '/releases/release-0.9.0',
},
{
from: ['/releases'],
- to: '/releases/release-0.8.0',
+ to: '/releases/release-0.9.0',
},
],
},
@@ -211,7 +211,7 @@ module.exports = {
},
{
label: 'Releases',
- to: '/releases/release-0.8.0',
+ to: '/releases/release-0.9.0',
},
{
label: 'Download',
diff --git a/website/releases/download.md b/website/releases/download.md
index 2da12c8..fc8e29e 100644
--- a/website/releases/download.md
+++ b/website/releases/download.md
@@ -6,6 +6,10 @@ toc: true
last_modified_at: 2019-12-30T15:59:57-04:00
---
+## Release 0.9.0
+* Source Release : [Apache Hudi 0.9.0 Source Release](https://www.apache.org/dyn/closer.lua/hudi/0.9.0/hudi-0.9.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.9.0/hudi-0.9.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.9.0/hudi-0.9.0.src.tgz.sha512))
+* Release Note : ([Release Note for Apache Hudi 0.9.0](/releases/release-0.9.0))
+
## Release 0.8.0
* Source Release : [Apache Hudi 0.8.0 Source
Release](https://www.apache.org/dyn/closer.lua/hudi/0.8.0/hudi-0.8.0.src.tgz)
([asc](https://downloads.apache.org/hudi/0.8.0/hudi-0.8.0.src.tgz.asc),
[sha512](https://downloads.apache.org/hudi/0.8.0/hudi-0.8.0.src.tgz.sha512))
* Release Note : ([Release Note for Apache Hudi
0.8.0](/releases/release-0.8.0))
diff --git a/website/releases/older-releases.md b/website/releases/older-releases.md
index 52398d3..cf6592f 100644
--- a/website/releases/older-releases.md
+++ b/website/releases/older-releases.md
@@ -1,6 +1,6 @@
---
title: "Older Releases"
-sidebar_position: 6
+sidebar_position: 7
layout: releases
toc: true
last_modified_at: 2020-05-28T08:40:00-07:00
diff --git a/website/releases/release-0.5.3.md b/website/releases/release-0.5.3.md
index 481822a..9a3af0b 100644
--- a/website/releases/release-0.5.3.md
+++ b/website/releases/release-0.5.3.md
@@ -1,6 +1,6 @@
---
title: "Release 0.5.3"
-sidebar_position: 5
+sidebar_position: 6
layout: releases
toc: true
last_modified_at: 2020-05-28T08:40:00-07:00
diff --git a/website/releases/release-0.6.0.md b/website/releases/release-0.6.0.md
index afa978a..9db16a4 100644
--- a/website/releases/release-0.6.0.md
+++ b/website/releases/release-0.6.0.md
@@ -1,6 +1,6 @@
---
title: "Release 0.6.0"
-sidebar_position: 4
+sidebar_position: 5
layout: releases
toc: true
last_modified_at: 2020-05-28T08:40:00-07:00
diff --git a/website/releases/release-0.7.0.md b/website/releases/release-0.7.0.md
index 4080380..6bf1324 100644
--- a/website/releases/release-0.7.0.md
+++ b/website/releases/release-0.7.0.md
@@ -1,6 +1,6 @@
---
title: "Release 0.7.0"
-sidebar_position: 3
+sidebar_position: 4
layout: releases
toc: true
last_modified_at: 2020-05-28T08:40:00-07:00
diff --git a/website/releases/release-0.8.0.md b/website/releases/release-0.8.0.md
index f458e28..8a454c9 100644
--- a/website/releases/release-0.8.0.md
+++ b/website/releases/release-0.8.0.md
@@ -1,6 +1,6 @@
---
title: "Release 0.8.0"
-sidebar_position: 2
+sidebar_position: 3
layout: releases
toc: true
last_modified_at: 2020-05-28T08:40:00-07:00
diff --git a/website/releases/release-0.9.0.md b/website/releases/release-0.9.0.md
new file mode 100644
index 0000000..7d7fb29
--- /dev/null
+++ b/website/releases/release-0.9.0.md
@@ -0,0 +1,132 @@
+---
+title: "Release 0.9.0"
+sidebar_position: 2
+layout: releases
+toc: true
+last_modified_at: 2021-08-26T08:40:00-07:00
+---
+# [Release 0.9.0](https://github.com/apache/hudi/releases/tag/release-0.9.0) ([docs](/docs/quick-start-guide))
+
+## Download Information
+* Source Release : [Apache Hudi 0.9.0 Source Release](https://downloads.apache.org/hudi/0.9.0/hudi-0.9.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.9.0/hudi-0.9.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.9.0/hudi-0.9.0.src.tgz.sha512))
+* Apache Hudi jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+## Migration Guide for this release
+- If migrating from an older release, please also check the upgrade instructions for each subsequent release below.
+- With 0.9.0, Hudi is adding more table properties to aid in using an existing Hudi table with spark-sql.
+ To smooth this transition, these properties are added to the `hoodie.properties` file. Whenever Hudi is launched with
+ the newer table version, i.e. 2 (or when moving from pre 0.9.0 to 0.9.0), an upgrade step will be executed automatically.
+ This automatic upgrade step will happen just once per Hudi table, as the `hoodie.table.version` will be updated in
+ the property file after the upgrade is completed.
+- Similarly, a command line tool for downgrading (command - `downgrade`) has been added in case users want to
+ downgrade Hudi from table version `2` to `1`, or move from Hudi 0.9.0 to pre 0.9.0. This needs to be executed from a
+ 0.9.0 `hudi-cli` binary/script.
+- With this release, we added a new framework to track config properties in code, moving away from string variables that
+ hold property names and values. This move helps us automate configuration doc generation and much more. While we still
+ support the older config string variables, users are encouraged to use the new `ConfigProperty` equivalents, as noted
+ in the deprecation notices. In most cases, it is as simple as calling `.key()` and `.defaultValue()` on the corresponding
+ alternative, e.g. `RECORDKEY_FIELD_OPT_KEY` can be replaced by `RECORDKEY_FIELD_NAME.key()`.
+
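The move from string constants to `ConfigProperty` objects described above can be pictured with a small Python sketch. This is an illustrative stand-in, not Hudi's class: the `ConfigProperty` dataclass, its `default_value()` method name, the property name `hoodie.datasource.write.recordkey.field`, the default `uuid`, and the `order_id` value are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class ConfigProperty(Generic[T]):
    """Illustrative typed config descriptor (an assumption, not Hudi's class)."""

    _key: str
    _default: Optional[T] = None
    doc: str = ""

    def key(self) -> str:
        # The property name that ends up in the options map.
        return self._key

    def default_value(self) -> Optional[T]:
        # The value used when the caller sets nothing.
        return self._default


# Old style: a bare string constant that holds only the property name.
RECORDKEY_FIELD_OPT_KEY = "hoodie.datasource.write.recordkey.field"

# New style: one object carries the name, the default, and its documentation.
RECORDKEY_FIELD_NAME = ConfigProperty(
    "hoodie.datasource.write.recordkey.field",  # assumed property name
    "uuid",                                     # assumed default value
    "Field used as the record key.",
)

# Call sites swap the constant for `.key()`; the resulting map is unchanged.
options = {RECORDKEY_FIELD_NAME.key(): "order_id"}
```

Because the descriptor keeps the name, default, and description together, documentation can be generated by iterating over the descriptors instead of scraping string constants.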
+## Release Highlights
+
+### Spark SQL DML and DDL Support
+
+0.9.0 adds **experimental** support for DDL/DMLs using Spark SQL, taking a huge step towards making Hudi more easily accessible and
+operable by all personas (non-engineers, analysts etc). Users can now use `CREATE TABLE ... USING HUDI` and `CREATE TABLE ... AS SELECT`
+statements to directly create and manage tables in catalogs like Hive. Users can then use `INSERT`, `UPDATE`, `MERGE INTO` and `DELETE`
+SQL statements to manipulate data. In addition, the `INSERT OVERWRITE` statement can be used to overwrite existing data in the table or partition
+for existing batch ETL pipelines. For more information, check out our docs [here](/docs/quick-start-guide) by clicking on the `SparkSQL` tab.
+Please see [RFC-25](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+25%3A+Spark+SQL+Extension+For+Hudi)
+for more implementation details.
+
+### Query side improvements
+
+Hudi tables are now registered with Hive as Spark datasource tables, meaning Spark SQL on these tables now uses the datasource as well,
+instead of relying on the Hive fallbacks within Spark, which are ill-maintained/cumbersome. This unlocks many optimizations such as the
+use of Hudi's own [FileIndex](https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieFileIndex.scala#L46)
+implementation for optimized caching and the use of the Hudi metadata table, for faster listing of large tables. We have also added support for
+[time travel queries](/docs/quick-start-guide#time-travel-query) for the Spark datasource.
+
+### Writer side improvements
+
+Virtual keys support has been added where users can avoid adding meta fields
to hudi table and leverage existing fields to populate record keys and
partition paths.
+One needs to disable [this](/docs/configurations#hoodiepopulatemetafields)
config to enable virtual keys.
+
+Bulk Insert operations using [row writer enabled](/docs/configurations#hoodiedatasourcewriterowwriterenable) now support pre-combining,
+sort modes and user defined partitioners, and are now turned on by default for fast inserts.
+
+Hudi performs automatic cleanup of uncommitted data, which has now been enhanced to be performant over cloud storage, even for
+extremely large tables. Specifically, a new marker mechanism has been implemented leveraging the timeline server to perform
+centrally co-ordinated batched read/write of file markers to underlying storage. You can turn this on using this [config](/docs/configurations#hoodiewritemarkerstype) and learn more
+about it on this [blog](/blog/2021/08/18/improving-marker-mechanism).
+
+Async Clustering support has been added to both DeltaStreamer and Spark
Structured Streaming Sink. More on this can be found in this
+[blog post](/blog/2021/08/23/async-clustering). In addition, we have added a
new utility class
[HoodieClusteringJob](https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieClusteringJob.java)
+to assist in building and executing a clustering plan together as a standalone
spark job.
+
+Users can choose to drop fields used to generate partition paths, using
`hoodie.datasource.write.drop.partition.columns=true`, to support
+querying of Hudi snapshots using systems like BigQuery, which cannot handle
this.
+
+Hudi uses different [types of spillable maps](/docs/configurations#hoodiecommonspillablediskmaptype) for internally handling merges (compaction, updates or even MOR snapshot queries). In 0.9.0, we have added
+support for [compression](/docs/configurations#hoodiecommondiskmapcompressionenabled) for the bitcask style default option and introduced a new spillable map backed by RocksDB, which can be more performant for large
+bulk updates or working with large base file sizes.
+
+Added a new write operation, `delete_partition`, with support in Spark. Users can leverage this to delete older partitions in bulk, in addition to
+record level deletes. Deletion of specific partitions can be done using this [config](/docs/configurations#hoodiedatasourcewritepartitionstodelete).
+
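As a rough sketch of how the `delete_partition` operation might be wired up from PySpark, the helper below builds the write options. The helper name `delete_partition_options` is hypothetical, and the key `hoodie.datasource.write.partitions.to.delete` is inferred from the anchor of the config link above; treat both as assumptions and confirm against the configuration reference.

```python
from typing import Dict, Iterable


def delete_partition_options(table_name: str, partitions: Iterable[str]) -> Dict[str, str]:
    """Build datasource write options for a bulk partition delete (sketch)."""
    # Drop blanks and surrounding whitespace so the joined list stays clean.
    parts = [p.strip() for p in partitions if p.strip()]
    if not parts:
        raise ValueError("at least one partition path is required")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "delete_partition",
        # Key inferred from the docs anchor; partitions are comma-separated.
        "hoodie.datasource.write.partitions.to.delete": ",".join(parts),
    }


opts = delete_partition_options("trips", ["2021/01/01", "2021/01/02"])
```

With a Spark session available, these options would typically be passed along the lines of `df.write.format("hudi").options(**opts).mode("append").save(base_path)`, where `df` and `base_path` are placeholders for an existing DataFrame and the table location.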
+Added support for Huawei Cloud Object Storage, Baidu AFS storage format, and Baidu BOS storage in Hudi.
+
+A [pre commit validator framework](https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SparkPreCommitValidator.java)
+has been added for the Spark engine, which can be used for DeltaStreamer and Spark Datasource writers. Users can leverage this to add any validations to be executed before committing writes to Hudi.
+Three validators come out-of-box:
+ - [org.apache.hudi.client.validator.SqlQueryEqualityPreCommitValidator](https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQueryEqualityPreCommitValidator.java) can be used to validate for equality of rows before and after the commit.
+ - [org.apache.hudi.client.validator.SqlQueryInequalityPreCommitValidator](https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQueryInequalityPreCommitValidator.java) can be used to validate for inequality of rows before and after the commit.
+ - [org.apache.hudi.client.validator.SqlQuerySingleResultPreCommitValidator](https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SqlQuerySingleResultPreCommitValidator.java) can be used to validate that a query on the table results in a specific value.
+
+These can be configured by setting `hoodie.precommit.validators=<comma separated list of validator class names>`. Users can also provide their own implementations by extending the abstract class [SparkPreCommitValidator](https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/client/validator/SparkPreCommitValidator.java)
+and overriding this method:
+
+```java
+void validateRecordsBeforeAndAfter(Dataset<Row> before,
+ Dataset<Row> after,
+ Set<String> partitionsAffected)
+```
+
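To make the `hoodie.precommit.validators` setting concrete, here is a minimal Python sketch that assembles the comma-separated value from validator class names. The helper `precommit_validator_options` and the `com.example.MyValidator` class are hypothetical; only the config key and the `org.apache.hudi.client.validator` package come from the notes above. The SQL-based validators presumably need their queries supplied through additional settings that these notes do not list.

```python
from typing import Dict, Sequence

# Package of the validators that ship with the release, per the list above.
VALIDATOR_PACKAGE = "org.apache.hudi.client.validator"


def precommit_validator_options(class_names: Sequence[str]) -> Dict[str, str]:
    """Return the write option that enables the given pre-commit validators."""
    qualified = [
        # Bare names are assumed to be built-in validators; dotted names are
        # taken as fully qualified user-provided classes and kept as-is.
        name if "." in name else f"{VALIDATOR_PACKAGE}.{name}"
        for name in class_names
    ]
    return {"hoodie.precommit.validators": ",".join(qualified)}


validator_opts = precommit_validator_options(
    ["SqlQueryEqualityPreCommitValidator", "com.example.MyValidator"]
)
```

The resulting map can be merged into the other write options passed to the Spark datasource writer or DeltaStreamer.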
+
+### Flink Integration Improvements
+
+The Flink writer now supports propagation of CDC format for MOR tables, by turning on the option `changelog.enabled=true`. Hudi then persists all change flags of each record;
+using the streaming reader of Flink, users can do stateful computation based on these change logs. Note that when the table is compacted with the async compaction service, all the
+intermediate changes are merged into one (the last record), leaving only UPSERT semantics.
+
+Flink writing now also has most feature parity with spark writing, with
addition of write operations like `bulk_insert`, `insert_overwrite`, support
for non-partitioned tables,
+automatic cleanup of uncommitted data, global indexing support, hive style
partitioning and handling of partition path updates. Writing also supports a
new log append mode, where
+no records are de-duplicated and base files are directly written for each
flush. To use this mode, set `write.insert.deduplicate=false`.
+
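The two Flink options mentioned above are usually supplied in the `WITH (...)` clause of a table definition. The Python sketch below only renders such a clause as text; `render_with_clause` is a hypothetical helper, and the `connector`, `path`, and `table.type` entries (including their values) are assumptions added to make the example complete. Only `changelog.enabled` and `write.insert.deduplicate` come from these notes.

```python
from typing import Dict


def render_with_clause(options: Dict[str, str]) -> str:
    """Render connector options as the WITH (...) clause of a Flink SQL DDL."""

    def quote(text: str) -> str:
        # SQL string literals escape a single quote by doubling it.
        return "'" + text.replace("'", "''") + "'"

    body = ",\n".join(f"  {quote(k)} = {quote(v)}" for k, v in options.items())
    return f"WITH (\n{body}\n)"


base_options = {
    "connector": "hudi",                 # assumed connector identifier
    "path": "file:///tmp/hudi/orders",   # hypothetical table location
    "table.type": "MERGE_ON_READ",       # assumed option selecting a MOR table
}

# Keep every change flag so downstream streaming reads see the full changelog.
changelog_clause = render_with_clause({**base_options, "changelog.enabled": "true"})

# Append mode: skip de-duplication and write base files directly on each flush.
append_clause = render_with_clause({**base_options, "write.insert.deduplicate": "false"})
```

Either string can be appended to a `CREATE TABLE` statement's column list before submitting it to a Flink SQL client.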
+Flink readers now support streaming reads from COW/MOR tables. Deletions are emitted by default in streaming read mode; the downstream receives the DELETE message as a Hoodie record with an empty payload.
+
+Hive sync has been greatly improved by supporting different Hive versions (1.x, 2.x, 3.x). Hive sync can also now be done asynchronously.
+
+Flink Streamer tool now supports transformers.
+
+### DeltaStreamer
+
+We have enhanced the DeltaStreamer utility with 3 new sources.
+
+[JDBC source](https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/JdbcSource.java) can take an extraction SQL statement and
+incrementally fetch data out of sources supporting JDBC. This can be useful, e.g. when reading data from RDBMS sources. Note that this approach may need periodic re-bootstrapping to ensure data consistency, although it is much simpler to operate than CDC based approaches.
+
+[SQLSource](https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/SqlSource.java) takes a Spark SQL statement to fetch data out of existing tables and
+can be very useful for easy SQL based backfill use cases, e.g. backfilling just one column for the past N months.
+
+[S3EventsHoodieIncrSource](https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsHoodieIncrSource.java) and [S3EventsSource](https://github.com/apache/hudi/blob/bf5a52e51bbeaa089995335a0a4c55884792e505/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsSource.java)
+assist in reading data from S3 reliably and efficiently ingesting it into Hudi. The existing approach using `*DFSSource` source classes uses the last modification time of files as a checkpoint to pull in new files.
+However, if a large number of files have the same modification time, some files might be missed when reading from the source. These two sources (S3EventsHoodieIncrSource and S3EventsSource) together ensure data
+is reliably ingested from S3 into Hudi by leveraging AWS SNS and SQS services that subscribe to file events from the source bucket. [This blog post](/blog/2021/08/23/s3-events-source) presents a model for
+scalable, reliable incremental ingestion by using these two sources in tandem.
+
+In addition to pulling events from Kafka using the regular offset format, we also added support for timestamp based fetches, which can
+help with initial backfill/bootstrap scenarios. We have also added support for passing basic auth credentials in the schema registry provider URL with the schema provider.
+
+## Raw Release Notes
+The raw release notes are available [here](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12350027)
\ No newline at end of file
diff --git a/website/src/pages/index.js b/website/src/pages/index.js
index 404a06a..8bc3e90 100644
--- a/website/src/pages/index.js
+++ b/website/src/pages/index.js
@@ -20,7 +20,7 @@ function HomepageHeader() {
<div className={styles.buttons}>
<Link
className="button button--secondary button--lg"
- to="/releases/release-0.8.0">
+ to="/releases/release-0.9.0">
Latest Releases
</Link>
<Link