This is an automated email from the ASF dual-hosted git repository.
sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 6e5b07a [HUDI-1834] Moving releases 0.5.3 and 0.6.0 to archive (#4145)
6e5b07a is described below
commit 6e5b07a7f4fed4b305880e7eccc6b3e70284f078
Author: Sivabalan Narayanan <[email protected]>
AuthorDate: Sun Nov 28 23:02:14 2021 -0500
[HUDI-1834] Moving releases 0.5.3 and 0.6.0 to archive (#4145)
---
website/releases/download.md | 21 +-------
website/releases/older-releases.md | 98 +++++++++++++++++++++++++++++++++++++-
website/releases/release-0.5.3.md | 52 --------------------
website/releases/release-0.6.0.md | 58 ----------------------
4 files changed, 99 insertions(+), 130 deletions(-)
diff --git a/website/releases/download.md b/website/releases/download.md
index b90bdd5..4d46d07 100644
--- a/website/releases/download.md
+++ b/website/releases/download.md
@@ -18,25 +18,8 @@ last_modified_at: 2019-12-30T15:59:57-04:00
* Source Release : [Apache Hudi 0.7.0 Source
Release](https://www.apache.org/dyn/closer.lua/hudi/0.7.0/hudi-0.7.0.src.tgz)
([asc](https://downloads.apache.org/hudi/0.7.0/hudi-0.7.0.src.tgz.asc),
[sha512](https://downloads.apache.org/hudi/0.7.0/hudi-0.7.0.src.tgz.sha512))
* Release Note : ([Release Note for Apache Hudi
0.7.0](/releases/release-0.7.0))
-### Release 0.6.0
-* Source Release : [Apache Hudi 0.6.0 Source
Release](https://www.apache.org/dyn/closer.lua/hudi/0.6.0/hudi-0.6.0.src.tgz)
([asc](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.asc),
[sha512](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.sha512))
-* Release Note : ([Release Note for Apache Hudi
0.6.0](/releases/release-0.6.0))
-
-### Release 0.5.3
-* Source Release : [Apache Hudi 0.5.3 Source
Release](https://www.apache.org/dyn/closer.lua/hudi/0.5.3/hudi-0.5.3.src.tgz)
([asc](https://downloads.apache.org/hudi/0.5.3/hudi-0.5.3.src.tgz.asc),
[sha512](https://downloads.apache.org/hudi/0.5.3/hudi-0.5.3.src.tgz.sha512))
-* Release Note : ([Release Note for Apache Hudi
0.5.3](/releases/release-0.5.3))
-
-### Release 0.5.2-incubating
-* Source Release : [Apache Hudi 0.5.2-incubating Source
Release](https://www.apache.org/dyn/closer.lua/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz)
([asc](https://downloads.apache.org/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.asc),
[sha512](https://downloads.apache.org/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.sha512))
-* Release Note : ([Release Note for Apache Hudi
0.5.2](/releases/older-releases#release-052-incubating-docs))
-
-### Release 0.5.1-incubating
-* Source Release : [Apache Hudi 0.5.1-incubating Source
Release](https://www.apache.org/dyn/closer.lua/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz)
([asc](https://downloads.apache.org/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc),
[sha512](https://downloads.apache.org/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
-* Release Note : ([Release Note for Apache Hudi
0.5.1](/releases/older-releases#release-051-incubating-docs))
-
-### Release 0.5.0-incubating
-* Source Release : [Apache Hudi 0.5.0-incubating Source
Release](https://www.apache.org/dyn/closer.lua/hudi/0.5.0-incubating/hudi-0.5.0-incubating.src.tgz)
([asc](https://downloads.apache.org/hudi/0.5.0-incubating/hudi-0.5.0-incubating.src.tgz.asc),
[sha512](https://downloads.apache.org/hudi/0.5.0-incubating/hudi-0.5.0-incubating.src.tgz.sha512))
-* Release Note : ([Release Note for Apache Hudi
0.5.0](/releases/older-releases#release-050-incubating-docs))
+### Older releases
+As new Hudi releases come out for each development stream, previous ones will
be archived, but they remain available
[here](https://archive.apache.org/dist/hudi/).
## Verify Release
diff --git a/website/releases/older-releases.md
b/website/releases/older-releases.md
index 2468e33..f194c96 100644
--- a/website/releases/older-releases.md
+++ b/website/releases/older-releases.md
@@ -7,7 +7,103 @@ last_modified_at: 2020-05-28T08:40:00-07:00
---
This page contains older release information, for bookkeeping purposes. It's
recommended that you upgrade to one of the
-more recent releases listed [here](http://hudi.apache.org/releases)
+more recent releases listed [here](/releases/download)
+
+## [Release 0.6.0](https://github.com/apache/hudi/releases/tag/release-0.6.0)
([docs](/docs/quick-start-guide))
+
+## Migration Guide for this release
+- If migrating from a release older than 0.5.3, please also check the upgrade
instructions for each subsequent release below.
+- With 0.6.0, Hudi is moving from list-based rollbacks to marker-based
rollbacks. To ease this transition, a
+ new property called `hoodie.table.version` has been added to the `hoodie.properties`
file. Whenever Hudi is launched with
+ a newer table version, i.e. 1 (or when moving from pre-0.6.0 to 0.6.0), an upgrade
step will be executed automatically.
+ This automatic upgrade step happens just once per Hudi table, as
`hoodie.table.version` is updated in the properties file after the upgrade
completes.
+- Similarly, a command-line tool for downgrading (command: `downgrade`) has been
added for users who want to downgrade a Hudi table from version 1 to 0, or
move from Hudi 0.6.0 to a pre-0.6.0 release.
+- If you were using a user-defined partitioner with the bulkInsert() RDD API, the
base interface has changed to `BulkInsertPartitioner`, and your existing
implementations will need minor adjustments.
+
+## Release Highlights
+
+### Writer side improvements:
+- Bootstrapping existing parquet datasets : Adds support for bootstrapping
existing datasets into Hudi, via both Spark datasource writer and
+ deltastreamer tool, with support for reading from Hive, SparkSQL, AWS Athena
(prestoDB support coming soon). See
[RFC-15](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+and+Query+Planning+Improvements)
for technical details.
+ Note that this is an experimental feature, which will be improved upon
further in the 0.6.x versions.
+- Native row writing for bulk_insert : Avoids any dataframe-rdd conversion for
bulk_insert path, which can improve performance of initial bulk loads.
+ Although this is typically not the bottleneck for upserts/deletes,
subsequent releases in the 0.6.x line will expand this to other write operations
+ to make reasoning about schema management easier, avoiding the spark-avro
conversion entirely.
+- Bulk insert sort modes : Hudi bulk_insert sorts the input globally to
optimize file sizes and avoid out-of-memory issues encountered when writing
in parallel to multiple DFS partitions.
+ For users who want to prepare the dataframe for writing outside of Hudi, we
have made this configurable using `hoodie.bulkinsert.sort.mode`.
+- Cleaning can now be run concurrently with writing, using
`hoodie.clean.async=true`, which can reduce the time taken to finish committing.
+- Async compaction for Spark streaming writes to a Hudi table is now self-managed
by default, controlled via `hoodie.datasource.compaction.async.enable`.
+- Rollbacks no longer perform full table listings, by leveraging marker files.
To enable, set `hoodie.rollback.using.markers=true`.
+- Added a new index `hoodie.index.type=SIMPLE` which can be faster than
`BLOOM_INDEX` for cases where updates/deletes are spread across a large portion of
the table.
+- Hudi now supports `Azure Data Lake Storage V2`, `Alluxio` and `Tencent
Cloud Object Storage` storages.
+-
[HoodieMultiDeltaStreamer](https://hudi.apache.org/docs/writing_data#multitabledeltastreamer)
adds support for ingesting multiple kafka streams in a single DeltaStreamer
deployment, effectively reducing operational burden for using delta streamer
+ as your data lake ingestion tool (Experimental feature)
+- Added a new tool - InitialCheckPointProvider, to set checkpoints when
migrating to DeltaStreamer after an initial load of the table is complete.
+- Delta Streamer tool now supports ingesting CSV data sources, chaining of
multiple transformers to build more advanced ETL jobs.
+- Introducing a new `CustomKeyGenerator` key generator class that provides
flexible configurations to enable different types of key and partition
path generation in a single class.
+ We also added support for more time units and date/time formats in
`TimestampBasedKeyGenerator`. See
[docs](https://hudi.apache.org/docs/writing_data#key-generation) for more.
+
+### Query side improvements:
+- Starting 0.6.0, snapshot queries are feasible on MOR tables using spark
datasource. (experimental feature)
+- In prior versions we only supported `HoodieCombineHiveInputFormat` for
CopyOnWrite tables to ensure that there is a limit on the number of mappers
spawned for
+ any query. Hudi now supports Merge on Read tables also using
`HoodieCombineInputFormat`.
+- Speed up Spark read queries by caching metaclient in HoodieROPathFilter. This
helps reduce listing-related overheads in S3 when filtering files for
read-optimized queries.
+
+### Usability:
+- Spark DAGs are named to aid better debuggability.
+- Support pluggable metrics reporting by introducing proper abstraction for
user defined metrics. Console, JMX, Prometheus and DataDog metric reporters
have been added.
+- A new utility called Data snapshot exporter has been added. Latest table
snapshot as of a certain point in time can be exported as plain parquet files
with this tool.
+- Introduce write committed callback hooks for incremental pipelines to be
notified and act on new commits in the timeline. For example, Apache Airflow jobs
can be triggered
+ as new commits arrive.
+- Added support for deleting savepoints via CLI
+- Added a new command - `export instants`, to export metadata of instants
+
+## Raw Release Notes
+The raw release notes are available
[here](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346663)
+
+## [Release 0.5.3](https://github.com/apache/hudi/releases/tag/release-0.5.3)
([docs](/docs/quick-start-guide))
+
+## Migration Guide for this release
+* This is a bug-fix-only release, and no special migration steps are needed when
upgrading from 0.5.2. If you are upgrading from an earlier release “X”, please
make sure you read the migration guide for each subsequent release between “X”
and 0.5.3.
+* 0.5.3 is the first Hudi release after graduation. As a result, all Hudi jars
will no longer have "-incubating" in the version name. Wherever the
Hudi version is referenced, please make sure "-incubating" is no longer present.
+
+For example, the hudi-spark-bundle pom dependency would look like:
+```xml
+ <dependency>
+ <groupId>org.apache.hudi</groupId>
+ <artifactId>hudi-spark-bundle_2.12</artifactId>
+ <version>0.5.3</version>
+ </dependency>
+```
+## Release Highlights
+* Hudi now supports `aliyun OSS` storage service.
+* Embedded Timeline Server is enabled by default for both delta-streamer and
spark datasource writes. This feature was in experimental mode before this
release. Embedded Timeline Server caches file listing calls in Spark driver and
serves them to Spark writer tasks. This reduces the number of file listings
needed to be performed for each write.
+* Incremental Cleaning is enabled by default for both delta-streamer and spark
datasource writes. This feature was also in experimental mode before this
release. In the steady state, incremental cleaning avoids the costly step of
scanning all partitions and instead uses Hudi metadata to find files to be
cleaned up.
+* Delta-streamer config files can now be placed in a different filesystem than
the actual data.
+* Hudi Hive Sync now supports tables partitioned by date type column.
+* Hudi Hive Sync now supports syncing directly via Hive MetaStore. You simply
need to set `hoodie.datasource.hive_sync.use_jdbc=false`.
+ The Hive Metastore URI will be read implicitly from the environment. For
example, when writing through Spark Data Source:
+
+```Scala
+ spark.write.format("hudi")
+ .option(…)
+ .option("hoodie.datasource.hive_sync.username", "<user>")
+ .option("hoodie.datasource.hive_sync.password", "<password>")
+ .option("hoodie.datasource.hive_sync.partition_fields", "<partition_fields>")
+ .option("hoodie.datasource.hive_sync.database", "<db_name>")
+ .option("hoodie.datasource.hive_sync.table", "<table_name>")
+ .option("hoodie.datasource.hive_sync.use_jdbc", "false")
+ .mode(APPEND)
+ .save("/path/to/dataset")
+```
+
+* Other Writer Performance related fixes:
+ - DataSource Writer now avoids unnecessary loading of data after write.
+ - Hudi Writer now leverages spark parallelism when searching for existing
files for writing new records.
+
+## Raw Release Notes
+The raw release notes are available
[here](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12348256)
+
## [Release
0.5.2-incubating](https://github.com/apache/hudi/releases/tag/release-0.5.2-incubating)
([docs](/docs/quick-start-guide))
diff --git a/website/releases/release-0.5.3.md
b/website/releases/release-0.5.3.md
deleted file mode 100644
index 936675b..0000000
--- a/website/releases/release-0.5.3.md
+++ /dev/null
@@ -1,52 +0,0 @@
----
-title: "Release 0.5.3"
-sidebar_position: 6
-layout: releases
-toc: true
-last_modified_at: 2020-05-28T08:40:00-07:00
----
-# [Release 0.5.3](https://github.com/apache/hudi/releases/tag/release-0.5.3)
([docs](/docs/quick-start-guide))
-
-## Migration Guide for this release
- * This is a bug fix only release and no special migration steps needed when
upgrading from 0.5.2. If you are upgrading from earlier releases “X”, please
make sure you read the migration guide for each subsequent release between “X”
and 0.5.3
- * 0.5.3 is the first hudi release after graduation. As a result, all hudi
jars will no longer have "-incubating" in the version name. In all the places
where hudi version is referred, please make sure "-incubating" is no longer
present.
-
-For example hudi-spark-bundle pom dependency would look like:
-```xml
- <dependency>
- <groupId>org.apache.hudi</groupId>
- <artifactId>hudi-spark-bundle_2.12</artifactId>
- <version>0.5.3</version>
- </dependency>
-```
-## Release Highlights
- * Hudi now supports `aliyun OSS` storage service.
- * Embedded Timeline Server is enabled by default for both delta-streamer and
spark datasource writes. This feature was in experimental mode before this
release. Embedded Timeline Server caches file listing calls in Spark driver and
serves them to Spark writer tasks. This reduces the number of file listings
needed to be performed for each write.
- * Incremental Cleaning is enabled by default for both delta-streamer and
spark datasource writes. This feature was also in experimental mode before this
release. In the steady state, incremental cleaning avoids the costly step of
scanning all partitions and instead uses Hudi metadata to find files to be
cleaned up.
- * Delta-streamer config files can now be placed in different filesystem than
actual data.
- * Hudi Hive Sync now supports tables partitioned by date type column.
- * Hudi Hive Sync now supports syncing directly via Hive MetaStore. You simply
need to set hoodie.datasource.hive_sync.use_jdbc
-=false. Hive Metastore Uri will be read implicitly from environment. For
example, when writing through Spark Data Source,
-
-```Scala
- spark.write.format(“hudi”)
- .option(…)
- .option(“hoodie.datasource.hive_sync.username”, “<user>”)
- .option(“hoodie.datasource.hive_sync.password”, “<password>”)
- .option(“hoodie.datasource.hive_sync.partition_fields”, “<partition_fields>”)
- .option(“hoodie.datasource.hive_sync.database”, “<db_name>”)
- .option(“hoodie.datasource.hive_sync.table”, “<table_name>”)
- .option(“hoodie.datasource.hive_sync.use_jdbc”, “false”)
- .mode(APPEND)
- .save(“/path/to/dataset”)
-```
-
- * Other Writer Performance related fixes:
- - DataSource Writer now avoids unnecessary loading of data after write.
- - Hudi Writer now leverages spark parallelism when searching for existing
files for writing new records.
-
-## Raw Release Notes
- The raw release notes are available
[here](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12348256)
-
-
-For releases older than these versions, please see
[here](/releases/older-releases).
\ No newline at end of file
diff --git a/website/releases/release-0.6.0.md
b/website/releases/release-0.6.0.md
deleted file mode 100644
index f1b9581..0000000
--- a/website/releases/release-0.6.0.md
+++ /dev/null
@@ -1,58 +0,0 @@
----
-title: "Release 0.6.0"
-sidebar_position: 5
-layout: releases
-toc: true
-last_modified_at: 2020-05-28T08:40:00-07:00
----
-# [Release 0.6.0](https://github.com/apache/hudi/releases/tag/release-0.6.0)
([docs](/docs/quick-start-guide))
-
-## Migration Guide for this release
- - If migrating from release older than 0.5.3, please also check the upgrade
instructions for each subsequent release below.
- - With 0.6.0 Hudi is moving from list based rollback to marker based
rollbacks. To smoothly aid this transition a
- new property called `hoodie.table.version` is added to `hoodie.properties`
file. Whenever Hudi is launched with
- newer table version i.e 1 (or moving from pre 0.6.0 to 0.6.0), an upgrade
step will be executed automatically.
- This automatic upgrade step will happen just once per Hudi table as the
`hoodie.table.version` will be updated in property file after upgrade is
completed.
- - Similarly, a command line tool for Downgrading (command - `downgrade`) is
added if in case some users want to downgrade Hudi from table version 1 to 0 or
move from Hudi 0.6.0 to pre 0.6.0
- - If you were using a user defined partitioner with bulkInsert() RDD API, the
base interface has changed to `BulkInsertPartitioner` and will need minor
adjustments to your existing implementations.
-
-## Release Highlights
-
-### Writer side improvements:
- - Bootstrapping existing parquet datasets : Adds support for bootstrapping
existing datasets into Hudi, via both Spark datasource writer and
- deltastreamer tool, with support for reading from Hive, SparkSQL, AWS
Athena (prestoDB support coming soon). See
[RFC-15](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+and+Query+Planning+Improvements)
for technical details.
- Note that this is an experimental feature, which will be improved upon
further in the 0.6.x versions.
- - Native row writing for bulk_insert : Avoids any dataframe-rdd conversion
for bulk_insert path, which can improve performance of initial bulk loads.
- Although, this is typically not the bottleneck for upsert/deletes,
subsequent releases in 0.6.x versions will expand this to other write operations
- to make reasoning about schema management easier, avoiding the
spark-avro conversion totally.
- - Bulk insert sort modes : Hudi bulk_insert sorts the input globally to
optimize file sizes and avoid out-of-memory issues encountered when writing
parallely to multiple DFS partitions.
- For users who want to prepare the dataframe for writing outside of Hudi,
we have made this configurable using `hoodie.bulkinsert.sort.mode`.
- - Cleaning can now be run concurrently with writing, using
`hoodie.clean.async=true`which can speed up time taken to finish committing.
- - Async compaction for spark streaming writes to hudi table, is now self
managed by default, controlling `hoodie.datasource.compaction.async.enable`.
- - Rollbacks no longer perform full table listings, by leveraging marker
files. To enable, set `hoodie.rollback.using.markers=true`.
- - Added a new index `hoodie.index.type=SIMPLE` which can be faster than
`BLOOM_INDEX` for cases where updates/deletes spread across a large portion of
the table.
- - Hudi now supports `Azure Data Lake Storage V2` , `Alluxio` and `Tencent
Cloud Object Storage` storages.
- -
[HoodieMultiDeltaStreamer](https://hudi.apache.org/docs/writing_data#multitabledeltastreamer)
adds support for ingesting multiple kafka streams in a single DeltaStreamer
deployment, effectively reducing operational burden for using delta streamer
- as your data lake ingestion tool (Experimental feature)
- - Added a new tool - InitialCheckPointProvider, to set checkpoints when
migrating to DeltaStreamer after an initial load of the table is complete.
- - Delta Streamer tool now supports ingesting CSV data sources, chaining of
multiple transformers to build more advanced ETL jobs.
- - Introducing a new `CustomKeyGenerator` key generator class, that provides
flexible configurations to provide enable different types of key, partition
path generation in single class.
- We also added support for more time units and date/time formats in
`TimestampBasedKeyGenerator`. See
[docs](https://hudi.apache.org/docs/writing_data#key-generation) for more.
-
-### Query side improvements:
- - Starting 0.6.0, snapshot queries are feasible on MOR tables using spark
datasource. (experimental feature)
- - In prior versions we only supported `HoodieCombineHiveInputFormat` for
CopyOnWrite tables to ensure that there is a limit on the number of mappers
spawned for
- any query. Hudi now supports Merge on Read tables also using
`HoodieCombineInputFormat`.
- - Speedup spark read queries by caching metaclient in HoodieROPathFilter.
This helps reduce listing related overheads in S3 when filtering files for
read-optimized queries.
-
-### Usability:
- - Spark DAGs are named to aid better debuggability.
- - Support pluggable metrics reporting by introducing proper abstraction for
user defined metrics. Console, JMX, Prometheus and DataDog metric reporters
have been added.
- - A new utility called Data snapshot exporter has been added. Latest table
snapshot as of a certain point in time can be exported as plain parquet files
with this tool.
- - Introduce write committed callback hooks for incremental pipelines to be
notified and act on new commits in the timeline. For e.g, Apache Airflow jobs
can be triggered
- as new commits arrive.
- - Added support for deleting savepoints via CLI
- - Added a new command - `export instants`, to export metadata of instants
-
-## Raw Release Notes
- The raw release notes are available
[here](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346663)