[hudi] branch asf-site updated: [HUDI-1834] Moving releases 0.5.3 and 0.6.0 to archive (#4145)

sivabalan Sun, 28 Nov 2021 20:02:40 -0800

This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new 6e5b07a  [HUDI-1834] Moving releases 0.5.3 and 0.6.0 to archive (#4145)
6e5b07a is described below

commit 6e5b07a7f4fed4b305880e7eccc6b3e70284f078
Author: Sivabalan Narayanan <[email protected]>
AuthorDate: Sun Nov 28 23:02:14 2021 -0500

    [HUDI-1834] Moving releases 0.5.3 and 0.6.0 to archive (#4145)
---
 website/releases/download.md       | 21 +-------
 website/releases/older-releases.md | 98 +++++++++++++++++++++++++++++++++++++-
 website/releases/release-0.5.3.md  | 52 --------------------
 website/releases/release-0.6.0.md  | 58 ----------------------
 4 files changed, 99 insertions(+), 130 deletions(-)

diff --git a/website/releases/download.md b/website/releases/download.md
index b90bdd5..4d46d07 100644
--- a/website/releases/download.md
+++ b/website/releases/download.md
@@ -18,25 +18,8 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 * Source Release : [Apache Hudi 0.7.0 Source Release](https://www.apache.org/dyn/closer.lua/hudi/0.7.0/hudi-0.7.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.7.0/hudi-0.7.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.7.0/hudi-0.7.0.src.tgz.sha512))
 * Release Note : ([Release Note for Apache Hudi 0.7.0](/releases/release-0.7.0))
 
-### Release 0.6.0
-* Source Release : [Apache Hudi 0.6.0 Source Release](https://www.apache.org/dyn/closer.lua/hudi/0.6.0/hudi-0.6.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.sha512))
-* Release Note : ([Release Note for Apache Hudi 0.6.0](/releases/release-0.6.0))
-
-### Release 0.5.3
-* Source Release : [Apache Hudi 0.5.3 Source Release](https://www.apache.org/dyn/closer.lua/hudi/0.5.3/hudi-0.5.3.src.tgz) ([asc](https://downloads.apache.org/hudi/0.5.3/hudi-0.5.3.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.5.3/hudi-0.5.3.src.tgz.sha512))
-* Release Note : ([Release Note for Apache Hudi 0.5.3](/releases/release-0.5.3))
-
-### Release 0.5.2-incubating
-* Source Release : [Apache Hudi 0.5.2-incubating Source Release](https://www.apache.org/dyn/closer.lua/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz) ([asc](https://downloads.apache.org/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.sha512))
-* Release Note : ([Release Note for Apache Hudi 0.5.2](/releases/older-releases#release-052-incubating-docs))
-
-### Release 0.5.1-incubating
-* Source Release : [Apache Hudi 0.5.1-incubating Source Release](https://www.apache.org/dyn/closer.lua/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz) ([asc](https://downloads.apache.org/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512))
-* Release Note : ([Release Note for Apache Hudi 0.5.1](/releases/older-releases#release-051-incubating-docs))
-
-### Release 0.5.0-incubating
-* Source Release : [Apache Hudi 0.5.0-incubating Source Release](https://www.apache.org/dyn/closer.lua/hudi/0.5.0-incubating/hudi-0.5.0-incubating.src.tgz) ([asc](https://downloads.apache.org/hudi/0.5.0-incubating/hudi-0.5.0-incubating.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.5.0-incubating/hudi-0.5.0-incubating.src.tgz.sha512))
-* Release Note : ([Release Note for Apache Hudi 0.5.0](/releases/older-releases#release-050-incubating-docs))
+### Older releases
+As new Hudi releases come out for each development stream, previous ones will be archived, but they are still available at [here](https://archive.apache.org/dist/hudi/).
 
 ## Verify Release
 
diff --git a/website/releases/older-releases.md b/website/releases/older-releases.md
index 2468e33..f194c96 100644
--- a/website/releases/older-releases.md
+++ b/website/releases/older-releases.md
@@ -7,7 +7,103 @@ last_modified_at: 2020-05-28T08:40:00-07:00
 ---
 
 This page contains older release information, for bookkeeping purposes. It's recommended that you upgrade to one of the
-more recent releases listed [here](http://hudi.apache.org/releases)
+more recent releases listed [here](/releases/download)
+
+## [Release 0.6.0](https://github.com/apache/hudi/releases/tag/release-0.6.0) ([docs](/docs/quick-start-guide))
+
+## Migration Guide for this release
+- If migrating from release older than 0.5.3, please also check the upgrade instructions for each subsequent release below.
+- With 0.6.0 Hudi is moving from list based rollback to marker based rollbacks. To smoothly aid this transition a
+  new property called `hoodie.table.version` is added to `hoodie.properties` file. Whenever Hudi is launched with
+  newer table version i.e 1 (or moving from pre 0.6.0 to 0.6.0), an upgrade step will be executed automatically.
+  This automatic upgrade step will happen just once per Hudi table as the `hoodie.table.version` will be updated in property file after upgrade is completed.
+- Similarly, a command line tool for Downgrading (command - `downgrade`) is added if in case some users want to downgrade Hudi from table version 1 to 0 or move from Hudi 0.6.0 to pre 0.6.0
+- If you were using a user defined partitioner with bulkInsert() RDD API, the base interface has changed to `BulkInsertPartitioner` and will need minor adjustments to your existing implementations.
+
+## Release Highlights
+
+### Writer side improvements:
+- Bootstrapping existing parquet datasets :  Adds support for bootstrapping existing datasets into Hudi, via both Spark datasource writer and
+  deltastreamer tool, with support for reading from Hive, SparkSQL, AWS Athena (prestoDB support coming soon). See [RFC-15](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+and+Query+Planning+Improvements) for technical details.
+  Note that this is an experimental feature, which will be improved upon further in the 0.6.x versions.
+- Native row writing for bulk_insert : Avoids any dataframe-rdd conversion for bulk_insert path, which can improve performance of initial bulk loads.
+  Although, this is typically not the bottleneck for upsert/deletes, subsequent releases in 0.6.x versions will expand this to other write operations
+  to make reasoning about schema management easier, avoiding the spark-avro conversion totally.
+- Bulk insert sort modes : Hudi bulk_insert sorts the input globally to optimize file sizes and avoid out-of-memory issues encountered when writing parallely to multiple DFS partitions.
+  For users who want to prepare the dataframe for writing outside of Hudi, we have made this configurable using `hoodie.bulkinsert.sort.mode`.
+- Cleaning can now be run concurrently with writing, using `hoodie.clean.async=true`which can speed up time taken to finish committing.
+- Async compaction for spark streaming writes to hudi table, is now self managed by default, controlling `hoodie.datasource.compaction.async.enable`.
+- Rollbacks no longer perform full table listings, by leveraging marker files. To enable, set `hoodie.rollback.using.markers=true`.
+- Added a new index `hoodie.index.type=SIMPLE` which can be faster than `BLOOM_INDEX` for cases where updates/deletes spread across a large portion of the table.
+- Hudi now supports `Azure Data Lake Storage V2` , `Alluxio` and `Tencent Cloud Object Storage` storages.
+- [HoodieMultiDeltaStreamer](https://hudi.apache.org/docs/writing_data#multitabledeltastreamer) adds support for ingesting multiple kafka streams in a single DeltaStreamer deployment, effectively reducing operational burden for using delta streamer
+  as your data lake ingestion tool (Experimental feature)
+- Added a new tool - InitialCheckPointProvider, to set checkpoints when migrating to DeltaStreamer after an initial load of the table is complete.
+- Delta Streamer tool now supports ingesting CSV data sources, chaining of multiple transformers to build more advanced ETL jobs.
+- Introducing a new `CustomKeyGenerator` key generator class, that provides flexible configurations to provide enable different types of key, partition path generation in  single class.
+  We also added support for more time units and date/time formats in `TimestampBasedKeyGenerator`. See [docs](https://hudi.apache.org/docs/writing_data#key-generation) for more.
+
+### Query side improvements:
+- Starting 0.6.0, snapshot queries are feasible on MOR tables using spark datasource. (experimental feature)
+- In prior versions we only supported `HoodieCombineHiveInputFormat` for CopyOnWrite tables to ensure that there is a limit on the number of mappers spawned for
+  any query. Hudi now supports Merge on Read tables also using `HoodieCombineInputFormat`.
+- Speedup spark read queries by caching metaclient in HoodieROPathFilter. This helps reduce listing related overheads in S3 when filtering files for read-optimized queries.
+
+### Usability:
+- Spark DAGs are named to aid better debuggability.
+- Support pluggable metrics reporting by introducing proper abstraction for user defined metrics. Console, JMX, Prometheus and DataDog metric reporters have been added.
+- A new utility called Data snapshot exporter has been added. Latest table snapshot as of a certain point in time can be exported as plain parquet files with this tool.
+- Introduce write committed callback hooks for incremental pipelines to be notified and act on new commits in the timeline. For e.g, Apache Airflow jobs can be triggered
+  as new commits arrive.
+- Added support for deleting savepoints via CLI
+- Added a new command - `export instants`, to export metadata of instants
+
+## Raw Release Notes
+The raw release notes are available [here](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346663)
+
+## [Release 0.5.3](https://github.com/apache/hudi/releases/tag/release-0.5.3) ([docs](/docs/quick-start-guide))
+
+## Migration Guide for this release
+* This is a bug fix only release and no special migration steps needed when upgrading from 0.5.2. If you are upgrading from earlier releases “X”, please make sure you read the migration guide for each subsequent release between “X” and 0.5.3
+* 0.5.3 is the first hudi release after graduation. As a result, all hudi jars will no longer have "-incubating" in the version name. In all the places where hudi version is referred, please make sure "-incubating" is no longer present.
+
+For example hudi-spark-bundle pom dependency would look like:
+```xml
+    <dependency>
+        <groupId>org.apache.hudi</groupId>
+        <artifactId>hudi-spark-bundle_2.12</artifactId>
+        <version>0.5.3</version>
+    </dependency>
+```
+## Release Highlights
+* Hudi now supports `aliyun OSS` storage service.
+* Embedded Timeline Server is enabled by default for both delta-streamer and spark datasource writes. This feature was in experimental mode before this release. Embedded Timeline Server caches file listing calls in Spark driver and serves them to Spark writer tasks. This reduces the number of file listings needed to be performed for each write.
+* Incremental Cleaning is enabled by default for both delta-streamer and spark datasource writes. This feature was also in experimental mode before this release. In the steady state, incremental cleaning avoids the costly step of scanning all partitions and instead uses Hudi metadata to find files to be cleaned up.
+* Delta-streamer config files can now be placed in different filesystem than actual data.
+* Hudi Hive Sync now supports tables partitioned by date type column.
+* Hudi Hive Sync now supports syncing directly via Hive MetaStore. You simply need to set hoodie.datasource.hive_sync.use_jdbc
+  =false. Hive Metastore Uri will be read implicitly from environment. For example, when writing through Spark Data Source,
+
+```Scala
+ spark.write.format(“hudi”)
+ .option(…)
+ .option(“hoodie.datasource.hive_sync.username”, “<user>”)
+ .option(“hoodie.datasource.hive_sync.password”, “<password>”)
+ .option(“hoodie.datasource.hive_sync.partition_fields”, “<partition_fields>”)
+ .option(“hoodie.datasource.hive_sync.database”, “<db_name>”)
+ .option(“hoodie.datasource.hive_sync.table”, “<table_name>”)
+ .option(“hoodie.datasource.hive_sync.use_jdbc”, “false”)
+ .mode(APPEND)
+ .save(“/path/to/dataset”)
+```
+
+* Other Writer Performance related fixes:
+  - DataSource Writer now avoids unnecessary loading of data after write.
+  - Hudi Writer now leverages spark parallelism when searching for existing files for writing new records.
+
+## Raw Release Notes
+The raw release notes are available [here](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12348256)
+
 
 ## [Release 0.5.2-incubating](https://github.com/apache/hudi/releases/tag/release-0.5.2-incubating) ([docs](/docs/quick-start-guide))
 
diff --git a/website/releases/release-0.5.3.md b/website/releases/release-0.5.3.md
deleted file mode 100644
index 936675b..0000000
--- a/website/releases/release-0.5.3.md
+++ /dev/null
@@ -1,52 +0,0 @@
----
-title: "Release 0.5.3"
-sidebar_position: 6
-layout: releases
-toc: true
-last_modified_at: 2020-05-28T08:40:00-07:00
----
-# [Release 0.5.3](https://github.com/apache/hudi/releases/tag/release-0.5.3) ([docs](/docs/quick-start-guide))
-
-## Migration Guide for this release
- * This is a bug fix only release and no special migration steps needed when upgrading from 0.5.2. If you are upgrading from earlier releases “X”, please make sure you read the migration guide for each subsequent release between “X” and 0.5.3
- * 0.5.3 is the first hudi release after graduation. As a result, all hudi jars will no longer have "-incubating" in the version name. In all the places where hudi version is referred, please make sure "-incubating" is no longer present.
-
-For example hudi-spark-bundle pom dependency would look like:
-```xml
-    <dependency>
-        <groupId>org.apache.hudi</groupId>
-        <artifactId>hudi-spark-bundle_2.12</artifactId>
-        <version>0.5.3</version>
-    </dependency>
-```
-## Release Highlights
- * Hudi now supports `aliyun OSS` storage service.
- * Embedded Timeline Server is enabled by default for both delta-streamer and spark datasource writes. This feature was in experimental mode before this release. Embedded Timeline Server caches file listing calls in Spark driver and serves them to Spark writer tasks. This reduces the number of file listings needed to be performed for each write.
- * Incremental Cleaning is enabled by default for both delta-streamer and spark datasource writes. This feature was also in experimental mode before this release. In the steady state, incremental cleaning avoids the costly step of scanning all partitions and instead uses Hudi metadata to find files to be cleaned up.
- * Delta-streamer config files can now be placed in different filesystem than actual data.
- * Hudi Hive Sync now supports tables partitioned by date type column.
- * Hudi Hive Sync now supports syncing directly via Hive MetaStore. You simply need to set hoodie.datasource.hive_sync.use_jdbc
-=false. Hive Metastore Uri will be read implicitly from environment. For example, when writing through Spark Data Source,
-
-```Scala
- spark.write.format(“hudi”)
- .option(…)
- .option(“hoodie.datasource.hive_sync.username”, “<user>”)
- .option(“hoodie.datasource.hive_sync.password”, “<password>”)
- .option(“hoodie.datasource.hive_sync.partition_fields”, “<partition_fields>”)
- .option(“hoodie.datasource.hive_sync.database”, “<db_name>”)
- .option(“hoodie.datasource.hive_sync.table”, “<table_name>”)
- .option(“hoodie.datasource.hive_sync.use_jdbc”, “false”)
- .mode(APPEND)
- .save(“/path/to/dataset”)
-```
-
- * Other Writer Performance related fixes:
-   - DataSource Writer now avoids unnecessary loading of data after write.
-   - Hudi Writer now leverages spark parallelism when searching for existing files for writing new records.
- 
-## Raw Release Notes
-   The raw release notes are available [here](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12348256)
-
-
-For releases older than these versions, please see [here](/releases/older-releases).
\ No newline at end of file
diff --git a/website/releases/release-0.6.0.md b/website/releases/release-0.6.0.md
deleted file mode 100644
index f1b9581..0000000
--- a/website/releases/release-0.6.0.md
+++ /dev/null
@@ -1,58 +0,0 @@
----
-title: "Release 0.6.0"
-sidebar_position: 5
-layout: releases
-toc: true
-last_modified_at: 2020-05-28T08:40:00-07:00
----
-# [Release 0.6.0](https://github.com/apache/hudi/releases/tag/release-0.6.0) ([docs](/docs/quick-start-guide))
-
-## Migration Guide for this release
- - If migrating from release older than 0.5.3, please also check the upgrade instructions for each subsequent release below.
- - With 0.6.0 Hudi is moving from list based rollback to marker based rollbacks. To smoothly aid this transition a
- new property called `hoodie.table.version` is added to `hoodie.properties` file. Whenever Hudi is launched with
- newer table version i.e 1 (or moving from pre 0.6.0 to 0.6.0), an upgrade step will be executed automatically.
- This automatic upgrade step will happen just once per Hudi table as the `hoodie.table.version` will be updated in property file after upgrade is completed.
- - Similarly, a command line tool for Downgrading (command - `downgrade`) is added if in case some users want to downgrade Hudi from table version 1 to 0 or move from Hudi 0.6.0 to pre 0.6.0
- - If you were using a user defined partitioner with bulkInsert() RDD API, the base interface has changed to `BulkInsertPartitioner` and will need minor adjustments to your existing implementations.
- 
-## Release Highlights
-
-### Writer side improvements:
-  - Bootstrapping existing parquet datasets :  Adds support for bootstrapping existing datasets into Hudi, via both Spark datasource writer and
-     deltastreamer tool, with support for reading from Hive, SparkSQL, AWS Athena (prestoDB support coming soon). See [RFC-15](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+and+Query+Planning+Improvements) for technical details.
-     Note that this is an experimental feature, which will be improved upon further in the 0.6.x versions.
-  - Native row writing for bulk_insert : Avoids any dataframe-rdd conversion for bulk_insert path, which can improve performance of initial bulk loads.
-      Although, this is typically not the bottleneck for upsert/deletes, subsequent releases in 0.6.x versions will expand this to other write operations
-      to make reasoning about schema management easier, avoiding the spark-avro conversion totally.
-  - Bulk insert sort modes : Hudi bulk_insert sorts the input globally to optimize file sizes and avoid out-of-memory issues encountered when writing parallely to multiple DFS partitions.
-     For users who want to prepare the dataframe for writing outside of Hudi, we have made this configurable using `hoodie.bulkinsert.sort.mode`.
-  - Cleaning can now be run concurrently with writing, using `hoodie.clean.async=true`which can speed up time taken to finish committing.
-  - Async compaction for spark streaming writes to hudi table, is now self managed by default, controlling `hoodie.datasource.compaction.async.enable`.
-  - Rollbacks no longer perform full table listings, by leveraging marker files. To enable, set `hoodie.rollback.using.markers=true`.
-  - Added a new index `hoodie.index.type=SIMPLE` which can be faster than `BLOOM_INDEX` for cases where updates/deletes spread across a large portion of the table.
-  - Hudi now supports `Azure Data Lake Storage V2` , `Alluxio` and `Tencent Cloud Object Storage` storages.
-  - [HoodieMultiDeltaStreamer](https://hudi.apache.org/docs/writing_data#multitabledeltastreamer) adds support for ingesting multiple kafka streams in a single DeltaStreamer deployment, effectively reducing operational burden for using delta streamer
-    as your data lake ingestion tool (Experimental feature)
-  - Added a new tool - InitialCheckPointProvider, to set checkpoints when migrating to DeltaStreamer after an initial load of the table is complete.
-  - Delta Streamer tool now supports ingesting CSV data sources, chaining of multiple transformers to build more advanced ETL jobs.
-  - Introducing a new `CustomKeyGenerator` key generator class, that provides flexible configurations to provide enable different types of key, partition path generation in  single class.
-    We also added support for more time units and date/time formats in `TimestampBasedKeyGenerator`. See [docs](https://hudi.apache.org/docs/writing_data#key-generation) for more.
-
-### Query side improvements:
-  - Starting 0.6.0, snapshot queries are feasible on MOR tables using spark datasource. (experimental feature)
-  - In prior versions we only supported `HoodieCombineHiveInputFormat` for CopyOnWrite tables to ensure that there is a limit on the number of mappers spawned for
-    any query. Hudi now supports Merge on Read tables also using `HoodieCombineInputFormat`.
-  - Speedup spark read queries by caching metaclient in HoodieROPathFilter. This helps reduce listing related overheads in S3 when filtering files for read-optimized queries.
-
-### Usability:
-  - Spark DAGs are named to aid better debuggability.
-  - Support pluggable metrics reporting by introducing proper abstraction for user defined metrics. Console, JMX, Prometheus and DataDog metric reporters have been added.
-  - A new utility called Data snapshot exporter has been added. Latest table snapshot as of a certain point in time can be exported as plain parquet files with this tool.
-  - Introduce write committed callback hooks for incremental pipelines to be notified and act on new commits in the timeline. For e.g, Apache Airflow jobs can be triggered
-    as new commits arrive.
-  - Added support for deleting savepoints via CLI
-  - Added a new command - `export instants`, to export metadata of instants
-
-## Raw Release Notes
-   The raw release notes are available [here](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346663)
