This is an automated email from the ASF dual-hosted git repository.
codope pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 7ec281297ca [DOCS] Release notes 1.0.0-beta2 (#11618)
7ec281297ca is described below
commit 7ec281297ca26f647ffa6217a13d2b309128ae09
Author: Sagar Sumit <[email protected]>
AuthorDate: Tue Jul 16 12:14:28 2024 +0530
[DOCS] Release notes 1.0.0-beta2 (#11618)
* [DOCS] Release notes for 1.0.0-beta2
* add sql with limitations
* Fix build
* Update sidebars and some more items in release notes
* Fix sidebars, links and address other comments
---
website/docs/metadata.md | 26 ++++++++++
website/docs/sql_ddl.md | 88 ++++++++++++++++++++++++++++++++-
website/docs/sql_dml.md | 24 +++++++++
website/releases/download.md | 6 ++-
website/releases/older-releases.md | 2 +-
website/releases/release-0.10.0.md | 2 +-
website/releases/release-0.10.1.md | 2 +-
website/releases/release-0.11.0.md | 2 +-
website/releases/release-0.11.1.md | 2 +-
website/releases/release-0.12.0.md | 2 +-
website/releases/release-0.12.1.md | 2 +-
website/releases/release-0.12.2.md | 2 +-
website/releases/release-0.12.3.md | 2 +-
website/releases/release-0.13.0.md | 2 +-
website/releases/release-0.13.1.md | 2 +-
website/releases/release-1.0.0-beta2.md | 85 +++++++++++++++++++++++++++++++
16 files changed, 238 insertions(+), 13 deletions(-)
diff --git a/website/docs/metadata.md b/website/docs/metadata.md
index 413114f13a5..68c5aaa9f8e 100644
--- a/website/docs/metadata.md
+++ b/website/docs/metadata.md
@@ -90,6 +90,32 @@ Following are the different indices currently available
under the metadata table
Hudi release, this index aids in locating records faster than other existing
indices and can provide a speedup orders of magnitude
faster in large deployments where index lookup dominates write latencies.
+#### New Indexes in 1.0.0
+
+- ***Functional Index***:
+ A [functional
index](https://github.com/apache/hudi/blob/3789840be3d041cbcfc6b24786740210e4e6d6ac/rfc/rfc-63/rfc-63.md)
+ is an index on a function of a column. If a query has a predicate on a
function of a column, the functional index can
+ be used to speed up the query. Functional index is stored in *func_index_*
prefixed partitions (one for each
+ function) under the metadata table. A functional index can be created using SQL
syntax. Please check out the SQL DDL
+ docs [here](/docs/next/sql_ddl#create-functional-index-experimental) for
more details.
+
+- ***Partition Stats Index***:
+ Partition stats index aggregates statistics at the partition level for the
columns for which it is enabled. This helps
+ in efficient partition pruning even for non-partition fields. The partition
stats index is stored in *partition_stats*
+ partition under metadata table. Partition stats index can be enabled using
the following configs (note it is required
+ to specify the columns for which stats should be aggregated):
+ ```properties
+ hoodie.metadata.index.partition.stats.enable=true
+ hoodie.metadata.index.column.stats.column.list=<comma-separated-column-names>
+ ```
+
+- ***Secondary Index***:
+ Secondary indexes allow users to create indexes on columns that are not part
of record key columns in Hudi tables (for
+ record key fields, Hudi supports the [Record-level
Index](/blog/2023/11/01/record-level-index)). Secondary indexes
+ can be used to speed up queries with predicates on columns other than record
key columns.
+
+To try out these features, refer to the [SQL
guide](/docs/next/sql_ddl#create-partition-stats-and-secondary-index-experimental).
+
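As a rough mental model of the secondary index described above, the following Python sketch maps a non-key column value to the record keys that carry it, so that a predicate on that column narrows the lookup before any data file is read. This is an illustration under assumptions, not Hudi's implementation: the sample rows and the helper names `build_secondary_index` and `lookup_record_keys` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: (record key, value of the indexed column `driver`)
records = [
    ("trip1", "driver-K"),
    ("trip2", "driver-M"),
    ("trip3", "driver-O"),
    ("trip4", "driver-P"),
    ("trip5", "driver-K"),
]

def build_secondary_index(records):
    """Map each secondary key (driver) to the set of record keys that hold it."""
    index = defaultdict(set)
    for record_key, driver in records:
        index[driver].add(record_key)
    return index

def lookup_record_keys(index, drivers):
    """Resolve a predicate like `driver IN (...)` to candidate record keys."""
    keys = set()
    for driver in drivers:
        keys |= index.get(driver, set())
    return sorted(keys)

index = build_secondary_index(records)
matches = lookup_record_keys(index, ["driver-K", "driver-M"])
print(matches)
```

The record keys returned here would then be resolved to file locations by a key-based index such as the record-level index; that second step is omitted from the sketch.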
## Enable Hudi Metadata Table and Multi-Modal Index in write side
Following are the Spark based basic configs that are needed to enable metadata
and multi-modal indices. For advanced configs please refer
diff --git a/website/docs/sql_ddl.md b/website/docs/sql_ddl.md
index eebadfc580e..08d8380afaf 100644
--- a/website/docs/sql_ddl.md
+++ b/website/docs/sql_ddl.md
@@ -217,7 +217,13 @@ DROP INDEX [IF EXISTS] index_name ON [TABLE] table_name
- Both index and column on which the index is created can be qualified with
some options in the form of key-value pairs.
We will see this with an example of functional index below.
-#### Create Functional Index
+:::note
+Except for `files`, `column_stats`, `bloom_filters` and `record_index`,
all indexes are experimental. We
+encourage users to try out these features on new tables and provide feedback.
Below, we have also listed current
+limitations of these indexes.
+:::
+
+#### Create Functional Index (Experimental)
A [functional
index](https://github.com/apache/hudi/blob/00ece7bce0a4a8d0019721a28049723821e01842/rfc/rfc-63/rfc-63.md)
is an index on a function of a column. It is a new addition to Hudi's
[multi-modal
indexing](https://hudi.apache.org/blog/2022/05/17/Introducing-Multi-Modal-Index-for-the-Lakehouse-in-Apache-Hudi)
@@ -328,6 +334,86 @@ Project [city#2970, fare#2969, rider#2967, driver#2968],
Statistics(sizeInBytes=
```
</details>
+#### Create Partition Stats and Secondary Index (Experimental)
+
+Hudi supports various [indexes](/docs/next/metadata#metadata-table-indices).
Let us see how we can use them in the following example.
+
+```sql
+DROP TABLE IF EXISTS hudi_table;
+-- Let us create a table with multiple partition fields, and enable record index and partition stats index
+CREATE TABLE hudi_table (
+ ts BIGINT,
+ id STRING,
+ rider STRING,
+ driver STRING,
+ fare DOUBLE,
+ city STRING,
+ state STRING
+) USING hudi
+ OPTIONS(
+ primaryKey ='id',
+ hoodie.metadata.record.index.enable = 'true', -- enable record index
+ hoodie.metadata.index.partition.stats.enable = 'true', -- enable partition stats index
+ hoodie.metadata.index.column.stats.column.list = 'rider' -- create partition stats index on rider column
+)
+PARTITIONED BY (city, state)
+LOCATION 'file:///tmp/hudi_test_table';
+
+INSERT INTO hudi_table VALUES
(1695159649,'trip1','rider-A','driver-K',19.10,'san_francisco','california');
+INSERT INTO hudi_table VALUES
(1695091554,'trip2','rider-C','driver-M',27.70,'sunnyvale','california');
+INSERT INTO hudi_table VALUES
(1695332066,'trip3','rider-E','driver-O',93.50,'austin','texas');
+INSERT INTO hudi_table VALUES
(1695516137,'trip4','rider-F','driver-P',34.15,'houston','texas');
+
+-- Enable data skipping for the reader
+set hoodie.metadata.enable=true;
+set hoodie.enable.data.skipping=true;
+
+-- simple partition predicate --
+select * from hudi_table where city = 'sunnyvale';
+20240710215107477 20240710215107477_0_0 trip2
city=sunnyvale/state=california
1dcb14a9-bc4a-4eac-aab5-015f2254b7ec-0_0-40-75_20240710215107477.parquet
1695091554 trip2 rider-C driver-M 27.7 sunnyvale
california
+Time taken: 0.58 seconds, Fetched 1 row(s)
+
+-- simple partition predicate on other partition field --
+select * from hudi_table where state = 'texas';
+20240710215119846 20240710215119846_0_0 trip4
city=houston/state=texas
08c6ed2c-a87b-4798-8f70-6d8b16cb1932-0_0-74-133_20240710215119846.parquet
1695516137 trip4 rider-F driver-P 34.15 houston texas
+20240710215110584 20240710215110584_0_0 trip3 city=austin/state=texas
0ab2243c-cc08-4da3-8302-4ce0b4c47a08-0_0-57-104_20240710215110584.parquet
1695332066 trip3 rider-E driver-O 93.5 austin texas
+Time taken: 0.124 seconds, Fetched 2 row(s)
+
+-- predicate on a column for which partition stats are present --
+select id, rider, city, state from hudi_table where rider > 'rider-D';
+trip4 rider-F houston texas
+trip3 rider-E austin texas
+Time taken: 0.703 seconds, Fetched 2 row(s)
+
+-- record key predicate --
+SELECT id, rider, driver FROM hudi_table WHERE id = 'trip1';
+trip1 rider-A driver-K
+Time taken: 0.368 seconds, Fetched 1 row(s)
+
+-- create secondary index on driver --
+CREATE INDEX driver_idx ON hudi_table USING secondary_index(driver);
+
+-- secondary key predicate --
+SELECT id, driver, city, state FROM hudi_table WHERE driver IN ('driver-K',
'driver-M');
+trip1 driver-K san_francisco california
+trip2 driver-M sunnyvale california
+Time taken: 0.83 seconds, Fetched 2 row(s)
+```
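To make the partition pruning behind the `rider > 'rider-D'` query above more concrete, here is a minimal Python sketch of the idea. It is not Hudi's implementation: it assumes the index keeps a per-partition minimum and maximum for `rider`, and the helper names `build_partition_stats` and `prune_greater_than` are hypothetical.

```python
# Rows from the example above, reduced to (id, rider, city, state)
rows = [
    ("trip1", "rider-A", "san_francisco", "california"),
    ("trip2", "rider-C", "sunnyvale", "california"),
    ("trip3", "rider-E", "austin", "texas"),
    ("trip4", "rider-F", "houston", "texas"),
]

def build_partition_stats(rows):
    """Aggregate (min, max) of the rider column for each partition path."""
    stats = {}
    for _id, rider, city, state in rows:
        partition = f"city={city}/state={state}"
        lo, hi = stats.get(partition, (rider, rider))
        stats[partition] = (min(lo, rider), max(hi, rider))
    return stats

def prune_greater_than(stats, value):
    """Keep only partitions whose max could satisfy `rider > value`."""
    return sorted(p for p, (_lo, hi) in stats.items() if hi > value)

stats = build_partition_stats(rows)
candidates = prune_greater_than(stats, "rider-D")
print(candidates)
```

Only the two Texas partitions survive the check, which matches the rows returned by the query; the California partitions can be skipped without opening their files even though `rider` is not a partition field.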
+
+**Limitations of using these indexes:**
+
+- Unlike column stats, the partition stats index is not created automatically for
all columns. Users must specify the list of
+ columns for which they want to create the partition stats index.
+- Predicates on internal meta fields such as `_hoodie_record_key` or
`_hoodie_partition_path` cannot be used for data
+ skipping. Queries with such predicates cannot leverage the indexes.
+- Secondary index is not supported for nested fields.
+- Index update can fail with schema evolution.
+- If there are multiple indexes present, then secondary index and functional
index update can fail.
+- Only one index can be created at a time using [async
indexer](/docs/next/metadata_indexing).
+- Ensure native HFile reader is disabled (`_hoodie.hfile.use.native.reader`)
to leverage the secondary index. Default value for this config is `false`.
+
+Limitations will be addressed before 1.0.0 is made generally available.
+
### Setting Hudi configs
There are different ways you can pass the configs for a given hudi table.
diff --git a/website/docs/sql_dml.md b/website/docs/sql_dml.md
index edb63730b13..04590765f3f 100644
--- a/website/docs/sql_dml.md
+++ b/website/docs/sql_dml.md
@@ -266,6 +266,30 @@ DELETE FROM hudi_table WHERE price < 100;
Delete queries only work in batch execution mode.
:::
+### Lookup Joins
+
+A lookup join is typically used to enrich a table with data that is queried
from an external system. The join requires
+one table to have a processing time attribute and the other table to be backed
by a lookup source connector.
+
+```sql
+CREATE TABLE datagen_source(
+ id int,
+ name STRING,
+ proctime as PROCTIME()
+) WITH (
+'connector' = 'datagen',
+'rows-per-second'='1',
+'number-of-rows' = '2',
+'fields.id.kind'='sequence',
+'fields.id.start'='1',
+'fields.id.end'='2'
+);
+
+SELECT o.id,o.name,b.id as id2
+FROM datagen_source AS o
+JOIN hudi_table/*+ OPTIONS('lookup.join.cache.ttl'= '2 day') */ FOR
SYSTEM_TIME AS OF o.proctime AS b on o.id = b.id;
+```
+
### Setting Writer/Reader Configs
With Flink SQL, you can additionally set the writer/reader configs
along with the query.
diff --git a/website/releases/download.md b/website/releases/download.md
index 023e1557825..148be08abc2 100644
--- a/website/releases/download.md
+++ b/website/releases/download.md
@@ -6,6 +6,10 @@ toc: true
last_modified_at: 2022-12-27T15:59:57-04:00
---
+### Release 1.0.0-beta2
+* Source Release : [Apache Hudi 1.0.0-beta2 Source
Release](https://downloads.apache.org/hudi/1.0.0-beta2/hudi-1.0.0-beta2.src.tgz)
([asc](https://downloads.apache.org/hudi/1.0.0-beta2/hudi-1.0.0-beta2.src.tgz.asc),
[sha512](https://downloads.apache.org/hudi/1.0.0-beta2/hudi-1.0.0-beta2.src.tgz.sha512))
+* Release Note : ([Release Note for Apache Hudi
1.0.0-beta2](/releases/release-1.0.0-beta2))
+
### Release 0.15.0
* Source Release : [Apache Hudi 0.15.0 Source
Release](https://downloads.apache.org/hudi/0.15.0/hudi-0.15.0.src.tgz)
([asc](https://downloads.apache.org/hudi/0.15.0/hudi-0.15.0.src.tgz.asc),
[sha512](https://downloads.apache.org/hudi/0.15.0/hudi-0.15.0.src.tgz.sha512))
* Release Note : ([Release Note for Apache Hudi
0.15.0](/releases/release-0.15.0))
@@ -16,7 +20,7 @@ last_modified_at: 2022-12-27T15:59:57-04:00
### Release 1.0.0-beta1
* Source Release : [Apache Hudi 1.0.0-beta1 Source
Release](https://www.apache.org/dyn/closer.lua/hudi/1.0.0-beta1/hudi-1.0.0-beta1.src.tgz)
([asc](https://downloads.apache.org/hudi/1.0.0-beta1/hudi-1.0.0-beta1.src.tgz.asc),
[sha512](https://downloads.apache.org/hudi/1.0.0-beta1/hudi-1.0.0-beta1.src.tgz.sha512))
-* Release Note : ([Release Note for Apache Hudi
0.14.0](/releases/release-1.0.0-beta1))
+* Release Note : ([Release Note for Apache Hudi
1.0.0-beta1](/releases/release-1.0.0-beta1))
### Release 0.12.3
[Long Term Support](/releases/release-0.12.3#long-term-support): this is the
latest stable release
diff --git a/website/releases/older-releases.md
b/website/releases/older-releases.md
index 4d9e75005f3..ea044b31efd 100644
--- a/website/releases/older-releases.md
+++ b/website/releases/older-releases.md
@@ -1,6 +1,6 @@
---
title: "Older Releases"
-sidebar_position: 19
+sidebar_position: 20
layout: releases
toc: true
last_modified_at: 2020-05-28T08:40:00-07:00
diff --git a/website/releases/release-0.10.0.md
b/website/releases/release-0.10.0.md
index 9ca15db71f1..6f5551a0d01 100644
--- a/website/releases/release-0.10.0.md
+++ b/website/releases/release-0.10.0.md
@@ -1,6 +1,6 @@
---
title: "Release 0.10.0"
-sidebar_position: 14
+sidebar_position: 15
layout: releases
toc: true
---
diff --git a/website/releases/release-0.10.1.md
b/website/releases/release-0.10.1.md
index 5856f0eb100..42ec76ee1f6 100644
--- a/website/releases/release-0.10.1.md
+++ b/website/releases/release-0.10.1.md
@@ -1,6 +1,6 @@
---
title: "Release 0.10.1"
-sidebar_position: 13
+sidebar_position: 14
layout: releases
toc: true
---
diff --git a/website/releases/release-0.11.0.md
b/website/releases/release-0.11.0.md
index d0cdef30951..fbea4897b45 100644
--- a/website/releases/release-0.11.0.md
+++ b/website/releases/release-0.11.0.md
@@ -1,6 +1,6 @@
---
title: "Release 0.11.0"
-sidebar_position: 12
+sidebar_position: 13
layout: releases
toc: true
last_modified_at: 2022-01-27T22:07:00+08:00
diff --git a/website/releases/release-0.11.1.md
b/website/releases/release-0.11.1.md
index 5aa5d89e11b..6f727ddccd2 100644
--- a/website/releases/release-0.11.1.md
+++ b/website/releases/release-0.11.1.md
@@ -1,6 +1,6 @@
---
title: "Release 0.11.1"
-sidebar_position: 11
+sidebar_position: 12
layout: releases
toc: true
last_modified_at: 2022-06-19T23:30:00-07:00
diff --git a/website/releases/release-0.12.0.md
b/website/releases/release-0.12.0.md
index 78b27997962..93be2c17e55 100644
--- a/website/releases/release-0.12.0.md
+++ b/website/releases/release-0.12.0.md
@@ -1,6 +1,6 @@
---
title: "Release 0.12.0"
-sidebar_position: 10
+sidebar_position: 11
layout: releases
toc: true
---
diff --git a/website/releases/release-0.12.1.md
b/website/releases/release-0.12.1.md
index b4f8d643c7d..8d1f002a79b 100644
--- a/website/releases/release-0.12.1.md
+++ b/website/releases/release-0.12.1.md
@@ -1,6 +1,6 @@
---
title: "Release 0.12.1"
-sidebar_position: 9
+sidebar_position: 10
layout: releases
toc: true
---
diff --git a/website/releases/release-0.12.2.md
b/website/releases/release-0.12.2.md
index 2135d3ddcbf..44a04fcd603 100644
--- a/website/releases/release-0.12.2.md
+++ b/website/releases/release-0.12.2.md
@@ -1,6 +1,6 @@
---
title: "Release 0.12.2"
-sidebar_position: 8
+sidebar_position: 9
layout: releases
toc: true
---
diff --git a/website/releases/release-0.12.3.md
b/website/releases/release-0.12.3.md
index 21514240091..a320f2e74a7 100644
--- a/website/releases/release-0.12.3.md
+++ b/website/releases/release-0.12.3.md
@@ -1,6 +1,6 @@
---
title: "Release 0.12.3"
-sidebar_position: 6
+sidebar_position: 7
layout: releases
toc: true
last_modified_at: 2023-04-23T10:30:00+05:30
diff --git a/website/releases/release-0.13.0.md
b/website/releases/release-0.13.0.md
index 3ec23c1d5bd..e27050ceace 100644
--- a/website/releases/release-0.13.0.md
+++ b/website/releases/release-0.13.0.md
@@ -1,6 +1,6 @@
---
title: "Release 0.13.0"
-sidebar_position: 7
+sidebar_position: 8
layout: releases
toc: true
---
diff --git a/website/releases/release-0.13.1.md
b/website/releases/release-0.13.1.md
index 30e2cd6f7ec..f2888af3454 100644
--- a/website/releases/release-0.13.1.md
+++ b/website/releases/release-0.13.1.md
@@ -1,6 +1,6 @@
---
title: "Release 0.13.1"
-sidebar_position: 5
+sidebar_position: 6
layout: releases
toc: true
last_modified_at: 2023-05-25T13:00:00-08:00
diff --git a/website/releases/release-1.0.0-beta2.md
b/website/releases/release-1.0.0-beta2.md
new file mode 100644
index 00000000000..bea04c3bfd1
--- /dev/null
+++ b/website/releases/release-1.0.0-beta2.md
@@ -0,0 +1,85 @@
+---
+title: "Release 1.0.0-beta2"
+sidebar_position: 1
+layout: releases
+toc: true
+---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## [Release
1.0.0-beta2](https://github.com/apache/hudi/releases/tag/release-1.0.0-beta2)
([docs](/docs/next/quick-start-guide))
+
+Apache Hudi 1.0.0-beta2 is the second beta release of Apache Hudi. This
release is meant for early adopters to try
+out the new features and provide feedback. The release is not meant for
production use.
+
+## Migration Guide
+
+This release contains major format changes, as we will see in the highlights below.
We encourage users to try out the
+**1.0.0-beta2** features on new tables. The 1.0 general availability (GA)
release will support automatic table upgrades
+from 0.x versions, while also ensuring full backward compatibility when
reading 0.x Hudi tables using 1.0, for a
+seamless migration experience.
+
+:::caution
+Given that the timeline format and log file format have changed in this **beta
release**, we recommend against attempting
+rolling upgrades from older versions to this release.
+:::
+
+## Highlights
+
+### Format changes
+
+[HUDI-6242](https://issues.apache.org/jira/browse/HUDI-6242) is the main epic
covering all the format change proposals,
+which are also partly covered in the [Hudi 1.0 tech
specification](/tech-specs-1point0). The following are the main
+changes in this release:
+
+#### Timeline
+
+No major changes in this release. Refer to
[1.0.0-beta1#timeline](release-1.0.0-beta1.md#timeline) for more details.
+
+#### Log File Format
+
+In addition to the fields in the log file header added in
[1.0.0-beta1](release-1.0.0-beta1.md#log-file-format), we also
+store a flag, `IS_PARTIAL`, to indicate whether the log block contains partial
updates or not.
+
+### Metadata indexes
+
+In 1.0.0-beta1, we added support for functional index. In 1.0.0-beta2, we have
added support for secondary indexes and
+partition stats index to the [multi-modal
indexing](/blog/2022/05/17/Introducing-Multi-Modal-Index-for-the-Lakehouse-in-Apache-Hudi)
subsystem.
+
+#### Secondary Index
+
+Secondary indexes allow users to create indexes on columns that are not part
of record key columns in Hudi tables (for
+record key fields, Hudi supports the [Record-level
Index](/blog/2023/11/01/record-level-index)). Secondary indexes can be used to
speed up
+queries with predicates on columns other than record key columns.
+
+#### Partition Stats Index
+
+Partition stats index aggregates statistics at the partition level for the
columns for which it is enabled. This helps
+in efficient partition pruning even for non-partition fields.
+
+To try out these features, refer to the [SQL
guide](/docs/next/sql_ddl#create-partition-stats-and-secondary-index-experimental).
+
+### API Changes
+
+#### Positional Merging with Filegroup Reader
+
+In 1.0.0-beta1, we added a new [filegroup
reader](/releases/release-1.0.0-beta1#new-filegroup-reader), which provides
+5.7x performance benefits for snapshot queries on Merge-on-Read tables with
updates. The reader now
+provides position-based merging, as an alternative to existing key-based
merging, and skipping pages based on record
+positions. The new filegroup reader is integrated with Spark and Hive, and
enabled by default. To enable positional
+merging, set the following config:
+
+```properties
+hoodie.merge.use.record.positions=true
+```
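The difference between key-based and position-based merging can be pictured with a small Python sketch. This is a conceptual illustration only, not the filegroup reader's code: it assumes a base file is an ordered list of records and that log updates are addressed either by record key or by row position in the base file. The function names `merge_by_key` and `merge_by_position` are hypothetical.

```python
# A base file modeled as an ordered list of records
base_file = [
    {"id": "trip1", "fare": 19.10},
    {"id": "trip2", "fare": 27.70},
    {"id": "trip3", "fare": 93.50},
]

def merge_by_key(base, updates):
    """Key-based merge: match each base record to an update by record key."""
    by_key = {u["id"]: u for u in updates}
    return [by_key.get(record["id"], record) for record in base]

def merge_by_position(base, updates):
    """Position-based merge: updates map a row position directly to a record."""
    merged = list(base)
    for position, record in updates.items():
        merged[position] = record
    return merged

update = {"id": "trip2", "fare": 30.00}
key_merged = merge_by_key(base_file, [update])
pos_merged = merge_by_position(base_file, {1: update})
print(key_merged == pos_merged)
```

Both strategies produce the same merged view in this sketch. The positional variant never has to read or compare record keys, which is the intuition behind skipping work based on record positions.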
+
+### Hudi-Flink Enhancements
+
+This release adds support for [lookup
joins](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#lookup-join).
+A lookup join is typically used to enrich a table with data that is queried
from an external system. The join requires
+one table to have a processing time attribute and the other table to be backed
by a lookup source connector. Head over
+to the [Flink SQL guide](/docs/next/sql_dml#lookup-joins) to try out this
feature.
+
+## Raw Release Notes
+
+The raw release notes are available
[here](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12354810).