This is an automated email from the ASF dual-hosted git repository.

codope pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 7ec281297ca [DOCS] Release notes 1.0.0-beta2 (#11618)
7ec281297ca is described below

commit 7ec281297ca26f647ffa6217a13d2b309128ae09
Author: Sagar Sumit <[email protected]>
AuthorDate: Tue Jul 16 12:14:28 2024 +0530

    [DOCS] Release notes 1.0.0-beta2 (#11618)
    
    * [DOCS] Release notes for 1.0.0-beta2
    
    * add sql with limitations
    
    * Fix build
    
    * Update sidebars and some more items in release notes
    
    * Fix sidebars, links and address other comments
---
 website/docs/metadata.md                | 26 ++++++++++
 website/docs/sql_ddl.md                 | 88 ++++++++++++++++++++++++++++++++-
 website/docs/sql_dml.md                 | 24 +++++++++
 website/releases/download.md            |  6 ++-
 website/releases/older-releases.md      |  2 +-
 website/releases/release-0.10.0.md      |  2 +-
 website/releases/release-0.10.1.md      |  2 +-
 website/releases/release-0.11.0.md      |  2 +-
 website/releases/release-0.11.1.md      |  2 +-
 website/releases/release-0.12.0.md      |  2 +-
 website/releases/release-0.12.1.md      |  2 +-
 website/releases/release-0.12.2.md      |  2 +-
 website/releases/release-0.12.3.md      |  2 +-
 website/releases/release-0.13.0.md      |  2 +-
 website/releases/release-0.13.1.md      |  2 +-
 website/releases/release-1.0.0-beta2.md | 85 +++++++++++++++++++++++++++++++
 16 files changed, 238 insertions(+), 13 deletions(-)

diff --git a/website/docs/metadata.md b/website/docs/metadata.md
index 413114f13a5..68c5aaa9f8e 100644
--- a/website/docs/metadata.md
+++ b/website/docs/metadata.md
@@ -90,6 +90,32 @@ Following are the different indices currently available 
under the metadata table
   Hudi release, this index aids in locating records faster than other existing 
indices and can provide a speedup orders of magnitude 
   faster in large deployments where index lookup dominates write latencies.
 
+#### New Indexes in 1.0.0
+
+- ***Functional Index***:
+  A [functional 
index](https://github.com/apache/hudi/blob/3789840be3d041cbcfc6b24786740210e4e6d6ac/rfc/rfc-63/rfc-63.md)
+  is an index on a function of a column. If a query has a predicate on a 
function of a column, the functional index can
+  be used to speed up the query. The functional index is stored in *func_index_*-prefixed partitions (one for each
+  function) under the metadata table. A functional index can be created using SQL syntax. See the SQL DDL
+  docs [here](/docs/next/sql_ddl#create-functional-index-experimental) for more details.
+
+- ***Partition Stats Index***:
+  Partition stats index aggregates statistics at the partition level for the 
columns for which it is enabled. This helps
+  in efficient partition pruning even for non-partition fields. The partition 
stats index is stored in *partition_stats*
+  partition under metadata table. Partition stats index can be enabled using 
the following configs (note it is required
+  to specify the columns for which stats should be aggregated):
+  ```properties
+    hoodie.metadata.index.partition.stats.enable=true
+    hoodie.metadata.index.column.stats.column.list=<comma-separated-column-names>
+  ```
+  
+- ***Secondary Index***:
+  Secondary indexes allow users to create indexes on columns that are not part of the record key columns in Hudi tables (for
+  record key fields, Hudi supports the [Record-level Index](/blog/2023/11/01/record-level-index)). Secondary indexes
+  can be used to speed up queries with predicates on columns other than record key columns.
+
+To try out these features, refer to the [SQL 
guide](/docs/next/sql_ddl#create-partition-stats-and-secondary-index-experimental).
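+
+As a minimal sketch of the functional index described above, the statement below indexes a date string derived from an
+epoch column. The table name `hudi_table`, the column `ts`, the index name `ts_datestr`, and the `func`/`format` options
+are illustrative assumptions modeled on the beta syntax in the SQL DDL guide; check the linked docs for the exact options
+your version accepts.
+
+```sql
+-- Hypothetical example: index from_unixtime(ts, 'yyyy-MM-dd') so that
+-- predicates on the derived date string can use the index.
+CREATE INDEX ts_datestr ON hudi_table
+USING column_stats(ts)
+OPTIONS(func='from_unixtime', format='yyyy-MM-dd');
+
+-- A query whose predicate applies the same function to the same column
+SELECT id, rider FROM hudi_table
+WHERE from_unixtime(ts, 'yyyy-MM-dd') = '2023-09-19';
+```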
+
 ## Enable Hudi Metadata Table and Multi-Modal Index in write side
 
 Following are the Spark based basic configs that are needed to enable metadata 
and multi-modal indices. For advanced configs please refer 
diff --git a/website/docs/sql_ddl.md b/website/docs/sql_ddl.md
index eebadfc580e..08d8380afaf 100644
--- a/website/docs/sql_ddl.md
+++ b/website/docs/sql_ddl.md
@@ -217,7 +217,13 @@ DROP INDEX [IF EXISTS] index_name ON [TABLE] table_name
 - Both index and column on which the index is created can be qualified with 
some options in the form of key-value pairs.
   We will see this with an example of functional index below. 
 
-#### Create Functional Index
+:::note
+Except for the `files`, `column_stats`, `bloom_filters` and `record_index`, 
all other indexes are experimental. We
+encourage users to try out these features on new tables and provide feedback. 
Below, we have also listed current
+limitations of these indexes.
+:::
+
+#### Create Functional Index (Experimental)
 
 A [functional 
index](https://github.com/apache/hudi/blob/00ece7bce0a4a8d0019721a28049723821e01842/rfc/rfc-63/rfc-63.md)
 
 is an index on a function of a column. It is a new addition to Hudi's 
[multi-modal 
indexing](https://hudi.apache.org/blog/2022/05/17/Introducing-Multi-Modal-Index-for-the-Lakehouse-in-Apache-Hudi)
 
@@ -328,6 +334,86 @@ Project [city#2970, fare#2969, rider#2967, driver#2968], 
Statistics(sizeInBytes=
 ```
 </details>
 
+#### Create Partition Stats and Secondary Index (Experimental)
+
+Hudi supports various [indexes](/docs/next/metadata#metadata-table-indices). 
Let us see how we can use them in the following example.
+
+```sql
+DROP TABLE IF EXISTS hudi_table;
+-- Let us create a table with multiple partition fields, and enable record index and partition stats index
+CREATE TABLE hudi_table (
+    ts BIGINT,
+    id STRING,
+    rider STRING,
+    driver STRING,
+    fare DOUBLE,
+    city STRING,
+    state STRING
+) USING hudi
+ OPTIONS(
+    primaryKey ='id',
+    hoodie.metadata.record.index.enable = 'true', -- enable record index
+    hoodie.metadata.index.partition.stats.enable = 'true', -- enable partition stats index
+    hoodie.metadata.index.column.stats.column.list = 'rider' -- create partition stats index on rider column
+)
+PARTITIONED BY (city, state)
+LOCATION 'file:///tmp/hudi_test_table';
+
+INSERT INTO hudi_table VALUES (1695159649,'trip1','rider-A','driver-K',19.10,'san_francisco','california');
+INSERT INTO hudi_table VALUES (1695091554,'trip2','rider-C','driver-M',27.70,'sunnyvale','california');
+INSERT INTO hudi_table VALUES (1695332066,'trip3','rider-E','driver-O',93.50,'austin','texas');
+INSERT INTO hudi_table VALUES (1695516137,'trip4','rider-F','driver-P',34.15,'houston','texas');
+
+-- Enable data skipping for the reader
+set hoodie.metadata.enable=true;
+set hoodie.enable.data.skipping=true;
+    
+-- simple partition predicate --
+select * from hudi_table where city = 'sunnyvale';
+20240710215107477      20240710215107477_0_0   trip2   
city=sunnyvale/state=california 
1dcb14a9-bc4a-4eac-aab5-015f2254b7ec-0_0-40-75_20240710215107477.parquet        
1695091554      trip2   rider-C driver-M        27.7    sunnyvale       
california
+Time taken: 0.58 seconds, Fetched 1 row(s)
+
+-- simple partition predicate on other partition field --
+select * from hudi_table where state = 'texas';
+20240710215119846      20240710215119846_0_0   trip4   
city=houston/state=texas        
08c6ed2c-a87b-4798-8f70-6d8b16cb1932-0_0-74-133_20240710215119846.parquet       
1695516137      trip4   rider-F driver-P        34.15   houston texas
+20240710215110584      20240710215110584_0_0   trip3   city=austin/state=texas 
0ab2243c-cc08-4da3-8302-4ce0b4c47a08-0_0-57-104_20240710215110584.parquet       
1695332066      trip3   rider-E driver-O        93.5    austin  texas
+Time taken: 0.124 seconds, Fetched 2 row(s)
+
+-- predicate on a column for which partition stats are present --
+select id, rider, city, state from hudi_table where rider > 'rider-D';
+trip4  rider-F houston texas
+trip3  rider-E austin  texas
+Time taken: 0.703 seconds, Fetched 2 row(s)
+      
+-- record key predicate --
+SELECT id, rider, driver FROM hudi_table WHERE id = 'trip1';
+trip1  rider-A driver-K
+Time taken: 0.368 seconds, Fetched 1 row(s)
+      
+-- create secondary index on driver --
+CREATE INDEX driver_idx ON hudi_table USING secondary_index(driver);
+
+-- secondary key predicate --
+SELECT id, driver, city, state FROM hudi_table WHERE driver IN ('driver-K', 
'driver-M');
+trip1  driver-K        san_francisco   california
+trip2  driver-M        sunnyvale       california
+Time taken: 0.83 seconds, Fetched 2 row(s)
+```
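+
+To inspect or remove the secondary index created above, the statements below are a minimal sketch. `DROP INDEX` follows
+the syntax shown earlier in this guide; `SHOW INDEXES` is assumed to be available for this table in the same form as for
+other Hudi indexes.
+
+```sql
+-- List the indexes currently defined on the table
+SHOW INDEXES FROM hudi_table;
+
+-- Drop the secondary index on driver; IF EXISTS avoids an error if it was never created
+DROP INDEX IF EXISTS driver_idx ON hudi_table;
+```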
+
+**Limitations of using these indexes:**
+
+- Unlike column stats, partition stats index is not created automatically for all columns. Users must specify the list of
+  columns for which they want to create partition stats index.
+- Predicates on internal meta fields such as `_hoodie_record_key` or `_hoodie_partition_path` cannot be used for data
+  skipping. Queries with such predicates cannot leverage the indexes.
+- Secondary index is not supported for nested fields.
+- Index update can fail with schema evolution.
+- If there are multiple indexes present, then secondary index and functional index updates can fail.
+- Only one index can be created at a time using [async 
indexer](/docs/next/metadata_indexing).
+- Ensure native HFile reader is disabled (`_hoodie.hfile.use.native.reader`) 
to leverage the secondary index. Default value for this config is `false`.
+
+Limitations will be addressed before 1.0.0 is made generally available.
+
 ### Setting Hudi configs 
 
 There are different ways you can pass the configs for a given hudi table. 
diff --git a/website/docs/sql_dml.md b/website/docs/sql_dml.md
index edb63730b13..04590765f3f 100644
--- a/website/docs/sql_dml.md
+++ b/website/docs/sql_dml.md
@@ -266,6 +266,30 @@ DELETE FROM hudi_table WHERE price < 100;
 Delete query only work with batch excution mode.
 :::
 
+### Lookup Joins
+
+A lookup join is typically used to enrich a table with data that is queried 
from an external system. The join requires
+one table to have a processing time attribute and the other table to be backed 
by a lookup source connector.
+
+```sql
+CREATE TABLE datagen_source(
+    id int,
+    name STRING,
+    proctime as PROCTIME()
+) WITH (
+'connector' = 'datagen',
+'rows-per-second'='1',
+'number-of-rows' = '2',
+'fields.id.kind'='sequence',
+'fields.id.start'='1',
+'fields.id.end'='2'
+);
+
+SELECT o.id,o.name,b.id as id2
+FROM datagen_source AS o
+JOIN hudi_table/*+ OPTIONS('lookup.join.cache.ttl'= '2 day') */ FOR 
SYSTEM_TIME AS OF o.proctime AS b on o.id = b.id; 
+```
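+
+The query above assumes a Hudi-backed `hudi_table` is already registered in the Flink catalog. A minimal sketch of such
+a table is shown below; the columns other than `id`, the storage path, and the table type are illustrative assumptions.
+
+```sql
+CREATE TABLE hudi_table(
+    id int PRIMARY KEY NOT ENFORCED,  -- record key; joined against datagen_source.id
+    name STRING,
+    ts BIGINT
+) WITH (
+'connector' = 'hudi',
+'path' = 'file:///tmp/hudi_table',  -- hypothetical path
+'table.type' = 'MERGE_ON_READ'
+);
+```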
+
 ### Setting Writer/Reader Configs
 With Flink SQL, you can additionally set the writer/reader writer configs 
along with the query.
 
diff --git a/website/releases/download.md b/website/releases/download.md
index 023e1557825..148be08abc2 100644
--- a/website/releases/download.md
+++ b/website/releases/download.md
@@ -6,6 +6,10 @@ toc: true
 last_modified_at: 2022-12-27T15:59:57-04:00
 ---
 
+### Release 1.0.0-beta2
+* Source Release : [Apache Hudi 1.0.0-beta2 Source 
Release](https://downloads.apache.org/hudi/1.0.0-beta2/hudi-1.0.0-beta2.src.tgz)
 
([asc](https://downloads.apache.org/hudi/1.0.0-beta2/hudi-1.0.0-beta2.src.tgz.asc),
 
[sha512](https://downloads.apache.org/hudi/1.0.0-beta2/hudi-1.0.0-beta2.src.tgz.sha512))
+* Release Note : ([Release Note for Apache Hudi 
1.0.0-beta2](/releases/release-1.0.0-beta2))
+
 ### Release 0.15.0
 * Source Release : [Apache Hudi 0.15.0 Source 
Release](https://downloads.apache.org/hudi/0.15.0/hudi-0.15.0.src.tgz) 
([asc](https://downloads.apache.org/hudi/0.15.0/hudi-0.15.0.src.tgz.asc), 
[sha512](https://downloads.apache.org/hudi/0.15.0/hudi-0.15.0.src.tgz.sha512))
 * Release Note : ([Release Note for Apache Hudi 
0.15.0](/releases/release-0.15.0))
@@ -16,7 +20,7 @@ last_modified_at: 2022-12-27T15:59:57-04:00
 
 ### Release 1.0.0-beta1
 * Source Release : [Apache Hudi 1.0.0-beta1 Source 
Release](https://www.apache.org/dyn/closer.lua/hudi/1.0.0-beta1/hudi-1.0.0-beta1.src.tgz)
 
([asc](https://downloads.apache.org/hudi/1.0.0-beta1/hudi-1.0.0-beta1.src.tgz.asc),
 
[sha512](https://downloads.apache.org/hudi/1.0.0-beta1/hudi-1.0.0-beta1.src.tgz.sha512))
-* Release Note : ([Release Note for Apache Hudi 
0.14.0](/releases/release-1.0.0-beta1))
+* Release Note : ([Release Note for Apache Hudi 
1.0.0-beta1](/releases/release-1.0.0-beta1))
 
 ### Release 0.12.3
 [Long Term Support](/releases/release-0.12.3#long-term-support): this is the 
latest stable release
diff --git a/website/releases/older-releases.md 
b/website/releases/older-releases.md
index 4d9e75005f3..ea044b31efd 100644
--- a/website/releases/older-releases.md
+++ b/website/releases/older-releases.md
@@ -1,6 +1,6 @@
 ---
 title: "Older Releases"
-sidebar_position: 19
+sidebar_position: 20
 layout: releases
 toc: true
 last_modified_at: 2020-05-28T08:40:00-07:00
diff --git a/website/releases/release-0.10.0.md 
b/website/releases/release-0.10.0.md
index 9ca15db71f1..6f5551a0d01 100644
--- a/website/releases/release-0.10.0.md
+++ b/website/releases/release-0.10.0.md
@@ -1,6 +1,6 @@
 ---
 title: "Release 0.10.0"
-sidebar_position: 14
+sidebar_position: 15
 layout: releases
 toc: true
 ---
diff --git a/website/releases/release-0.10.1.md 
b/website/releases/release-0.10.1.md
index 5856f0eb100..42ec76ee1f6 100644
--- a/website/releases/release-0.10.1.md
+++ b/website/releases/release-0.10.1.md
@@ -1,6 +1,6 @@
 ---
 title: "Release 0.10.1"
-sidebar_position: 13
+sidebar_position: 14
 layout: releases
 toc: true
 ---
diff --git a/website/releases/release-0.11.0.md 
b/website/releases/release-0.11.0.md
index d0cdef30951..fbea4897b45 100644
--- a/website/releases/release-0.11.0.md
+++ b/website/releases/release-0.11.0.md
@@ -1,6 +1,6 @@
 ---
 title: "Release 0.11.0"
-sidebar_position: 12
+sidebar_position: 13
 layout: releases
 toc: true
 last_modified_at: 2022-01-27T22:07:00+08:00
diff --git a/website/releases/release-0.11.1.md 
b/website/releases/release-0.11.1.md
index 5aa5d89e11b..6f727ddccd2 100644
--- a/website/releases/release-0.11.1.md
+++ b/website/releases/release-0.11.1.md
@@ -1,6 +1,6 @@
 ---
 title: "Release 0.11.1"
-sidebar_position: 11
+sidebar_position: 12
 layout: releases
 toc: true
 last_modified_at: 2022-06-19T23:30:00-07:00
diff --git a/website/releases/release-0.12.0.md 
b/website/releases/release-0.12.0.md
index 78b27997962..93be2c17e55 100644
--- a/website/releases/release-0.12.0.md
+++ b/website/releases/release-0.12.0.md
@@ -1,6 +1,6 @@
 ---
 title: "Release 0.12.0"
-sidebar_position: 10
+sidebar_position: 11
 layout: releases
 toc: true
 ---
diff --git a/website/releases/release-0.12.1.md 
b/website/releases/release-0.12.1.md
index b4f8d643c7d..8d1f002a79b 100644
--- a/website/releases/release-0.12.1.md
+++ b/website/releases/release-0.12.1.md
@@ -1,6 +1,6 @@
 ---
 title: "Release 0.12.1"
-sidebar_position: 9
+sidebar_position: 10
 layout: releases
 toc: true
 ---
diff --git a/website/releases/release-0.12.2.md 
b/website/releases/release-0.12.2.md
index 2135d3ddcbf..44a04fcd603 100644
--- a/website/releases/release-0.12.2.md
+++ b/website/releases/release-0.12.2.md
@@ -1,6 +1,6 @@
 ---
 title: "Release 0.12.2"
-sidebar_position: 8
+sidebar_position: 9
 layout: releases
 toc: true
 ---
diff --git a/website/releases/release-0.12.3.md 
b/website/releases/release-0.12.3.md
index 21514240091..a320f2e74a7 100644
--- a/website/releases/release-0.12.3.md
+++ b/website/releases/release-0.12.3.md
@@ -1,6 +1,6 @@
 ---
 title: "Release 0.12.3"
-sidebar_position: 6
+sidebar_position: 7
 layout: releases
 toc: true
 last_modified_at: 2023-04-23T10:30:00+05:30
diff --git a/website/releases/release-0.13.0.md 
b/website/releases/release-0.13.0.md
index 3ec23c1d5bd..e27050ceace 100644
--- a/website/releases/release-0.13.0.md
+++ b/website/releases/release-0.13.0.md
@@ -1,6 +1,6 @@
 ---
 title: "Release 0.13.0"
-sidebar_position: 7
+sidebar_position: 8
 layout: releases
 toc: true
 ---
diff --git a/website/releases/release-0.13.1.md 
b/website/releases/release-0.13.1.md
index 30e2cd6f7ec..f2888af3454 100644
--- a/website/releases/release-0.13.1.md
+++ b/website/releases/release-0.13.1.md
@@ -1,6 +1,6 @@
 ---
 title: "Release 0.13.1"
-sidebar_position: 5
+sidebar_position: 6
 layout: releases
 toc: true
 last_modified_at: 2023-05-25T13:00:00-08:00
diff --git a/website/releases/release-1.0.0-beta2.md 
b/website/releases/release-1.0.0-beta2.md
new file mode 100644
index 00000000000..bea04c3bfd1
--- /dev/null
+++ b/website/releases/release-1.0.0-beta2.md
@@ -0,0 +1,85 @@
+---
+title: "Release 1.0.0-beta2"
+sidebar_position: 1
+layout: releases
+toc: true
+---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## [Release 
1.0.0-beta2](https://github.com/apache/hudi/releases/tag/release-1.0.0-beta2) 
([docs](/docs/next/quick-start-guide))
+
+Apache Hudi 1.0.0-beta2 is the second beta release of Apache Hudi. This 
release is meant for early adopters to try
+out the new features and provide feedback. The release is not meant for 
production use.
+
+## Migration Guide
+
+This release contains major format changes, as described in the highlights below. We encourage users to try out the
+**1.0.0-beta2** features on new tables. The 1.0 general availability (GA) release will support automatic table upgrades
+from 0.x versions and full backward compatibility when reading 0.x Hudi tables using 1.0, for a
+seamless migration experience.
+
+:::caution
+Given that the timeline format and log file format have changed in this **beta release**, it is recommended not to attempt
+rolling upgrades from older versions to this release.
+:::
+
+## Highlights
+
+### Format changes
+
+[HUDI-6242](https://issues.apache.org/jira/browse/HUDI-6242) is the main epic covering all the format change proposals,
+which are also partly covered in the [Hudi 1.0 tech 
specification](/tech-specs-1point0). The following are the main
+changes in this release:
+
+#### Timeline
+
+No major changes in this release. Refer to 
[1.0.0-beta1#timeline](release-1.0.0-beta1.md#timeline) for more details.
+
+#### Log File Format
+
+In addition to the fields in the log file header added in 
[1.0.0-beta1](release-1.0.0-beta1.md#log-file-format), we also
+store a flag, `IS_PARTIAL` to indicate whether the log block contains partial 
updates or not.
+
+### Metadata indexes
+
+In 1.0.0-beta1, we added support for functional index. In 1.0.0-beta2, we have 
added support for secondary indexes and
+partition stats index to the [multi-modal 
indexing](/blog/2022/05/17/Introducing-Multi-Modal-Index-for-the-Lakehouse-in-Apache-Hudi)
 subsystem.
+
+#### Secondary Index
+
+Secondary indexes allow users to create indexes on columns that are not part of the record key columns in Hudi tables (for
+record key fields, Hudi supports the [Record-level Index](/blog/2023/11/01/record-level-index)). Secondary indexes can be used to speed up
+queries with predicates on columns other than record key columns.
+
+#### Partition Stats Index
+
+Partition stats index aggregates statistics at the partition level for the 
columns for which it is enabled. This helps
+in efficient partition pruning even for non-partition fields.
+
+To try out these features, refer to the [SQL 
guide](/docs/next/sql_ddl#create-partition-stats-and-secondary-index-experimental).
+
+### API Changes
+
+#### Positional Merging with Filegroup Reader
+
+In 1.0.0-beta1, we added a new [filegroup 
reader](/releases/release-1.0.0-beta1#new-filegroup-reader), which provides
+5.7x performance benefits for snapshot queries on Merge-on-Read tables with 
updates. The reader now
+provides position-based merging, as an alternative to existing key-based 
merging, and skipping pages based on record
+positions. The new filegroup reader is integrated with Spark and Hive, and enabled by default. To enable positional
+merging, set the config below:
+
+```properties
+hoodie.merge.use.record.positions=true
+```
+
+### Hudi-Flink Enhancements
+
+This release adds support for [lookup joins](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/joins/#lookup-join).
+A lookup join is typically used to enrich a table with data that is queried 
from an external system. The join requires
+one table to have a processing time attribute and the other table to be backed 
by a lookup source connector. Head over 
+to the [Flink SQL guide](/docs/next/sql_dml#lookup-joins) to try out this feature.
+
+## Raw Release Notes
+
+The raw release notes are available 
[here](https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12354810).
