Re: [PR] [DOCS] Release notes 1.0.0-beta2 [hudi]

via GitHub Sat, 13 Jul 2024 10:09:13 -0700


nsivabalan commented on code in PR #11618:
URL: https://github.com/apache/hudi/pull/11618#discussion_r1676856952



##########
website/releases/release-1.0.0-beta2.md:
##########
@@ -0,0 +1,80 @@
+---
+title: "Release 1.0.0-beta2"
+sidebar_position: 1
+layout: releases
+toc: true
+---
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+## [Release 1.0.0-beta2](https://github.com/apache/hudi/releases/tag/release-1.0.0-beta2) ([docs](/docs/next/quick-start-guide))
+
+Apache Hudi 1.0.0-beta2 is the second beta release of Apache Hudi. This release is meant for early adopters to try
+out the new features and provide feedback. The release is not meant for production use.
+
+## Migration Guide
+
+This release contains major format changes, as described in the highlights below. We encourage users to try out the
+**1.0.0-beta2** features on new tables. The 1.0 general availability (GA) release will support automatic table upgrades
+from 0.x versions and full backward compatibility when reading 0.x Hudi tables with 1.0, ensuring a
+seamless migration experience.
+
+:::caution
+Given that the timeline format and log file format have changed in this **beta release**, we recommend not attempting
+rolling upgrades from older versions to this release.
+:::
+
+## Highlights
+
+### Format changes
+
+[HUDI-6242](https://issues.apache.org/jira/browse/HUDI-6242) is the main epic covering all the format change proposals,
+which are also partly covered in the [Hudi 1.0 tech specification](/tech-specs-1point0). The following are the main
+changes in this release:
+
+#### Timeline
+
+No major changes in this release. Refer to [1.0.0-beta1#timeline](release-1.0.0-beta1.md#timeline) for more details.
+
+#### Log File Format
+
+In addition to the fields in the log file header added in [1.0.0-beta1](release-1.0.0-beta1.md#log-file-format), we also
+store a flag, `IS_PARTIAL`, to indicate whether the log block contains partial updates.
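To illustrate how a reader might act on the `IS_PARTIAL` flag, here is a minimal Python sketch. The `LogBlock` class, the dict-based records, and the `"true"`/`"false"` header values are hypothetical simplifications, not Hudi's on-disk log format or reader API: records from a partial block are overlaid column by column onto the existing record, while records from a full block replace it.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Record = Dict[str, object]

# Hypothetical, simplified log block: NOT Hudi's real on-disk layout.
@dataclass
class LogBlock:
    header: Dict[str, str]
    records: List[Record] = field(default_factory=list)

def apply_block(base: Dict[object, Record], block: LogBlock, key_field: str = "id") -> None:
    """Merge one log block into `base` (record key -> record), in place."""
    is_partial = block.header.get("IS_PARTIAL", "false") == "true"
    for rec in block.records:
        key = rec[key_field]
        if is_partial and key in base:
            # Partial update: only the columns present in the log record change.
            base[key] = {**base[key], **rec}
        else:
            # Full update (or new key): the log record replaces the base record.
            base[key] = dict(rec)

base = {1: {"id": 1, "name": "a", "price": 10}}
apply_block(base, LogBlock({"IS_PARTIAL": "true"}, [{"id": 1, "price": 12}]))
print(base[1])  # {'id': 1, 'name': 'a', 'price': 12}
```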
+
+### Metadata indexes
+
+In 1.0.0-beta1, we added support for the functional index. In 1.0.0-beta2, we have added support for secondary indexes and
+the partition stats index to the [multi-modal indexing](/blog/2022/05/17/Introducing-Multi-Modal-Index-for-the-Lakehouse-in-Apache-Hudi)
+subsystem.
+
+#### Secondary Indexes
+
+Secondary indexes allow users to create indexes on columns that are not part of the record key columns in Hudi tables (for
+record key fields, Hudi supports the [Record-level Index](/blog/2023/11/01/record-level-index)). Secondary indexes can be used to speed up
+queries with predicates on columns other than record key columns.
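Conceptually, a secondary index maps each value of a non-key column to the record keys that contain it, and those keys can then be resolved to file groups, for example through the record-level index. The Python sketch below models that two-step lookup with plain dicts. The function names, the sample data, and the assumption that lookups go through the record-level index are illustrative only and do not reflect how Hudi stores the index in the metadata table.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

def build_secondary_index(
    records: Iterable[Dict[str, str]], column: str, key_field: str
) -> Dict[str, Set[str]]:
    """Map each value of `column` to the set of record keys that hold it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for rec in records:
        index[rec[column]].add(rec[key_field])
    return dict(index)

def file_groups_for(
    value: str, sec_index: Dict[str, Set[str]], record_index: Dict[str, str]
) -> Set[str]:
    """Resolve a predicate `column = value` to the file groups that must be read."""
    return {record_index[key] for key in sec_index.get(value, set())}

records = [
    {"uuid": "u1", "city": "sf"},
    {"uuid": "u2", "city": "nyc"},
    {"uuid": "u3", "city": "sf"},
]
sec_index = build_secondary_index(records, column="city", key_field="uuid")
# Hypothetical record-level index: record key -> file group ID.
record_index = {"u1": "fg-1", "u2": "fg-2", "u3": "fg-1"}
print(file_groups_for("sf", sec_index, record_index))  # {'fg-1'}
```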
+
+#### Partition Stats Index
+
+The partition stats index aggregates statistics at the partition level for the columns for which it is enabled. This enables
+efficient partition pruning even on non-partition fields.
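A minimal Python sketch of partition-level pruning, assuming per-partition min/max statistics for each enabled column. For brevity the stats are computed directly from rows; the names and data layout are hypothetical, not Hudi's metadata table schema. A partition is skipped when its `[min, max]` range for the queried column cannot overlap the predicate's range, even though that column is not a partition field.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: partition path -> column name -> (min, max).
PartitionStats = Dict[str, Dict[str, Tuple[float, float]]]

def build_partition_stats(
    rows_by_partition: Dict[str, List[Dict[str, float]]], columns: List[str]
) -> PartitionStats:
    """Aggregate min/max per partition for each enabled column (empty partitions skipped)."""
    stats: PartitionStats = {}
    for partition, rows in rows_by_partition.items():
        if rows:
            stats[partition] = {
                c: (min(r[c] for r in rows), max(r[c] for r in rows)) for c in columns
            }
    return stats

def prune_partitions(stats: PartitionStats, column: str, lo: float, hi: float) -> List[str]:
    """Keep only partitions whose [min, max] for `column` overlaps [lo, hi]."""
    return sorted(
        p for p, cols in stats.items() if cols[column][0] <= hi and cols[column][1] >= lo
    )

rows = {
    "dt=2024-07-01": [{"price": 5.0}, {"price": 20.0}],
    "dt=2024-07-02": [{"price": 50.0}, {"price": 80.0}],
}
stats = build_partition_stats(rows, ["price"])
# Predicate: price BETWEEN 30 AND 60, so only the second partition can match.
print(prune_partitions(stats, "price", 30.0, 60.0))  # ['dt=2024-07-02']
```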
+
+To try out these features, refer to the [SQL guide](/docs/next/sql_ddl#create-partition-stats-index).
+
+### API Changes
+
+#### Positional Merging
+
+In 1.0.0-beta1, we added a new [filegroup reader](/releases/release-1.0.0-beta1#new-filegroup-reader). The reader now
+provides position-based merging, as an alternative to the existing key-based merging, and page skipping based on record
+positions. The new filegroup reader is integrated with Spark and Hive and is enabled by default. To enable positional
+merging, set the configs below:
+
+```properties

Review Comment:
   Not related to this doc PR, just curious in general:
   if we have a fallback mechanism to do key-based merges when position-based merges are not possible, why don't we enable this by default?
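A minimal Python sketch of position-based merging with a key-based fallback, assuming a hypothetical in-memory file group: a list of base-file records plus updates that may carry the row positions of the records they replace. It is not the filegroup reader's API. When every update has a valid position, records are replaced by index without any key lookup; otherwise the merge falls back to a key-to-position map.

```python
from typing import Dict, List, Optional

Record = Dict[str, object]

def merge_file_group(
    base: List[Record],
    updates: List[Record],
    positions: Optional[List[int]] = None,
    key_field: str = "id",
) -> List[Record]:
    """Apply `updates` to `base`, by row position when possible, else by record key."""
    merged = [dict(r) for r in base]
    if (
        positions is not None
        and len(positions) == len(updates)
        and all(0 <= p < len(base) for p in positions)
    ):
        # Position-based merge: each update names the base-file row it replaces,
        # so no record keys are read or compared.
        for pos, upd in zip(positions, updates):
            merged[pos] = dict(upd)
        return merged
    # Fallback: key-based merge via a key -> row-position map over the base file.
    pos_by_key = {r[key_field]: i for i, r in enumerate(base)}
    for upd in updates:
        i = pos_by_key.get(upd[key_field])
        if i is None:
            merged.append(dict(upd))  # key not in base file: treat as an insert
        else:
            merged[i] = dict(upd)
    return merged

base = [{"id": "a", "v": 1}, {"id": "b", "v": 2}]
# Positions available: [{'id': 'a', 'v': 1}, {'id': 'b', 'v': 20}]
print(merge_file_group(base, [{"id": "b", "v": 20}], positions=[1]))
# No positions, so key-based fallback (with an insert for "c").
print(merge_file_group(base, [{"id": "b", "v": 20}, {"id": "c", "v": 3}]))
```

In this sketch, the positional path never inspects record keys, while the fallback needs a pass over the base file's keys.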



##########
website/docs/metadata.md:
##########
@@ -90,6 +90,32 @@ Following are the different indices currently available under the metadata table
   Hudi release, this index aids in locating records faster than other existing indices and can provide a speedup orders of magnitude 
   faster in large deployments where index lookup dominates write latencies.
 
+#### New Indexes in 1.0.0
+
+- ***Functional Index***:
+  A [functional index](https://github.com/apache/hudi/blob/3789840be3d041cbcfc6b24786740210e4e6d6ac/rfc/rfc-63/rfc-63.md)
+  is an index on a function of a column. If a query has a predicate on a function of a column, the functional index can
+  be used to speed up the query. The functional index is stored in *func_index_* prefixed partitions (one for each
+  function) under the metadata table. A functional index can be created using SQL syntax. Please check out the SQL DDL
+  docs [here](/docs/next/sql_ddl#create-functional-index) for more details.
+
+- ***Partition Stats Index***:
+  The partition stats index aggregates statistics at the partition level for the columns for which it is enabled. This enables
+  efficient partition pruning even on non-partition fields. The partition stats index is stored in the *partition_stats*
+  partition under the metadata table. It can be enabled using the following configs (note that you must
+  specify the columns for which stats should be aggregated):
+  ```properties
+    hoodie.metadata.index.partition.stats.enable=true
+    hoodie.metadata.index.column.stats.columns=<comma-separated-column-names>
+  ```
+  
+- ***Secondary Index***:
+  Secondary indexes allow users to create indexes on columns that are not part of the record key columns in Hudi tables (for
+  record key fields, Hudi supports the [Record-level Index](/blog/2023/11/01/record-level-index)). Secondary indexes
+  can be used to speed up queries with predicates on columns other than record key columns.
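A minimal Python sketch of how a functional index can prune files, assuming the indexed function converts an epoch-seconds timestamp column to a `YYYY-MM-DD` date and that the index keeps per-file min/max values of the function's output. The function `to_date`, the file names, and the min/max layout are hypothetical; Hudi's actual storage in the *func_index_* partitions may differ. A query with a predicate like `to_date(ts) = '2024-07-13'` can then skip files whose range excludes that date.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List, Tuple

def to_date(ts: int) -> str:
    """Hypothetical indexed function: epoch seconds -> 'YYYY-MM-DD' in UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")

def build_functional_index(
    files: Dict[str, List[int]], fn: Callable[[int], str]
) -> Dict[str, Tuple[str, str]]:
    """Map each non-empty file to the (min, max) of fn(column) over its values."""
    index: Dict[str, Tuple[str, str]] = {}
    for name, values in files.items():
        if values:
            outputs = [fn(v) for v in values]
            index[name] = (min(outputs), max(outputs))
    return index

def files_matching(index: Dict[str, Tuple[str, str]], target: str) -> List[str]:
    """Files whose range may contain `target`; ISO dates compare correctly as strings."""
    return sorted(name for name, (lo, hi) in index.items() if lo <= target <= hi)

files = {
    "f1": [1720742400, 1720746000],  # both on 2024-07-12 (UTC)
    "f2": [1720828800, 1720915200],  # 2024-07-13 and 2024-07-14 (UTC)
}
idx = build_functional_index(files, to_date)
print(files_matching(idx, "2024-07-13"))  # ['f2']
```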
+
+To try out these features, refer to the [SQL guide](/docs/next/sql_ddl#create-partition-stats-index).

Review Comment:
   Don't we have a separate section for secondary indexes? Isn't this referring to the partition stats index?



