This is an automated email from the ASF dual-hosted git repository.
xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 07ff8e48c0ae docs: update roadmap items (#14358)
07ff8e48c0ae is described below
commit 07ff8e48c0ae3cd16fbd3f5a7bb512910fb3ad63
Author: Shiyan Xu <[email protected]>
AuthorDate: Tue Nov 25 20:00:30 2025 -0600
docs: update roadmap items (#14358)
---
...ve-into-hudis-indexing-subsystem-part-1-of-2.md | 4 +--
...ve-into-hudis-indexing-subsystem-part-2-of-2.md | 6 ++--
website/docs/hudi_stack.md | 7 +++-
website/src/pages/ecosystem.md | 4 +--
website/src/pages/roadmap.md | 40 ++++++++-------------
.../assets/images/hudi_stack/pluggable_tf.png | Bin 0 -> 77190 bytes
website/versioned_docs/version-1.1.0/hudi_stack.md | 7 +++-
7 files changed, 34 insertions(+), 34 deletions(-)
diff --git a/website/blog/2025-10-29-deep-dive-into-hudis-indexing-subsystem-part-1-of-2.md b/website/blog/2025-10-29-deep-dive-into-hudis-indexing-subsystem-part-1-of-2.md
index e1cd9a03d495..9377a32609c3 100644
--- a/website/blog/2025-10-29-deep-dive-into-hudis-indexing-subsystem-part-1-of-2.md
+++ b/website/blog/2025-10-29-deep-dive-into-hudis-indexing-subsystem-part-1-of-2.md
@@ -13,7 +13,7 @@ tags:
For decades, databases have relied on indexes—specialized data structures—to
dramatically improve read and write performance by quickly locating specific
records. Apache Hudi extends this fundamental principle to the data lakehouse
with a unique and powerful approach. Every Hudi table contains a self-managed
metadata table that functions as an indexing subsystem, enabling efficient data
skipping and fast record lookups across a wide range of read and write
scenarios.
-This two-part series dives into Hudi’s indexing subsystem. Part 1 explains the
internal layout and data-skipping capabilities. Part 2 covers advanced
features—record, secondary, and expression indexes—and asynchronous index
maintenance. By the end, you’ll know how to leverage Hudi’s multimodal index to
build more efficient lakehouse tables.
+This two-part series dives into Hudi’s indexing subsystem. Part 1 explains the
internal layout and data-skipping capabilities. [part
2](https://hudi.apache.org/blog/2025/11/12/deep-dive-into-hudis-indexing-subsystem-part-2-of-2/)
covers advanced features—record, secondary, and expression indexes—and
asynchronous index maintenance. By the end, you’ll know how to leverage Hudi’s
multimodal index to build more efficient lakehouse tables.
## The Metadata Table
@@ -210,4 +210,4 @@ Hudi’s metadata table is itself a Hudi Merge‑on‑Read (MOR) table that acts
Index maintenance happens transactionally alongside data writes, keeping index
entries consistent with the data table. Periodic compaction merges log files
into read‑optimized HFile base files to keep point lookups fast and
predictable. On the read path, Hudi composes multiple indexes to minimize I/O:
the files index enumerates candidates, partition stats prune irrelevant
partitions, and column stats prune non‑matching files. In effect, the engine
scans only the minimum set of files requ [...]
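The read-path composition described above (files index enumerates candidates, partition stats prune partitions, column stats prune files) can be sketched in a few lines. This is an illustrative simulation of the pruning logic, not Hudi's actual implementation; the `FileStats` structure and `prune_files` helper are invented for clarity:

```python
from dataclasses import dataclass

@dataclass
class FileStats:
    # Hypothetical per-file min/max statistics, as a column stats index would track.
    path: str
    partition: str
    min_val: int
    max_val: int

def prune_files(files, partition_stats, query_min, query_max):
    """Keep only files whose [min, max] range can overlap the query range."""
    candidates = []
    for f in files:                              # files index: enumerate candidates
        p_min, p_max = partition_stats[f.partition]
        if query_max < p_min or query_min > p_max:
            continue                             # partition stats: prune whole partitions
        if query_max < f.min_val or query_min > f.max_val:
            continue                             # column stats: prune non-matching files
        candidates.append(f.path)
    return candidates

files = [
    FileStats("p1/a.parquet", "p1", 0, 10),
    FileStats("p1/b.parquet", "p1", 50, 90),
    FileStats("p2/c.parquet", "p2", 100, 200),
]
partition_stats = {"p1": (0, 90), "p2": (100, 200)}
print(prune_files(files, partition_stats, 60, 60))  # ['p1/b.parquet']
```

An equality predicate like `col = 60` thus touches one file instead of three; the real indexes apply the same range-overlap test using metadata table entries.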
-In practice, the defaults are a strong starting point. Keep the metadata table
enabled and explicitly list only the columns you frequently filter on via
`hoodie.metadata.index.column.stats.column.list` to control metadata overhead.
In Part 2, we’ll go deeper into accelerating equality‑matching and
expression‑based predicates using the record, secondary, and expression
indexes, and discuss how asynchronous index maintenance keeps writers unblocked
while indexes build in the background.
+In practice, the defaults are a strong starting point. Keep the metadata table
enabled and explicitly list only the columns you frequently filter on via
`hoodie.metadata.index.column.stats.column.list` to control metadata overhead.
In [part
2](https://hudi.apache.org/blog/2025/11/12/deep-dive-into-hudis-indexing-subsystem-part-2-of-2/),
we’ll go deeper into accelerating equality‑matching and expression‑based
predicates using the record, secondary, and expression indexes, and discuss how
[...]
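The advice above about listing only frequently filtered columns can be sketched as writer options. This is a minimal illustration: the config keys follow Hudi's documented naming but should be verified against the config reference for your Hudi version, and the column names are placeholders.

```python
def column_stats_options(columns):
    """Build writer options that limit column stats indexing to chosen columns.

    Assumes Hudi's documented config keys; column names are placeholders.
    """
    return {
        # Metadata table is on by default in recent releases; kept explicit here.
        "hoodie.metadata.enable": "true",
        "hoodie.metadata.index.column.stats.enable": "true",
        # Restrict stats tracking to columns you actually filter on,
        # to control metadata overhead.
        "hoodie.metadata.index.column.stats.column.list": ",".join(columns),
    }

opts = column_stats_options(["event_ts", "user_id"])
print(opts["hoodie.metadata.index.column.stats.column.list"])  # event_ts,user_id
```

These options would typically be passed to the Spark or Flink writer alongside the usual table configs.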
diff --git a/website/blog/2025-11-12-deep-dive-into-hudis-indexing-subsystem-part-2-of-2.md b/website/blog/2025-11-12-deep-dive-into-hudis-indexing-subsystem-part-2-of-2.md
index b5dc843a0f3b..83170d4cfa25 100644
--- a/website/blog/2025-11-12-deep-dive-into-hudis-indexing-subsystem-part-2-of-2.md
+++ b/website/blog/2025-11-12-deep-dive-into-hudis-indexing-subsystem-part-2-of-2.md
@@ -11,9 +11,9 @@ tags:
- data skipping
---
-In [Part
1](https://hudi.apache.org/blog/2025/10/29/deep-dive-into-hudis-indexing-subsystem-part-1-of-2/),
we explored how Hudi's metadata table functions as a self-managed, multimodal
indexing subsystem. We covered its internal architecture—a partitioned Hudi
Merge-on-Read (MOR) table using HFile format for efficient key lookups—and how
the files, column stats, and partition stats indexes work together to implement
powerful data skipping. These indexes dramatically reduce I/O by pruning [...]
+In [part
1](https://hudi.apache.org/blog/2025/10/29/deep-dive-into-hudis-indexing-subsystem-part-1-of-2/),
we explored how Hudi's metadata table functions as a self-managed, multimodal
indexing subsystem. We covered its internal architecture—a partitioned Hudi
Merge-on-Read (MOR) table using HFile format for efficient key lookups—and how
the files, column stats, and partition stats indexes work together to implement
powerful data skipping. These indexes dramatically reduce I/O by pruning [...]
-Now in Part 2, we'll dive into more specialized indexes that handle different
query patterns. We'll look at the record and secondary indexes, which provide
exact file locations for equality-matching predicates rather than just skipping
irrelevant files. We'll explore expression indexes that optimize queries with
inline transformations like `from_unixtime()` or `substring()`. Finally, we'll
cover async indexing, which lets you build resource-intensive indexes in the
background without blo [...]
+Now in part 2, we'll dive into more specialized indexes that handle different
query patterns. We'll look at the record and secondary indexes, which provide
exact file locations for equality-matching predicates rather than just skipping
irrelevant files. We'll explore expression indexes that optimize queries with
inline transformations like `from_unixtime()` or `substring()`. Finally, we'll
cover async indexing, which lets you build resource-intensive indexes in the
background without blo [...]
## Equality Matching with Record and Secondary Indexes
@@ -128,7 +128,7 @@ To manage this concurrency, a lock provider must be configured for both the inde
## Summary
-Throughout this two-part series, we've explored how Hudi's indexing subsystem
brings database-grade performance to the data lakehouse. In Part 1, we examined
the metadata table's architecture and how files, column stats, and partition
stats indexes work together to skip irrelevant data. In Part 2, we covered
specialized indexes—record, secondary, and expression indexes—that provide
exact file locations for equality matching and handle transformed predicates.
We also looked at async index [...]
+Throughout this two-part series, we've explored how Hudi's indexing subsystem
brings database-grade performance to the data lakehouse. In [part
1](https://hudi.apache.org/blog/2025/10/29/deep-dive-into-hudis-indexing-subsystem-part-1-of-2/),
we examined the metadata table's architecture and how files, column stats, and
partition stats indexes work together to skip irrelevant data. In part 2, we
covered specialized indexes—record, secondary, and expression indexes—that
provide exact file [...]
Here's a quick guide for choosing the right indexes for your workload:
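The lock-provider requirement mentioned in the hunk header above can be sketched as a set of writer options. This is a hedged example: the keys mirror Hudi's documented lock configs, but the ZooKeeper endpoint, lock key, and base path are placeholders to adapt to your deployment, and exact keys should be checked against your Hudi version.

```python
def async_index_lock_options(zk_url="localhost:2181", table="my_table"):
    """Illustrative lock configs so an async indexer and the data writer
    can coordinate. Endpoint, lock key, and base path are placeholders."""
    return {
        "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
        "hoodie.write.lock.provider":
            "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
        "hoodie.write.lock.zookeeper.url": zk_url,
        "hoodie.write.lock.zookeeper.port": "2181",
        "hoodie.write.lock.zookeeper.lock_key": table,
        "hoodie.write.lock.zookeeper.base_path": "/hudi/locks",
    }

# Both the regular writer and the async indexing job would carry these options.
print(async_index_lock_options()["hoodie.write.concurrency.mode"])
```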
diff --git a/website/docs/hudi_stack.md b/website/docs/hudi_stack.md
index 189c9840b727..67e2bdbb1b8f 100644
--- a/website/docs/hudi_stack.md
+++ b/website/docs/hudi_stack.md
@@ -68,7 +68,12 @@ all Base Files is required. Read more about the various table types in Hudi [tab
## Pluggable Table format
-Starting with Hudi 1.1, Hudi introduces a pluggable table format framework
that extends Hudi's powerful storage engine capabilities beyond its native
format to other table formats like Apache Iceberg and Delta Lake. This
framework decouples Hudi's core capabilities—transaction management, indexing,
concurrency control, and table services—from the specific storage format used
for data files. Hudi provides native format support (configured via
`hoodie.table.format=native` by default), whil [...]
+Starting with Hudi 1.1, Hudi introduces a pluggable table format framework
that extends Hudi's powerful storage engine capabilities beyond its native
format to other table formats like Apache Iceberg and Delta Lake. This
framework decouples Hudi's core capabilities—transaction management, indexing,
concurrency control, and table services—from the specific storage format used
for data files.
+
+
+<p align = "center">Pluggable Table Format</p>
+
+Hudi provides native format support (configured via
`hoodie.table.format=native` by default), while [Apache XTable
(incubating)](https://xtable.apache.org/) supplies pluggable format adapters
for formats like Iceberg and Delta Lake. The framework enables organizations to
choose the right format for each use case while maintaining a unified
operational experience and leveraging Hudi's sophisticated storage engine
across all formats. For example, you can write high-frequency updates to a H
[...]
## Storage Engine
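The per-table format choice described in the hunk above (`hoodie.table.format=native` by default, other formats via pluggable adapters) can be sketched as a tiny config helper. The set of accepted non-native values depends on which adapters are installed (e.g. via Apache XTable), so treat the list below as an assumption, not the definitive set:

```python
# Assumed accepted values; "native" is the documented default, the others
# depend on installed format adapters.
KNOWN_FORMATS = {"native", "iceberg", "delta"}

def table_format_option(fmt="native"):
    """Return the table-format option for a hypothetical table-create helper."""
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"unknown table format: {fmt}")
    return {"hoodie.table.format": fmt}

print(table_format_option())  # {'hoodie.table.format': 'native'}
```

This mirrors the decoupling the doc describes: the same storage-engine configs apply regardless of which value this one key takes.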
diff --git a/website/src/pages/ecosystem.md b/website/src/pages/ecosystem.md
index d5391f3e4263..8c0cdb46ec57 100644
--- a/website/src/pages/ecosystem.md
+++ b/website/src/pages/ecosystem.md
@@ -1,8 +1,8 @@
---
-title: Ecosystem
+title: Integrations
---
-# Ecosystem Support
+# Integrations
While Apache Hudi works seamlessly with various application frameworks, SQL
query engines, and data warehouses, some systems might only offer read
capabilities.
In such cases, you can leverage another tool like Apache Spark or Apache Flink
to write data to Hudi tables and then use the read-compatible system for
querying.
diff --git a/website/src/pages/roadmap.md b/website/src/pages/roadmap.md
index e5bd113dac97..f678c3785137 100644
--- a/website/src/pages/roadmap.md
+++ b/website/src/pages/roadmap.md
@@ -24,18 +24,15 @@ down by areas on our [stack](/docs/hudi_stack).
| Feature | Target Release | Tracking |
|------------------------------------------------------|----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Introduce `.abort` state in the timeline | 1.1.0 | [HUDI-8189](https://issues.apache.org/jira/browse/HUDI-8189) |
-| Schema tracking in metadata table | 1.1.0 | [HUDI-6778](https://issues.apache.org/jira/browse/HUDI-6778) |
-| Variant type support on Spark 4 | 1.1.0 | [HUDI-9046](https://issues.apache.org/jira/browse/HUDI-9046) |
-| Non-blocking updates during clustering | 1.1.0 | [HUDI-1045](https://issues.apache.org/jira/browse/HUDI-1045) |
-| Track schema in metadata table | 1.1.0 | [HUDI-6778](https://issues.apache.org/jira/browse/HUDI-6778) |
-| Enable partial updates for CDC workload payload | 1.1.0 | [HUDI-7229](https://issues.apache.org/jira/browse/HUDI-7229) |
-| NBCC for MDT writes | 1.1.0 | [HUDI-8480](https://issues.apache.org/jira/browse/HUDI-8480) |
-| Index abstraction for writer and reader | 1.1.0 | [HUDI-9176](https://issues.apache.org/jira/browse/HUDI-9176) |
-| Vector search index | 1.1.0 | [HUDI-9047](https://issues.apache.org/jira/browse/HUDI-9047) |
-| Bitmap index | 1.1.0 | [HUDI-9048](https://issues.apache.org/jira/browse/HUDI-9048) |
-| Native HFile Writer and removal of HBase dependency | 1.1.0 | [HUDI-8222](https://issues.apache.org/jira/browse/HUDI-8222) |
-| Pluggable Table Formats in Hudi | 1.1.0 | [RFC-93, HUDI-9332](https://github.com/apache/hudi/blob/master/rfc/rfc-93/rfc-93.md) |
+| Introduce `.abort` state in the timeline | 1.2.0 | [HUDI-8189](https://issues.apache.org/jira/browse/HUDI-8189) |
+| Variant type support on Spark 4 | 1.2.0 | [HUDI-9046](https://issues.apache.org/jira/browse/HUDI-9046) |
+| Non-blocking updates during clustering | 1.2.0 | [HUDI-1045](https://issues.apache.org/jira/browse/HUDI-1045) |
+| Enable partial updates for CDC workload payload | 1.2.0 | [HUDI-7229](https://issues.apache.org/jira/browse/HUDI-7229) |
+| Schema tracking in metadata table | 1.2.0 | [HUDI-6778](https://issues.apache.org/jira/browse/HUDI-6778) |
+| NBCC for MDT writes | 1.2.0 | [HUDI-8480](https://issues.apache.org/jira/browse/HUDI-8480) |
+| Index abstraction for writer and reader | 1.2.0 | [HUDI-9176](https://issues.apache.org/jira/browse/HUDI-9176) |
+| Vector search index | 1.2.0 | [HUDI-9047](https://issues.apache.org/jira/browse/HUDI-9047) |
+| Bitmap index | 1.2.0 | [HUDI-9048](https://issues.apache.org/jira/browse/HUDI-9048) |
| New abstraction for schema, expressions, and filters | 1.2.0 | [RFC-88](https://github.com/apache/hudi/pull/12795) |
| Streaming CDC/Incremental read improvement | 1.2.0 | [HUDI-2749](https://issues.apache.org/jira/browse/HUDI-2749) |
| Supervised table service planning and execution | 1.2.0 | [RFC-43](https://github.com/apache/hudi/pull/4309), [HUDI-4147](https://issues.apache.org/jira/browse/HUDI-4147) |
@@ -50,8 +47,7 @@ down by areas on our [stack](/docs/hudi_stack).
| Feature | Target Release | Tracking |
|---------------------------------------------------------|----------------|----------------------------------------------------------------------------------------------------------------------------|
-| Deprecate Payload and support CDC with built-in merge mode | 1.1.0 | [HUDI-8401](https://issues.apache.org/jira/browse/HUDI-8401) |
-| New Hudi Table Format APIs for Query Integrations | 1.1.0 | [RFC-64](https://github.com/apache/hudi/pull/7080), [HUDI-4141](https://issues.apache.org/jira/browse/HUDI-4141) |
+| New Hudi Table Format APIs for Query Integrations | 1.2.0 | [RFC-64](https://github.com/apache/hudi/pull/7080), [HUDI-4141](https://issues.apache.org/jira/browse/HUDI-4141) |
| Snapshot view management | 1.2.0 | [RFC-61](https://github.com/apache/hudi/pull/6576), [HUDI-4677](https://issues.apache.org/jira/browse/HUDI-4677) |
| Support of verification with multiple event_time fields | 1.2.0 | [RFC-59](https://github.com/apache/hudi/pull/6382), [HUDI-4569](https://issues.apache.org/jira/browse/HUDI-4569) |
@@ -60,14 +56,9 @@ down by areas on our [stack](/docs/hudi_stack).
| Feature | Target Release | Tracking |
|---------------------------------------------------------|----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Improve metadata table write DAG on Spark | 1.1.0 | [HUDI-8462](https://issues.apache.org/jira/browse/HUDI-8462) |
-| Optimize performance with engine-native records on Flink | 1.1.0 | [HUDI-8799](https://issues.apache.org/jira/browse/HUDI-8799) |
-| File group reader integration on Flink | 1.1.0 | [HUDI-6788](https://issues.apache.org/jira/browse/HUDI-6788) |
-| File group reader integration with MDT read path | 1.1.0 | [HUDI-8720](https://issues.apache.org/jira/browse/HUDI-8720) |
-| Default Java 17 support | 1.1.0 | [HUDI-6506](https://issues.apache.org/jira/browse/HUDI-6506) |
-| Spark 4 Support | 1.1.0 | [HUDI-7915](https://issues.apache.org/jira/browse/HUDI-7915) |
-| Spark datasource V2 read | 1.1.0 | [HUDI-4449](https://issues.apache.org/jira/browse/HUDI-4449) |
-| Simplification of engine integration and module organization | 1.1.0 | [HUDI-9502](https://issues.apache.org/jira/browse/HUDI-9502) |
+| Default Java 17 support | 1.2.0 | [HUDI-6506](https://issues.apache.org/jira/browse/HUDI-6506) |
+| Spark datasource V2 read | 1.2.0 | [HUDI-4449](https://issues.apache.org/jira/browse/HUDI-4449) |
+| Simplification of engine integration and module organization | 1.2.0 | [HUDI-9502](https://issues.apache.org/jira/browse/HUDI-9502) |
| End-to-end DataFrame write path on Spark | 1.2.0 | [HUDI-9019](https://issues.apache.org/jira/browse/HUDI-9019), [HUDI-4857](https://issues.apache.org/jira/browse/HUDI-4857) |
| Support Hudi 1.0 release in Presto Hudi Connector | Presto Release / Q2 | [HUDI-3210](https://issues.apache.org/jira/browse/HUDI-3210) |
| Support of new indexes in Presto Hudi Connector | Presto Release / Q3 | [HUDI-4394](https://issues.apache.org/jira/browse/HUDI-4394), [HUDI-4552](https://issues.apache.org/jira/browse/HUDI-4552) |
@@ -78,7 +69,7 @@ down by areas on our [stack](/docs/hudi_stack).
| Feature | Target Release | Tracking |
|---------------------------------------------------------------------------------------------------|----------------|----------------------------------------------------------------------------------------------------------------------------------------|
-| Syncing as non-partitoned tables in catalogs | 1.1.0 | [HUDI-9503](https://issues.apache.org/jira/browse/HUDI-9503) |
+| Syncing as non-partitoned tables in catalogs | 1.2.0 | [HUDI-9503](https://issues.apache.org/jira/browse/HUDI-9503) |
| Hudi Reverse streamer | 1.2.0 | [RFC-70](https://github.com/apache/hudi/pull/9040) |
| Diagnostic Reporter | 1.2.0 | [RFC-62](https://github.com/apache/hudi/pull/6600) |
| Mutable, Transactional caching for Hudi Tables (could be accelerated based on community feedback) | 2.0.0 | [Strawman design](https://docs.google.com/presentation/d/1QBgLw11TM2Qf1KUESofGrQDb63EuggNCpPaxc82Kldo/edit#slide=id.gf7e0551254_0_5), [HUDI-6489](https://issues.apache.org/jira/browse/HUDI-6489) |
@@ -88,5 +79,4 @@ down by areas on our [stack](/docs/hudi_stack).
## Developer Experience
| Feature | Target Release | Tracking |
|---------------------------------------------------------|----------------|------------------------------------------|
-| Support code coverage report and improve test coverage | 1.1.0 | [HUDI-9015](https://issues.apache.org/jira/browse/HUDI-9015) |
-| Clean up tech debt and deprecate unused code | 1.1.0 | [HUDI-9054](https://issues.apache.org/jira/browse/HUDI-9054) |
+| Clean up tech debt and deprecate unused code | 1.2.0 | [HUDI-9054](https://issues.apache.org/jira/browse/HUDI-9054) |
diff --git a/website/static/assets/images/hudi_stack/pluggable_tf.png b/website/static/assets/images/hudi_stack/pluggable_tf.png
new file mode 100644
index 000000000000..58ddeb8cc1fc
Binary files /dev/null and b/website/static/assets/images/hudi_stack/pluggable_tf.png differ
diff --git a/website/versioned_docs/version-1.1.0/hudi_stack.md b/website/versioned_docs/version-1.1.0/hudi_stack.md
index 189c9840b727..67e2bdbb1b8f 100644
--- a/website/versioned_docs/version-1.1.0/hudi_stack.md
+++ b/website/versioned_docs/version-1.1.0/hudi_stack.md
@@ -68,7 +68,12 @@ all Base Files is required. Read more about the various table types in Hudi [tab
## Pluggable Table format
-Starting with Hudi 1.1, Hudi introduces a pluggable table format framework
that extends Hudi's powerful storage engine capabilities beyond its native
format to other table formats like Apache Iceberg and Delta Lake. This
framework decouples Hudi's core capabilities—transaction management, indexing,
concurrency control, and table services—from the specific storage format used
for data files. Hudi provides native format support (configured via
`hoodie.table.format=native` by default), whil [...]
+Starting with Hudi 1.1, Hudi introduces a pluggable table format framework
that extends Hudi's powerful storage engine capabilities beyond its native
format to other table formats like Apache Iceberg and Delta Lake. This
framework decouples Hudi's core capabilities—transaction management, indexing,
concurrency control, and table services—from the specific storage format used
for data files.
+
+
+<p align = "center">Pluggable Table Format</p>
+
+Hudi provides native format support (configured via
`hoodie.table.format=native` by default), while [Apache XTable
(incubating)](https://xtable.apache.org/) supplies pluggable format adapters
for formats like Iceberg and Delta Lake. The framework enables organizations to
choose the right format for each use case while maintaining a unified
operational experience and leveraging Hudi's sophisticated storage engine
across all formats. For example, you can write high-frequency updates to a H
[...]
## Storage Engine