This is an automated email from the ASF dual-hosted git repository.
bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new e60604f5506 [SITE][MINOR] Add Matomo for site traffic and fix links in blogs (#12018)
e60604f5506 is described below
commit e60604f550671c041d868cfa860e94669162450c
Author: Bhavani Sudha Saktheeswaran <[email protected]>
AuthorDate: Fri Sep 27 12:07:48 2024 -0700
[SITE][MINOR] Add Matomo for site traffic and fix links in blogs (#12018)
---
website/blog/2021-07-21-streaming-data-lake-platform.md | 2 +-
.../2021-12-16-lakehouse-concurrency-control-are-we-too-optimistic.md | 4 ++--
.../blog/2022-07-11-build-open-lakehouse-using-apache-hudi-and-dbt.md | 4 ++--
...-12-19-Build-Your-First-Hudi-Lakehouse-with-AWS-Glue-and-AWS-S3.md | 2 +-
website/blog/2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md | 2 +-
website/blog/2023-12-28-apache-hudi-2023-a-year-in-review.md | 2 +-
website/blog/2024-07-31-hudi-file-formats.md | 2 +-
website/docs/hudi_stack.md | 4 ++--
website/docs/metadata.md | 2 +-
website/docs/rollbacks.md | 2 +-
website/src/theme/Navbar/Content/index.js | 1 +
website/versioned_docs/version-0.14.0/metadata.md | 2 +-
website/versioned_docs/version-0.14.0/rollbacks.md | 2 +-
website/versioned_docs/version-0.14.1/metadata.md | 2 +-
website/versioned_docs/version-0.14.1/rollbacks.md | 2 +-
website/versioned_docs/version-0.15.0/hudi_stack.md | 4 ++--
website/versioned_docs/version-0.15.0/metadata.md | 2 +-
website/versioned_docs/version-0.15.0/rollbacks.md | 2 +-
18 files changed, 22 insertions(+), 21 deletions(-)
diff --git a/website/blog/2021-07-21-streaming-data-lake-platform.md b/website/blog/2021-07-21-streaming-data-lake-platform.md
index b81f4fca028..b59870323e3 100644
--- a/website/blog/2021-07-21-streaming-data-lake-platform.md
+++ b/website/blog/2021-07-21-streaming-data-lake-platform.md
@@ -45,7 +45,7 @@ Thus, the best way to describe Apache Hudi is as a **Streaming Data Lake Platfor
**Streaming**: At its core, by optimizing for fast upserts & change streams,
Hudi provides the primitives to data lake workloads that are comparable to what
[Apache Kafka](https://kafka.apache.org/) does for event-streaming (namely,
incremental produce/consume of events and a state-store for interactive
querying).
-**Data Lake**: Nonetheless, Hudi provides an optimized, self-managing data
plane for large scale data processing on the lake (adhoc queries, ML pipelines,
batch pipelines), powering arguably the [largest transactional
lake](https://eng.uber.com/apache-hudi-graduation/) in the world. While Hudi
can be used to build a
[lakehouse](https://databricks.com/blog/2020/01/30/what-is-a-data-lakehouse.html),
given its transactional capabilities, Hudi goes beyond and unlocks an
end-to-end streaming [...]
+**Data Lake**: Nonetheless, Hudi provides an optimized, self-managing data
plane for large scale data processing on the lake (adhoc queries, ML pipelines,
batch pipelines), powering arguably the [largest transactional
lake](https://eng.uber.com/apache-hudi-graduation/) in the world. While Hudi
can be used to build a
[lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/),
given its transactional capabilities, Hudi goes beyond and unlocks an
end-to-end streaming arc [...]
**Platform**: Oftentimes in open source, there is great tech, but there is
just too many of them - all differing ever so slightly in their opinionated
ways, ultimately making the integration task onerous on the end user. Lake
users deserve the same great usability that cloud warehouses provide, with the
additional freedom and transparency of a true open source community. Hudi’s
data and table services, tightly integrated with the Hudi “kernel”, gives us
the ability to deliver cross layer [...]
diff --git a/website/blog/2021-12-16-lakehouse-concurrency-control-are-we-too-optimistic.md b/website/blog/2021-12-16-lakehouse-concurrency-control-are-we-too-optimistic.md
index 2d90dea745b..6af7a1ecf2e 100644
--- a/website/blog/2021-12-16-lakehouse-concurrency-control-are-we-too-optimistic.md
+++ b/website/blog/2021-12-16-lakehouse-concurrency-control-are-we-too-optimistic.md
@@ -10,7 +10,7 @@ tags:
- apache hudi
---
-Transactions on data lakes are now considered a key characteristic of a
Lakehouse these days. But what has actually been accomplished so far? What are
the current approaches? How do they fare in real-world scenarios? These
questions are the focus of this blog.
+Transactions on data lakes are now considered a key characteristic of a
[Lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
these days. But what has actually been accomplished so far? What are the
current approaches? How do they fare in real-world scenarios? These questions
are the focus of this blog.
<!--truncate-->
@@ -54,4 +54,4 @@ All this said, there are still many ways we can improve upon this foundation.
* While optimistic concurrency control is attractive when serializable
snapshot isolation is desired, it's neither optimal nor the only method for
dealing with concurrency between writers. We plan to implement a fully
lock-free concurrency control using CRDTs and widely adopted stream processing
concepts, over our log [merge
API](https://github.com/apache/hudi/blob/bc8bf043d5512f7afbb9d94882c4e43ee61d6f06/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java#L
[...]
* Touching upon key constraints, Hudi is the only lake transactional layer
that ensures unique [key](https://hudi.apache.org/docs/key_generation)
constraints today, but limited to the record key of the table. We will be
looking to expand this capability in a more general form to non-primary key
fields, with the said newer concurrency models.
-Finally, for data lakes to transform successfully into lakehouses, we must
learn from the failing of the "hadoop warehouse" vision, which shared similar
goals with the new "lakehouse" vision. Designers did not pay closer attention
to the missing technology gaps against warehouses and created unrealistic
expectations from the actual software. As transactions and database
functionality finally goes mainstream on data lakes, we must apply these
lessons and remain candid about the current sh [...]
\ No newline at end of file
+Finally, for data lakes to transform successfully into lakehouses, we must
learn from the failing of the "hadoop warehouse" vision, which shared similar
goals with the new
"[lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)"
vision. Designers did not pay closer attention to the missing technology gaps
against warehouses and created unrealistic expectations from the actual
software. As transactions and database functionality finally goes mainstream on
data lake [...]
\ No newline at end of file
diff --git a/website/blog/2022-07-11-build-open-lakehouse-using-apache-hudi-and-dbt.md b/website/blog/2022-07-11-build-open-lakehouse-using-apache-hudi-and-dbt.md
index 60b13bba2ae..9b98fe700f5 100644
--- a/website/blog/2022-07-11-build-open-lakehouse-using-apache-hudi-and-dbt.md
+++ b/website/blog/2022-07-11-build-open-lakehouse-using-apache-hudi-and-dbt.md
@@ -11,7 +11,7 @@ tags:
- apache hudi
---
-The focus of this blog is to show you how to build an open lakehouse
leveraging incremental data processing and performing field-level updates. We
are excited to announce that you can now use Apache Hudi + dbt for building
open data lakehouses.
+The focus of this blog is to show you how to build an open lakehouse
leveraging incremental data processing and performing field-level updates. We
are excited to announce that you can now use Apache Hudi + dbt for building
open [data
lakehouses](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/).

@@ -20,7 +20,7 @@ Let's first clarify a few terminologies used in this blog before we dive into th
## What is Apache Hudi?
-Apache Hudi brings ACID transactions, record-level updates/deletes, and change
streams to data lakehouses.
+Apache Hudi brings ACID transactions, record-level updates/deletes, and change
streams to [data
lakehouses](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/).
Apache Hudi is an open-source data management framework used to simplify
incremental data processing and data pipeline development. This framework more
efficiently manages business requirements like data lifecycle and improves data
quality.
diff --git a/website/blog/2022-12-19-Build-Your-First-Hudi-Lakehouse-with-AWS-Glue-and-AWS-S3.md b/website/blog/2022-12-19-Build-Your-First-Hudi-Lakehouse-with-AWS-Glue-and-AWS-S3.md
index 84e4eb6f8c0..b4f7b2c5ec6 100644
--- a/website/blog/2022-12-19-Build-Your-First-Hudi-Lakehouse-with-AWS-Glue-and-AWS-S3.md
+++ b/website/blog/2022-12-19-Build-Your-First-Hudi-Lakehouse-with-AWS-Glue-and-AWS-S3.md
@@ -17,7 +17,7 @@ tags:
# Build Your First Hudi Lakehouse with AWS S3 and AWS Glue
-Soumil Shah is a Hudi community champion building [YouTube
content](https://www.youtube.com/@SoumilShah/playlists) so developers can
easily get started incorporating a lakehouse into their data infrastructure. In
this
[video](https://www.youtube.com/watch?v=5zF4jc_3rFs&list=PLL2hlSFBmWwwbMpcyMjYuRn8cN99gFSY6),
Soumil shows you how to get started with AWS Glue, AWS S3, Hudi and Athena.
+Soumil Shah is a Hudi community champion building [YouTube
content](https://www.youtube.com/@SoumilShah/playlists) so developers can
easily get started incorporating a
[lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
into their data infrastructure. In this
[video](https://www.youtube.com/watch?v=5zF4jc_3rFs&list=PLL2hlSFBmWwwbMpcyMjYuRn8cN99gFSY6),
Soumil shows you how to get started with AWS Glue, AWS S3, Hudi and Athena.
In this tutorial, you’ll learn how to:
- Create and configure AWS Glue
diff --git a/website/blog/2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md b/website/blog/2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md
index 9ecac331ef5..b44490ab1b9 100644
--- a/website/blog/2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md
+++ b/website/blog/2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md
@@ -28,7 +28,7 @@ where people are sharing and helping each other!
While there are too many features added in 2022 to list them all, take a look
at some of the exciting highlights:
-- [Multi-Modal
Index](https://hudi.apache.org/blog/2022/05/17/Introducing-Multi-Modal-Index-for-the-Lakehouse-in-Apache-Hudi)
is a first-of-its-kind high-performance indexing subsystem for the Lakehouse.
It improves metadata lookup performance by up to 100x and reduces overall query
latency by up to 30x. Two new indices were added to the metadata table - Bloom
filter index that enables faster upsert performance and[ column stats index
along with Data skipping](https://hudi.apache.org/bl [...]
+- [Multi-Modal
Index](https://hudi.apache.org/blog/2022/05/17/Introducing-Multi-Modal-Index-for-the-Lakehouse-in-Apache-Hudi)
is a first-of-its-kind high-performance indexing subsystem for the
[Lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/).
It improves metadata lookup performance by up to 100x and reduces overall query
latency by up to 30x. Two new indices were added to the metadata table - Bloom
filter index that enables faster upsert performance and[ co [...]
- Hudi added support for [asynchronous
indexing](https://hudi.apache.org/releases/release-0.11.0/#async-indexer) to
assist building such indices without blocking ingestion so that regular writers
don't need to scale up resources for such one off spikes.
- A new type of index called Bucket Index was introduced this year. This could
be game changing for deterministic workloads with partitioned datasets. It is
very light-weight and allows the distribution of records to buckets using a
hash function.
- Filesystem based Lock Provider - This implementation avoids the need of
external systems and leverages the abilities of underlying filesystem to
support lock provider needed for optimistic concurrency control in case of
multiple writers. Please check the [lock
configuration](https://hudi.apache.org/docs/configurations#Locks-Configurations)
for details.
diff --git a/website/blog/2023-12-28-apache-hudi-2023-a-year-in-review.md b/website/blog/2023-12-28-apache-hudi-2023-a-year-in-review.md
index 64d9ba64efe..4921a45938c 100644
--- a/website/blog/2023-12-28-apache-hudi-2023-a-year-in-review.md
+++ b/website/blog/2023-12-28-apache-hudi-2023-a-year-in-review.md
@@ -115,7 +115,7 @@ as well as Flink 1.16, 1.17, and 1.18.
While Apache Hudi continues its strong growth momentum, some members of the
community also decided it is time to
start building interoperability bridges across Lakehouse table formats with
Delta Lake and Iceberg. The
[recent announcement about OneTable becoming open
source](https://www.onehouse.ai/blog/onetable-is-now-open-source)
-marks a big leap forward for all developers looking to build a data lakehouse
architecture. This development not
+marks a big leap forward for all developers looking to build a [data
lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
architecture. This development not
only emphasizes Hudi's commitment to openness but also enables a wider range
of users to experience the
technological advantages offered by Hudi.
diff --git a/website/blog/2024-07-31-hudi-file-formats.md b/website/blog/2024-07-31-hudi-file-formats.md
index 1326b12bf3a..e57d13fa8c8 100644
--- a/website/blog/2024-07-31-hudi-file-formats.md
+++ b/website/blog/2024-07-31-hudi-file-formats.md
@@ -40,7 +40,7 @@ Cons of Parquet:
* Small Data Sets: Parquet may not be the best choice for small datasets
because the advantages of its columnar storage model aren’t as pronounced.
Use Cases for Parquet:
-* Parquet is an excellent choice when dealing with large, complex, and nested
data structures, especially for read-heavy workloads. Its columnar storage
approach makes it an excellent choice for data lakehouse solutions where
aggregation queries are common.
+* Parquet is an excellent choice when dealing with large, complex, and nested
data structures, especially for read-heavy workloads. Its columnar storage
approach makes it an excellent choice for [data
lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
solutions where aggregation queries are common.
### Optimized Row Columnar (ORC)
[Apache ORC](https://orc.apache.org/) is another popular file format that is
self-describing, and type-aware columnar file format.
diff --git a/website/docs/hudi_stack.md b/website/docs/hudi_stack.md
index a3a7896c92e..203e8ce5947 100644
--- a/website/docs/hudi_stack.md
+++ b/website/docs/hudi_stack.md
@@ -7,7 +7,7 @@ toc_max_heading_level: 3
last_modified_at:
---
-Apache Hudi is a Transactional Data Lakehouse Platform built around a database
kernel. It brings core warehouse and database functionality directly to a data
lake thereby providing a table-level abstraction over open file formats like
Apache Parquet/ORC (more recently known as the lakehouse architecture) and
enabling transactional capabilities such as updates/deletes. Hudi also
incorporates essential table services that are tightly integrated with the
database kernel. These services can [...]
+Apache Hudi is a Transactional [Data
Lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
Platform built around a database kernel. It brings core warehouse and database
functionality directly to a data lake thereby providing a table-level
abstraction over open file formats like Apache Parquet/ORC (more recently known
as the lakehouse architecture) and enabling transactional capabilities such as
updates/deletes. Hudi also incorporates essential table services that [...]
In this section, we will explore the Hudi stack and deconstruct the layers of
software components that constitute Hudi. The features marked with an asterisk
(*) represent work in progress, and the dotted boxes indicate planned future
work. These components collectively aim to fulfill the
[vision](https://github.com/apache/hudi/blob/master/rfc/rfc-69/rfc-69.md) for
the project.
@@ -24,7 +24,7 @@ The storage layer is where the data files (such as Parquet) are stored. Hudi int
File formats hold the raw data and are physically stored on the lake storage.
Hudi operates on logical structures of File Groups and File Slices, which
consist of Base File and Log Files. Base Files are compacted and optimized for
reads and are augmented with Log Files for efficient append. Future updates aim
to integrate diverse formats like unstructured data (e.g., JSON, images), and
compatibility with different storage layers in event-streaming, OLAP engines,
and warehouses. Hudi's la [...]
## Transactional Database Layer
-The transactional database layer of Hudi comprises the core components that
are responsible for the fundamental operations and services that enable Hudi to
store, retrieve, and manage data efficiently on data lakehouse storages.
+The transactional database layer of Hudi comprises the core components that
are responsible for the fundamental operations and services that enable Hudi to
store, retrieve, and manage data efficiently on [data
lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
storages.
### Table Format

diff --git a/website/docs/metadata.md b/website/docs/metadata.md
index 0b24b5ed550..ac8d1e1294a 100644
--- a/website/docs/metadata.md
+++ b/website/docs/metadata.md
@@ -9,7 +9,7 @@ Database indices contain auxiliary data structures to quickly locate records nee
from storage. Given that Hudi’s design has been heavily optimized for handling
mutable change streams, with different
write patterns, Hudi considers [indexing](#indexing) as an integral part of
its design and has uniquely supported
[indexing
capabilities](https://hudi.apache.org/blog/2020/11/11/hudi-indexing-mechanisms/)
from its inception, to speed
-up upserts on the Data Lakehouse. While Hudi's indices has benefited writers
for fast upserts and deletes, Hudi's metadata table
+up upserts on the [Data
Lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/).
While Hudi's indices has benefited writers for fast upserts and deletes, Hudi's
metadata table
aims to tap these benefits more generally for both the readers and writers.
The metadata table implemented as a single
internal Hudi Merge-On-Read table hosts different types of indices containing
table metadata and is designed to be
serverless and independent of compute and query engines. This is similar to
common practices in databases where metadata
diff --git a/website/docs/rollbacks.md b/website/docs/rollbacks.md
index c78b8f3b084..7b311742fdb 100644
--- a/website/docs/rollbacks.md
+++ b/website/docs/rollbacks.md
@@ -18,7 +18,7 @@ page presents insights on how "rollback" in Hudi can automatically clean up hand
manual input from users.
### Handling partially failed commits
-Hudi has a lot of platformization built in so as to ease the
operationalization of lakehouse tables. One such feature
+Hudi has a lot of platformization built in so as to ease the
operationalization of
[lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
tables. One such feature
is the automatic cleanup of partially failed commits. Users don’t need to run
any additional commands to clean up dirty
data or the data produced by failed commits. If you continue to write to hudi
tables, one of your future commits will
take care of cleaning up older data that failed midway during a write/commit.
We call this cleanup of a failed commit a
diff --git a/website/src/theme/Navbar/Content/index.js b/website/src/theme/Navbar/Content/index.js
index bcf06a87933..4cf7a85141c 100644
--- a/website/src/theme/Navbar/Content/index.js
+++ b/website/src/theme/Navbar/Content/index.js
@@ -41,6 +41,7 @@ function NavbarContentLayout({left, right}) {
return (
<div className={clsx("navbar__inner", [styles.navbarInnerStyle])}>
<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=8f594acf-9b77-44fb-9475-3e82ead1910c" width={0} height={0} alt=""/>
+ <img referrerpolicy="no-referrer-when-downgrade" src="https://analytics.apache.org/matomo.php?idsite=47&rec=1" width={0} height={0} alt="" />
<div className="navbar__items">{left}</div>
<div className="navbar__items navbar__items--right">{right}</div>
</div>
diff --git a/website/versioned_docs/version-0.14.0/metadata.md b/website/versioned_docs/version-0.14.0/metadata.md
index c02663dff0a..48a7047409c 100644
--- a/website/versioned_docs/version-0.14.0/metadata.md
+++ b/website/versioned_docs/version-0.14.0/metadata.md
@@ -9,7 +9,7 @@ Database indices contain auxiliary data structures to quickly locate records nee
from storage. Given that Hudi’s design has been heavily optimized for handling
mutable change streams, with different
write patterns, Hudi considers [indexing](#indexing) as an integral part of
its design and has uniquely supported
[indexing
capabilities](https://hudi.apache.org/blog/2020/11/11/hudi-indexing-mechanisms/)
from its inception, to speed
-up upserts on the Data Lakehouse. While Hudi's indices has benefited writers
for fast upserts and deletes, Hudi's metadata table
+up upserts on the [Data
Lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/).
While Hudi's indices has benefited writers for fast upserts and deletes, Hudi's
metadata table
aims to tap these benefits more generally for both the readers and writers.
The metadata table implemented as a single
internal Hudi Merge-On-Read table hosts different types of indices containing
table metadata and is designed to be
serverless and independent of compute and query engines. This is similar to
common practices in databases where metadata
diff --git a/website/versioned_docs/version-0.14.0/rollbacks.md b/website/versioned_docs/version-0.14.0/rollbacks.md
index 295ce70c0dc..85bd52fdbc6 100644
--- a/website/versioned_docs/version-0.14.0/rollbacks.md
+++ b/website/versioned_docs/version-0.14.0/rollbacks.md
@@ -18,7 +18,7 @@ page presents insights on how "rollback" in Hudi can automatically clean up hand
manual input from users.
### Handling partially failed commits
-Hudi has a lot of platformization built in so as to ease the
operationalization of lakehouse tables. One such feature
+Hudi has a lot of platformization built in so as to ease the
operationalization of
[lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
tables. One such feature
is the automatic cleanup of partially failed commits. Users don’t need to run
any additional commands to clean up dirty
data or the data produced by failed commits. If you continue to write to hudi
tables, one of your future commits will
take care of cleaning up older data that failed midway during a write/commit.
We call this cleanup of a failed commit a
diff --git a/website/versioned_docs/version-0.14.1/metadata.md b/website/versioned_docs/version-0.14.1/metadata.md
index f0ee8b8d51e..52e4c788275 100644
--- a/website/versioned_docs/version-0.14.1/metadata.md
+++ b/website/versioned_docs/version-0.14.1/metadata.md
@@ -9,7 +9,7 @@ Database indices contain auxiliary data structures to quickly locate records nee
from storage. Given that Hudi’s design has been heavily optimized for handling
mutable change streams, with different
write patterns, Hudi considers [indexing](#indexing) as an integral part of
its design and has uniquely supported
[indexing
capabilities](https://hudi.apache.org/blog/2020/11/11/hudi-indexing-mechanisms/)
from its inception, to speed
-up upserts on the Data Lakehouse. While Hudi's indices has benefited writers
for fast upserts and deletes, Hudi's metadata table
+up upserts on the [Data
Lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/).
While Hudi's indices has benefited writers for fast upserts and deletes, Hudi's
metadata table
aims to tap these benefits more generally for both the readers and writers.
The metadata table implemented as a single
internal Hudi Merge-On-Read table hosts different types of indices containing
table metadata and is designed to be
serverless and independent of compute and query engines. This is similar to
common practices in databases where metadata
diff --git a/website/versioned_docs/version-0.14.1/rollbacks.md b/website/versioned_docs/version-0.14.1/rollbacks.md
index 5a2ebf2a70b..005f9eb8f7c 100644
--- a/website/versioned_docs/version-0.14.1/rollbacks.md
+++ b/website/versioned_docs/version-0.14.1/rollbacks.md
@@ -18,7 +18,7 @@ page presents insights on how "rollback" in Hudi can automatically clean up hand
manual input from users.
### Handling partially failed commits
-Hudi has a lot of platformization built in so as to ease the
operationalization of lakehouse tables. One such feature
+Hudi has a lot of platformization built in so as to ease the
operationalization of
[lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
tables. One such feature
is the automatic cleanup of partially failed commits. Users don’t need to run
any additional commands to clean up dirty
data or the data produced by failed commits. If you continue to write to hudi
tables, one of your future commits will
take care of cleaning up older data that failed midway during a write/commit.
We call this cleanup of a failed commit a
diff --git a/website/versioned_docs/version-0.15.0/hudi_stack.md b/website/versioned_docs/version-0.15.0/hudi_stack.md
index a3a7896c92e..203e8ce5947 100644
--- a/website/versioned_docs/version-0.15.0/hudi_stack.md
+++ b/website/versioned_docs/version-0.15.0/hudi_stack.md
@@ -7,7 +7,7 @@ toc_max_heading_level: 3
last_modified_at:
---
-Apache Hudi is a Transactional Data Lakehouse Platform built around a database
kernel. It brings core warehouse and database functionality directly to a data
lake thereby providing a table-level abstraction over open file formats like
Apache Parquet/ORC (more recently known as the lakehouse architecture) and
enabling transactional capabilities such as updates/deletes. Hudi also
incorporates essential table services that are tightly integrated with the
database kernel. These services can [...]
+Apache Hudi is a Transactional [Data
Lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
Platform built around a database kernel. It brings core warehouse and database
functionality directly to a data lake thereby providing a table-level
abstraction over open file formats like Apache Parquet/ORC (more recently known
as the lakehouse architecture) and enabling transactional capabilities such as
updates/deletes. Hudi also incorporates essential table services that [...]
In this section, we will explore the Hudi stack and deconstruct the layers of
software components that constitute Hudi. The features marked with an asterisk
(*) represent work in progress, and the dotted boxes indicate planned future
work. These components collectively aim to fulfill the
[vision](https://github.com/apache/hudi/blob/master/rfc/rfc-69/rfc-69.md) for
the project.
@@ -24,7 +24,7 @@ The storage layer is where the data files (such as Parquet) are stored. Hudi int
File formats hold the raw data and are physically stored on the lake storage.
Hudi operates on logical structures of File Groups and File Slices, which
consist of Base File and Log Files. Base Files are compacted and optimized for
reads and are augmented with Log Files for efficient append. Future updates aim
to integrate diverse formats like unstructured data (e.g., JSON, images), and
compatibility with different storage layers in event-streaming, OLAP engines,
and warehouses. Hudi's la [...]
## Transactional Database Layer
-The transactional database layer of Hudi comprises the core components that
are responsible for the fundamental operations and services that enable Hudi to
store, retrieve, and manage data efficiently on data lakehouse storages.
+The transactional database layer of Hudi comprises the core components that
are responsible for the fundamental operations and services that enable Hudi to
store, retrieve, and manage data efficiently on [data
lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
storages.
### Table Format

diff --git a/website/versioned_docs/version-0.15.0/metadata.md b/website/versioned_docs/version-0.15.0/metadata.md
index 323aa8ce048..b2b57e62f84 100644
--- a/website/versioned_docs/version-0.15.0/metadata.md
+++ b/website/versioned_docs/version-0.15.0/metadata.md
@@ -9,7 +9,7 @@ Database indices contain auxiliary data structures to quickly locate records nee
from storage. Given that Hudi’s design has been heavily optimized for handling
mutable change streams, with different
write patterns, Hudi considers [indexing](#indexing) as an integral part of
its design and has uniquely supported
[indexing
capabilities](https://hudi.apache.org/blog/2020/11/11/hudi-indexing-mechanisms/)
from its inception, to speed
-up upserts on the Data Lakehouse. While Hudi's indices has benefited writers
for fast upserts and deletes, Hudi's metadata table
+up upserts on the [Data
Lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/).
While Hudi's indices has benefited writers for fast upserts and deletes, Hudi's
metadata table
aims to tap these benefits more generally for both the readers and writers.
The metadata table implemented as a single
internal Hudi Merge-On-Read table hosts different types of indices containing
table metadata and is designed to be
serverless and independent of compute and query engines. This is similar to
common practices in databases where metadata
diff --git a/website/versioned_docs/version-0.15.0/rollbacks.md b/website/versioned_docs/version-0.15.0/rollbacks.md
index c78b8f3b084..7b311742fdb 100644
--- a/website/versioned_docs/version-0.15.0/rollbacks.md
+++ b/website/versioned_docs/version-0.15.0/rollbacks.md
@@ -18,7 +18,7 @@ page presents insights on how "rollback" in Hudi can automatically clean up hand
manual input from users.
### Handling partially failed commits
-Hudi has a lot of platformization built in so as to ease the
operationalization of lakehouse tables. One such feature
+Hudi has a lot of platformization built in so as to ease the
operationalization of
[lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
tables. One such feature
is the automatic cleanup of partially failed commits. Users don’t need to run
any additional commands to clean up dirty
data or the data produced by failed commits. If you continue to write to hudi
tables, one of your future commits will
take care of cleaning up older data that failed midway during a write/commit.
We call this cleanup of a failed commit a
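The one code change in this commit, the `<img>` tag in `website/src/theme/Navbar/Content/index.js`, follows Matomo's image-tracking pattern: a zero-sized image whose URL carries the tracking parameters, so each page view issues one GET that the analytics server records. A minimal sketch of assembling such a pixel URL (only `idsite=47` and `rec=1` come from this commit; the helper name and any extra parameters such as `url` are illustrative assumptions, not part of the change):

```javascript
// Sketch: build a Matomo image-tracker URL like the one added to the navbar.
// Only `idsite` and `rec=1` appear in this commit; anything passed via
// `extras` is an illustrative assumption.
function matomoPixelUrl(baseUrl, idsite, extras = {}) {
  // URLSearchParams handles percent-encoding of any extra values.
  const params = new URLSearchParams({ idsite: String(idsite), rec: "1", ...extras });
  return `${baseUrl}?${params.toString()}`;
}

// The navbar renders this URL inside a width={0} height={0} <img>, so the
// request fires on every page render without affecting layout.
const pixel = matomoPixelUrl("https://analytics.apache.org/matomo.php", 47);
```

Rendering the pixel with `width={0} height={0}` and an empty `alt` keeps it invisible and accessible, while `referrerpolicy="no-referrer-when-downgrade"` lets the referring page URL reach the analytics endpoint over HTTPS.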