This is an automated email from the ASF dual-hosted git repository.
bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new e60604f5506 [SITE][MINOR] Add Matomo for site traffic and fix links in blogs (#12018)
e60604f5506 is described below
commit e60604f550671c041d868cfa860e94669162450c
Author: Bhavani Sudha Saktheeswaran <[email protected]>
AuthorDate: Fri Sep 27 12:07:48 2024 -0700
[SITE][MINOR] Add Matomo for site traffic and fix links in blogs (#12018)
---
website/blog/2021-07-21-streaming-data-lake-platform.md | 2 +-
.../2021-12-16-lakehouse-concurrency-control-are-we-too-optimistic.md | 4 ++--
.../blog/2022-07-11-build-open-lakehouse-using-apache-hudi-and-dbt.md | 4 ++--
...-12-19-Build-Your-First-Hudi-Lakehouse-with-AWS-Glue-and-AWS-S3.md | 2 +-
website/blog/2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md | 2 +-
website/blog/2023-12-28-apache-hudi-2023-a-year-in-review.md | 2 +-
website/blog/2024-07-31-hudi-file-formats.md | 2 +-
website/docs/hudi_stack.md | 4 ++--
website/docs/metadata.md | 2 +-
website/docs/rollbacks.md | 2 +-
website/src/theme/Navbar/Content/index.js | 1 +
website/versioned_docs/version-0.14.0/metadata.md | 2 +-
website/versioned_docs/version-0.14.0/rollbacks.md | 2 +-
website/versioned_docs/version-0.14.1/metadata.md | 2 +-
website/versioned_docs/version-0.14.1/rollbacks.md | 2 +-
website/versioned_docs/version-0.15.0/hudi_stack.md | 4 ++--
website/versioned_docs/version-0.15.0/metadata.md | 2 +-
website/versioned_docs/version-0.15.0/rollbacks.md | 2 +-
18 files changed, 22 insertions(+), 21 deletions(-)
diff --git a/website/blog/2021-07-21-streaming-data-lake-platform.md b/website/blog/2021-07-21-streaming-data-lake-platform.md
index b81f4fca028..b59870323e3 100644
--- a/website/blog/2021-07-21-streaming-data-lake-platform.md
+++ b/website/blog/2021-07-21-streaming-data-lake-platform.md
@@ -45,7 +45,7 @@ Thus, the best way to describe Apache Hudi is as a **Streaming Data Lake Platfor
**Streaming**: At its core, by optimizing for fast upserts & change streams,
Hudi provides the primitives to data lake workloads that are comparable to what
[Apache Kafka](https://kafka.apache.org/) does for event-streaming (namely,
incremental produce/consume of events and a state-store for interactive
querying).
-**Data Lake**: Nonetheless, Hudi provides an optimized, self-managing data
plane for large scale data processing on the lake (adhoc queries, ML pipelines,
batch pipelines), powering arguably the [largest transactional
lake](https://eng.uber.com/apache-hudi-graduation/) in the world. While Hudi
can be used to build a
[lakehouse](https://databricks.com/blog/2020/01/30/what-is-a-data-lakehouse.html),
given its transactional capabilities, Hudi goes beyond and unlocks an
end-to-end streaming [...]
+**Data Lake**: Nonetheless, Hudi provides an optimized, self-managing data
plane for large scale data processing on the lake (adhoc queries, ML pipelines,
batch pipelines), powering arguably the [largest transactional
lake](https://eng.uber.com/apache-hudi-graduation/) in the world. While Hudi
can be used to build a
[lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/),
given its transactional capabilities, Hudi goes beyond and unlocks an
end-to-end streaming arc [...]
**Platform**: Oftentimes in open source, there is great tech, but there is
just too many of them - all differing ever so slightly in their opinionated
ways, ultimately making the integration task onerous on the end user. Lake
users deserve the same great usability that cloud warehouses provide, with the
additional freedom and transparency of a true open source community. Hudi’s
data and table services, tightly integrated with the Hudi “kernel”, gives us
the ability to deliver cross layer [...]
diff --git a/website/blog/2021-12-16-lakehouse-concurrency-control-are-we-too-optimistic.md b/website/blog/2021-12-16-lakehouse-concurrency-control-are-we-too-optimistic.md
index 2d90dea745b..6af7a1ecf2e 100644
--- a/website/blog/2021-12-16-lakehouse-concurrency-control-are-we-too-optimistic.md
+++ b/website/blog/2021-12-16-lakehouse-concurrency-control-are-we-too-optimistic.md
@@ -10,7 +10,7 @@ tags:
- apache hudi
---
-Transactions on data lakes are now considered a key characteristic of a
Lakehouse these days. But what has actually been accomplished so far? What are
the current approaches? How do they fare in real-world scenarios? These
questions are the focus of this blog.
+Transactions on data lakes are now considered a key characteristic of a
[Lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
these days. But what has actually been accomplished so far? What are the
current approaches? How do they fare in real-world scenarios? These questions
are the focus of this blog.
<!--truncate-->
@@ -54,4 +54,4 @@ All this said, there are still many ways we can improve upon this foundation.
* While optimistic concurrency control is attractive when serializable
snapshot isolation is desired, it's neither optimal nor the only method for
dealing with concurrency between writers. We plan to implement a fully
lock-free concurrency control using CRDTs and widely adopted stream processing
concepts, over our log [merge
API](https://github.com/apache/hudi/blob/bc8bf043d5512f7afbb9d94882c4e43ee61d6f06/hudi-common/src/main/java/org/apache/hudi/common/model/HoodieRecordPayload.java#L
[...]
* Touching upon key constraints, Hudi is the only lake transactional layer
that ensures unique [key](https://hudi.apache.org/docs/key_generation)
constraints today, but limited to the record key of the table. We will be
looking to expand this capability in a more general form to non-primary key
fields, with the said newer concurrency models.
-Finally, for data lakes to transform successfully into lakehouses, we must
learn from the failing of the "hadoop warehouse" vision, which shared similar
goals with the new "lakehouse" vision. Designers did not pay closer attention
to the missing technology gaps against warehouses and created unrealistic
expectations from the actual software. As transactions and database
functionality finally goes mainstream on data lakes, we must apply these
lessons and remain candid about the current sh [...]
\ No newline at end of file
+Finally, for data lakes to transform successfully into lakehouses, we must
learn from the failing of the "hadoop warehouse" vision, which shared similar
goals with the new
"[lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)"
vision. Designers did not pay closer attention to the missing technology gaps
against warehouses and created unrealistic expectations from the actual
software. As transactions and database functionality finally goes mainstream on
data lake [...]
\ No newline at end of file
diff --git a/website/blog/2022-07-11-build-open-lakehouse-using-apache-hudi-and-dbt.md b/website/blog/2022-07-11-build-open-lakehouse-using-apache-hudi-and-dbt.md
index 60b13bba2ae..9b98fe700f5 100644
--- a/website/blog/2022-07-11-build-open-lakehouse-using-apache-hudi-and-dbt.md
+++ b/website/blog/2022-07-11-build-open-lakehouse-using-apache-hudi-and-dbt.md
@@ -11,7 +11,7 @@ tags:
- apache hudi
---
-The focus of this blog is to show you how to build an open lakehouse
leveraging incremental data processing and performing field-level updates. We
are excited to announce that you can now use Apache Hudi + dbt for building
open data lakehouses.
+The focus of this blog is to show you how to build an open lakehouse
leveraging incremental data processing and performing field-level updates. We
are excited to announce that you can now use Apache Hudi + dbt for building
open [data
lakehouses](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/).

@@ -20,7 +20,7 @@ Let's first clarify a few terminologies used in this blog before we dive into th
## What is Apache Hudi?
-Apache Hudi brings ACID transactions, record-level updates/deletes, and change
streams to data lakehouses.
+Apache Hudi brings ACID transactions, record-level updates/deletes, and change
streams to [data
lakehouses](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/).
Apache Hudi is an open-source data management framework used to simplify
incremental data processing and data pipeline development. This framework more
efficiently manages business requirements like data lifecycle and improves data
quality.
diff --git a/website/blog/2022-12-19-Build-Your-First-Hudi-Lakehouse-with-AWS-Glue-and-AWS-S3.md b/website/blog/2022-12-19-Build-Your-First-Hudi-Lakehouse-with-AWS-Glue-and-AWS-S3.md
index 84e4eb6f8c0..b4f7b2c5ec6 100644
--- a/website/blog/2022-12-19-Build-Your-First-Hudi-Lakehouse-with-AWS-Glue-and-AWS-S3.md
+++ b/website/blog/2022-12-19-Build-Your-First-Hudi-Lakehouse-with-AWS-Glue-and-AWS-S3.md
@@ -17,7 +17,7 @@ tags:
# Build Your First Hudi Lakehouse with AWS S3 and AWS Glue
-Soumil Shah is a Hudi community champion building [YouTube
content](https://www.youtube.com/@SoumilShah/playlists) so developers can
easily get started incorporating a lakehouse into their data infrastructure. In
this
[video](https://www.youtube.com/watch?v=5zF4jc_3rFs&list=PLL2hlSFBmWwwbMpcyMjYuRn8cN99gFSY6),
Soumil shows you how to get started with AWS Glue, AWS S3, Hudi and Athena.
+Soumil Shah is a Hudi community champion building [YouTube
content](https://www.youtube.com/@SoumilShah/playlists) so developers can
easily get started incorporating a
[lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
into their data infrastructure. In this
[video](https://www.youtube.com/watch?v=5zF4jc_3rFs&list=PLL2hlSFBmWwwbMpcyMjYuRn8cN99gFSY6),
Soumil shows you how to get started with AWS Glue, AWS S3, Hudi and Athena.
In this tutorial, you’ll learn how to:
- Create and configure AWS Glue
diff --git a/website/blog/2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md b/website/blog/2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md
index 9ecac331ef5..b44490ab1b9 100644
--- a/website/blog/2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md
+++ b/website/blog/2022-12-29-Apache-Hudi-2022-A-Year-In-Review.md
@@ -28,7 +28,7 @@ where people are sharing and helping each other!
While there are too many features added in 2022 to list them all, take a look
at some of the exciting highlights:
-- [Multi-Modal
Index](https://hudi.apache.org/blog/2022/05/17/Introducing-Multi-Modal-Index-for-the-Lakehouse-in-Apache-Hudi)
is a first-of-its-kind high-performance indexing subsystem for the Lakehouse.
It improves metadata lookup performance by up to 100x and reduces overall query
latency by up to 30x. Two new indices were added to the metadata table - Bloom
filter index that enables faster upsert performance and[ column stats index
along with Data skipping](https://hudi.apache.org/bl [...]
+- [Multi-Modal
Index](https://hudi.apache.org/blog/2022/05/17/Introducing-Multi-Modal-Index-for-the-Lakehouse-in-Apache-Hudi)
is a first-of-its-kind high-performance indexing subsystem for the
[Lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/).
It improves metadata lookup performance by up to 100x and reduces overall query
latency by up to 30x. Two new indices were added to the metadata table - Bloom
filter index that enables faster upsert performance and[ co [...]
- Hudi added support for [asynchronous
indexing](https://hudi.apache.org/releases/release-0.11.0/#async-indexer) to
assist building such indices without blocking ingestion so that regular writers
don't need to scale up resources for such one off spikes.
- A new type of index called Bucket Index was introduced this year. This could
be game changing for deterministic workloads with partitioned datasets. It is
very light-weight and allows the distribution of records to buckets using a
hash function.
- Filesystem based Lock Provider - This implementation avoids the need of
external systems and leverages the abilities of underlying filesystem to
support lock provider needed for optimistic concurrency control in case of
multiple writers. Please check the [lock
configuration](https://hudi.apache.org/docs/configurations#Locks-Configurations)
for details.
diff --git a/website/blog/2023-12-28-apache-hudi-2023-a-year-in-review.md b/website/blog/2023-12-28-apache-hudi-2023-a-year-in-review.md
index 64d9ba64efe..4921a45938c 100644
--- a/website/blog/2023-12-28-apache-hudi-2023-a-year-in-review.md
+++ b/website/blog/2023-12-28-apache-hudi-2023-a-year-in-review.md
@@ -115,7 +115,7 @@ as well as Flink 1.16, 1.17, and 1.18.
While Apache Hudi continues its strong growth momentum, some members of the
community also decided it is time to
start building interoperability bridges across Lakehouse table formats with
Delta Lake and Iceberg. The
[recent announcement about OneTable becoming open
source](https://www.onehouse.ai/blog/onetable-is-now-open-source)
-marks a big leap forward for all developers looking to build a data lakehouse
architecture. This development not
+marks a big leap forward for all developers looking to build a [data
lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
architecture. This development not
only emphasizes Hudi's commitment to openness but also enables a wider range
of users to experience the
technological advantages offered by Hudi.
diff --git a/website/blog/2024-07-31-hudi-file-formats.md b/website/blog/2024-07-31-hudi-file-formats.md
index 1326b12bf3a..e57d13fa8c8 100644
--- a/website/blog/2024-07-31-hudi-file-formats.md
+++ b/website/blog/2024-07-31-hudi-file-formats.md
@@ -40,7 +40,7 @@ Cons of Parquet:
* Small Data Sets: Parquet may not be the best choice for small datasets
because the advantages of its columnar storage model aren’t as pronounced.
Use Cases for Parquet:
-* Parquet is an excellent choice when dealing with large, complex, and nested
data structures, especially for read-heavy workloads. Its columnar storage
approach makes it an excellent choice for data lakehouse solutions where
aggregation queries are common.
+* Parquet is an excellent choice when dealing with large, complex, and nested
data structures, especially for read-heavy workloads. Its columnar storage
approach makes it an excellent choice for [data
lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
solutions where aggregation queries are common.
### Optimized Row Columnar (ORC)
[Apache ORC](https://orc.apache.org/) is another popular file format that is
self-describing, and type-aware columnar file format.
diff --git a/website/docs/hudi_stack.md b/website/docs/hudi_stack.md
index a3a7896c92e..203e8ce5947 100644
--- a/website/docs/hudi_stack.md
+++ b/website/docs/hudi_stack.md
@@ -7,7 +7,7 @@ toc_max_heading_level: 3
last_modified_at:
---
-Apache Hudi is a Transactional Data Lakehouse Platform built around a database
kernel. It brings core warehouse and database functionality directly to a data
lake thereby providing a table-level abstraction over open file formats like
Apache Parquet/ORC (more recently known as the lakehouse architecture) and
enabling transactional capabilities such as updates/deletes. Hudi also
incorporates essential table services that are tightly integrated with the
database kernel. These services can [...]
+Apache Hudi is a Transactional [Data
Lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
Platform built around a database kernel. It brings core warehouse and database
functionality directly to a data lake thereby providing a table-level
abstraction over open file formats like Apache Parquet/ORC (more recently known
as the lakehouse architecture) and enabling transactional capabilities such as
updates/deletes. Hudi also incorporates essential table services that [...]
In this section, we will explore the Hudi stack and deconstruct the layers of
software components that constitute Hudi. The features marked with an asterisk
(*) represent work in progress, and the dotted boxes indicate planned future
work. These components collectively aim to fulfill the
[vision](https://github.com/apache/hudi/blob/master/rfc/rfc-69/rfc-69.md) for
the project.
@@ -24,7 +24,7 @@ The storage layer is where the data files (such as Parquet) are stored. Hudi int
File formats hold the raw data and are physically stored on the lake storage.
Hudi operates on logical structures of File Groups and File Slices, which
consist of Base File and Log Files. Base Files are compacted and optimized for
reads and are augmented with Log Files for efficient append. Future updates aim
to integrate diverse formats like unstructured data (e.g., JSON, images), and
compatibility with different storage layers in event-streaming, OLAP engines,
and warehouses. Hudi's la [...]
## Transactional Database Layer
-The transactional database layer of Hudi comprises the core components that
are responsible for the fundamental operations and services that enable Hudi to
store, retrieve, and manage data efficiently on data lakehouse storages.
+The transactional database layer of Hudi comprises the core components that
are responsible for the fundamental operations and services that enable Hudi to
store, retrieve, and manage data efficiently on [data
lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
storages.
### Table Format

diff --git a/website/docs/metadata.md b/website/docs/metadata.md
index 0b24b5ed550..ac8d1e1294a 100644
--- a/website/docs/metadata.md
+++ b/website/docs/metadata.md
@@ -9,7 +9,7 @@ Database indices contain auxiliary data structures to quickly locate records nee
from storage. Given that Hudi’s design has been heavily optimized for handling
mutable change streams, with different
write patterns, Hudi considers [indexing](#indexing) as an integral part of
its design and has uniquely supported
[indexing
capabilities](https://hudi.apache.org/blog/2020/11/11/hudi-indexing-mechanisms/)
from its inception, to speed
-up upserts on the Data Lakehouse. While Hudi's indices has benefited writers
for fast upserts and deletes, Hudi's metadata table
+up upserts on the [Data
Lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/).
While Hudi's indices has benefited writers for fast upserts and deletes, Hudi's
metadata table
aims to tap these benefits more generally for both the readers and writers.
The metadata table implemented as a single
internal Hudi Merge-On-Read table hosts different types of indices containing
table metadata and is designed to be
serverless and independent of compute and query engines. This is similar to
common practices in databases where metadata
diff --git a/website/docs/rollbacks.md b/website/docs/rollbacks.md
index c78b8f3b084..7b311742fdb 100644
--- a/website/docs/rollbacks.md
+++ b/website/docs/rollbacks.md
@@ -18,7 +18,7 @@ page presents insights on how "rollback" in Hudi can automatically clean up hand
manual input from users.
### Handling partially failed commits
-Hudi has a lot of platformization built in so as to ease the
operationalization of lakehouse tables. One such feature
+Hudi has a lot of platformization built in so as to ease the
operationalization of
[lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
tables. One such feature
is the automatic cleanup of partially failed commits. Users don’t need to run
any additional commands to clean up dirty
data or the data produced by failed commits. If you continue to write to hudi
tables, one of your future commits will
take care of cleaning up older data that failed midway during a write/commit.
We call this cleanup of a failed commit a
diff --git a/website/src/theme/Navbar/Content/index.js b/website/src/theme/Navbar/Content/index.js
index bcf06a87933..4cf7a85141c 100644
--- a/website/src/theme/Navbar/Content/index.js
+++ b/website/src/theme/Navbar/Content/index.js
@@ -41,6 +41,7 @@ function NavbarContentLayout({left, right}) {
return (
<div className={clsx("navbar__inner", [styles.navbarInnerStyle])}>
<img referrerpolicy="no-referrer-when-downgrade" src="https://static.scarf.sh/a.png?x-pxid=8f594acf-9b77-44fb-9475-3e82ead1910c" width={0} height={0} alt=""/>
+ <img referrerpolicy="no-referrer-when-downgrade" src="https://analytics.apache.org/matomo.php?idsite=47&rec=1" width={0} height={0} alt="" />
<div className="navbar__items">{left}</div>
<div className="navbar__items navbar__items--right">{right}</div>
</div>
diff --git a/website/versioned_docs/version-0.14.0/metadata.md b/website/versioned_docs/version-0.14.0/metadata.md
index c02663dff0a..48a7047409c 100644
--- a/website/versioned_docs/version-0.14.0/metadata.md
+++ b/website/versioned_docs/version-0.14.0/metadata.md
@@ -9,7 +9,7 @@ Database indices contain auxiliary data structures to quickly locate records nee
from storage. Given that Hudi’s design has been heavily optimized for handling
mutable change streams, with different
write patterns, Hudi considers [indexing](#indexing) as an integral part of
its design and has uniquely supported
[indexing
capabilities](https://hudi.apache.org/blog/2020/11/11/hudi-indexing-mechanisms/)
from its inception, to speed
-up upserts on the Data Lakehouse. While Hudi's indices has benefited writers
for fast upserts and deletes, Hudi's metadata table
+up upserts on the [Data
Lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/).
While Hudi's indices has benefited writers for fast upserts and deletes, Hudi's
metadata table
aims to tap these benefits more generally for both the readers and writers.
The metadata table implemented as a single
internal Hudi Merge-On-Read table hosts different types of indices containing
table metadata and is designed to be
serverless and independent of compute and query engines. This is similar to
common practices in databases where metadata
diff --git a/website/versioned_docs/version-0.14.0/rollbacks.md b/website/versioned_docs/version-0.14.0/rollbacks.md
index 295ce70c0dc..85bd52fdbc6 100644
--- a/website/versioned_docs/version-0.14.0/rollbacks.md
+++ b/website/versioned_docs/version-0.14.0/rollbacks.md
@@ -18,7 +18,7 @@ page presents insights on how "rollback" in Hudi can automatically clean up hand
manual input from users.
### Handling partially failed commits
-Hudi has a lot of platformization built in so as to ease the
operationalization of lakehouse tables. One such feature
+Hudi has a lot of platformization built in so as to ease the
operationalization of
[lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
tables. One such feature
is the automatic cleanup of partially failed commits. Users don’t need to run
any additional commands to clean up dirty
data or the data produced by failed commits. If you continue to write to hudi
tables, one of your future commits will
take care of cleaning up older data that failed midway during a write/commit.
We call this cleanup of a failed commit a
diff --git a/website/versioned_docs/version-0.14.1/metadata.md b/website/versioned_docs/version-0.14.1/metadata.md
index f0ee8b8d51e..52e4c788275 100644
--- a/website/versioned_docs/version-0.14.1/metadata.md
+++ b/website/versioned_docs/version-0.14.1/metadata.md
@@ -9,7 +9,7 @@ Database indices contain auxiliary data structures to quickly locate records nee
from storage. Given that Hudi’s design has been heavily optimized for handling
mutable change streams, with different
write patterns, Hudi considers [indexing](#indexing) as an integral part of
its design and has uniquely supported
[indexing
capabilities](https://hudi.apache.org/blog/2020/11/11/hudi-indexing-mechanisms/)
from its inception, to speed
-up upserts on the Data Lakehouse. While Hudi's indices has benefited writers
for fast upserts and deletes, Hudi's metadata table
+up upserts on the [Data
Lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/).
While Hudi's indices has benefited writers for fast upserts and deletes, Hudi's
metadata table
aims to tap these benefits more generally for both the readers and writers.
The metadata table implemented as a single
internal Hudi Merge-On-Read table hosts different types of indices containing
table metadata and is designed to be
serverless and independent of compute and query engines. This is similar to
common practices in databases where metadata
diff --git a/website/versioned_docs/version-0.14.1/rollbacks.md b/website/versioned_docs/version-0.14.1/rollbacks.md
index 5a2ebf2a70b..005f9eb8f7c 100644
--- a/website/versioned_docs/version-0.14.1/rollbacks.md
+++ b/website/versioned_docs/version-0.14.1/rollbacks.md
@@ -18,7 +18,7 @@ page presents insights on how "rollback" in Hudi can automatically clean up hand
manual input from users.
### Handling partially failed commits
-Hudi has a lot of platformization built in so as to ease the
operationalization of lakehouse tables. One such feature
+Hudi has a lot of platformization built in so as to ease the
operationalization of
[lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
tables. One such feature
is the automatic cleanup of partially failed commits. Users don’t need to run
any additional commands to clean up dirty
data or the data produced by failed commits. If you continue to write to hudi
tables, one of your future commits will
take care of cleaning up older data that failed midway during a write/commit.
We call this cleanup of a failed commit a
diff --git a/website/versioned_docs/version-0.15.0/hudi_stack.md b/website/versioned_docs/version-0.15.0/hudi_stack.md
index a3a7896c92e..203e8ce5947 100644
--- a/website/versioned_docs/version-0.15.0/hudi_stack.md
+++ b/website/versioned_docs/version-0.15.0/hudi_stack.md
@@ -7,7 +7,7 @@ toc_max_heading_level: 3
last_modified_at:
---
-Apache Hudi is a Transactional Data Lakehouse Platform built around a database
kernel. It brings core warehouse and database functionality directly to a data
lake thereby providing a table-level abstraction over open file formats like
Apache Parquet/ORC (more recently known as the lakehouse architecture) and
enabling transactional capabilities such as updates/deletes. Hudi also
incorporates essential table services that are tightly integrated with the
database kernel. These services can [...]
+Apache Hudi is a Transactional [Data
Lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
Platform built around a database kernel. It brings core warehouse and database
functionality directly to a data lake thereby providing a table-level
abstraction over open file formats like Apache Parquet/ORC (more recently known
as the lakehouse architecture) and enabling transactional capabilities such as
updates/deletes. Hudi also incorporates essential table services that [...]
In this section, we will explore the Hudi stack and deconstruct the layers of
software components that constitute Hudi. The features marked with an asterisk
(*) represent work in progress, and the dotted boxes indicate planned future
work. These components collectively aim to fulfill the
[vision](https://github.com/apache/hudi/blob/master/rfc/rfc-69/rfc-69.md) for
the project.
@@ -24,7 +24,7 @@ The storage layer is where the data files (such as Parquet) are stored. Hudi int
File formats hold the raw data and are physically stored on the lake storage.
Hudi operates on logical structures of File Groups and File Slices, which
consist of Base File and Log Files. Base Files are compacted and optimized for
reads and are augmented with Log Files for efficient append. Future updates aim
to integrate diverse formats like unstructured data (e.g., JSON, images), and
compatibility with different storage layers in event-streaming, OLAP engines,
and warehouses. Hudi's la [...]
## Transactional Database Layer
-The transactional database layer of Hudi comprises the core components that
are responsible for the fundamental operations and services that enable Hudi to
store, retrieve, and manage data efficiently on data lakehouse storages.
+The transactional database layer of Hudi comprises the core components that
are responsible for the fundamental operations and services that enable Hudi to
store, retrieve, and manage data efficiently on [data
lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
storages.
### Table Format

diff --git a/website/versioned_docs/version-0.15.0/metadata.md b/website/versioned_docs/version-0.15.0/metadata.md
index 323aa8ce048..b2b57e62f84 100644
--- a/website/versioned_docs/version-0.15.0/metadata.md
+++ b/website/versioned_docs/version-0.15.0/metadata.md
@@ -9,7 +9,7 @@ Database indices contain auxiliary data structures to quickly locate records nee
from storage. Given that Hudi’s design has been heavily optimized for handling
mutable change streams, with different
write patterns, Hudi considers [indexing](#indexing) as an integral part of
its design and has uniquely supported
[indexing
capabilities](https://hudi.apache.org/blog/2020/11/11/hudi-indexing-mechanisms/)
from its inception, to speed
-up upserts on the Data Lakehouse. While Hudi's indices has benefited writers
for fast upserts and deletes, Hudi's metadata table
+up upserts on the [Data
Lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/).
While Hudi's indices has benefited writers for fast upserts and deletes, Hudi's
metadata table
aims to tap these benefits more generally for both the readers and writers.
The metadata table implemented as a single
internal Hudi Merge-On-Read table hosts different types of indices containing
table metadata and is designed to be
serverless and independent of compute and query engines. This is similar to
common practices in databases where metadata
diff --git a/website/versioned_docs/version-0.15.0/rollbacks.md b/website/versioned_docs/version-0.15.0/rollbacks.md
index c78b8f3b084..7b311742fdb 100644
--- a/website/versioned_docs/version-0.15.0/rollbacks.md
+++ b/website/versioned_docs/version-0.15.0/rollbacks.md
@@ -18,7 +18,7 @@ page presents insights on how "rollback" in Hudi can automatically clean up hand
manual input from users.
### Handling partially failed commits
-Hudi has a lot of platformization built in so as to ease the
operationalization of lakehouse tables. One such feature
+Hudi has a lot of platformization built in so as to ease the
operationalization of
[lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
tables. One such feature
is the automatic cleanup of partially failed commits. Users don’t need to run
any additional commands to clean up dirty
data or the data produced by failed commits. If you continue to write to hudi
tables, one of your future commits will
take care of cleaning up older data that failed midway during a write/commit.
We call this cleanup of a failed commit a
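The one code change in this commit, the `<img>` tag in `website/src/theme/Navbar/Content/index.js`, follows Matomo's image-tracking pattern: a zero-sized image whose URL carries the tracking parameters, so each page view issues one GET that the analytics server records. A minimal sketch of assembling such a pixel URL (only `idsite=47` and `rec=1` come from this commit; the helper name and any extra parameters such as `url` are illustrative assumptions, not part of the change):

```javascript
// Sketch: build a Matomo image-tracker URL like the one added to the navbar.
// Only `idsite` and `rec=1` appear in this commit; anything passed via
// `extras` is an illustrative assumption.
function matomoPixelUrl(baseUrl, idsite, extras = {}) {
  // URLSearchParams handles percent-encoding of any extra values.
  const params = new URLSearchParams({ idsite: String(idsite), rec: "1", ...extras });
  return `${baseUrl}?${params.toString()}`;
}

// The navbar renders this URL inside a width={0} height={0} <img>, so the
// request fires on every page render without affecting layout.
const pixel = matomoPixelUrl("https://analytics.apache.org/matomo.php", 47);
```

Rendering the pixel with `width={0} height={0}` and an empty `alt` keeps it invisible and accessible, while `referrerpolicy="no-referrer-when-downgrade"` lets the referring page URL reach the analytics endpoint over HTTPS.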