This is an automated email from the ASF dual-hosted git repository.

vinoth pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new bf7f1ad4570 [DOCS] Updating new use-cases and home page text (#12361)
bf7f1ad4570 is described below

commit bf7f1ad4570680b45958d9a37d95e42fcf3973f4
Author: vinoth chandar <[email protected]>
AuthorDate: Thu Nov 28 09:25:14 2024 -0500

    [DOCS] Updating new use-cases and home page text (#12361)
    
    * [DOCS] Updating new use-cases and home page text
    
    * Fixing broken links to cleaning page
---
 website/docs/{hoodie_cleaner.md => cleaning.md}    |   0
 website/docs/file_layouts.md                       |   2 +-
 website/docs/file_sizing.md                        |   2 +-
 website/docs/metadata_indexing.md                  |   2 +-
 website/docs/overview.mdx                          |   2 +-
 website/docs/rollbacks.md                          |   2 +-
 website/docs/use_cases.md                          | 223 +++++++++------------
 website/docs/write_operations.md                   |   2 +-
 website/docusaurus.config.js                       |   8 +-
 website/sidebars.js                                |  32 +--
 website/src/components/DataLakes/index.js          |   9 +-
 website/src/components/HomepageFeatures/index.js   |  24 +--
 website/src/components/WhyHudi/index.js            |   8 +-
 website/src/pages/roadmap.md                       | 103 +++++-----
 .../versioned_docs/version-0.14.1/file_layouts.md  |   2 +-
 .../versioned_docs/version-0.14.1/file_sizing.md   |   2 +-
 website/versioned_docs/version-0.14.1/use_cases.md |   2 +-
 .../version-0.14.1/write_operations.md             |   2 +-
 .../versioned_docs/version-0.15.0/file_layouts.md  |   2 +-
 .../versioned_docs/version-0.15.0/file_sizing.md   |   2 +-
 website/versioned_docs/version-0.15.0/use_cases.md |   2 +-
 .../version-0.15.0/write_operations.md             |   2 +-
 22 files changed, 201 insertions(+), 234 deletions(-)

diff --git a/website/docs/hoodie_cleaner.md b/website/docs/cleaning.md
similarity index 100%
rename from website/docs/hoodie_cleaner.md
rename to website/docs/cleaning.md
diff --git a/website/docs/file_layouts.md b/website/docs/file_layouts.md
index 3cfb8a7d837..478130fbd7e 100644
--- a/website/docs/file_layouts.md
+++ b/website/docs/file_layouts.md
@@ -11,7 +11,7 @@ The following describes the general file layout structure for 
Apache Hudi. Pleas
 * Each slice contains a base file (*.parquet/*.orc) (defined by the config - 
[hoodie.table.base.file.format](https://hudi.apache.org/docs/next/configurations/#hoodietablebasefileformat)
 ) produced at a certain commit/compaction instant time, along with set of log 
files (*.log.*) that contain inserts/updates to the base file since the base 
file was produced. 
 
 Hudi adopts Multiversion Concurrency Control (MVCC), where 
[compaction](/docs/next/compaction) action merges logs and base files to 
produce new 
-file slices and [cleaning](/docs/next/hoodie_cleaner) action gets rid of 
unused/older file slices to reclaim space on the file system.
+file slices and [cleaning](/docs/next/cleaning) action gets rid of 
unused/older file slices to reclaim space on the file system.
 
 ![Partition On HDFS](/assets/images/MOR_new.png)
 
diff --git a/website/docs/file_sizing.md b/website/docs/file_sizing.md
index 157190005f3..c637a5a630c 100644
--- a/website/docs/file_sizing.md
+++ b/website/docs/file_sizing.md
@@ -148,7 +148,7 @@ while the clustering service runs.
 
 :::note
 Hudi always creates immutable files on storage. To be able to do auto-sizing 
or clustering, Hudi will always create a
-newer version of the smaller file, resulting in 2 versions of the same file. 
The [cleaner service](/docs/next/hoodie_cleaner)
+newer version of the smaller file, resulting in 2 versions of the same file. 
The [cleaner service](/docs/next/cleaning)
 will later kick in and delete the older version small file and keep the latest 
one.
 :::
 
diff --git a/website/docs/metadata_indexing.md 
b/website/docs/metadata_indexing.md
index 5b96ed07bd4..1e0a74781e3 100644
--- a/website/docs/metadata_indexing.md
+++ b/website/docs/metadata_indexing.md
@@ -1,5 +1,5 @@
 ---
-title: Metadata Indexing
+title: Metadata & Indexing
 summary: "In this page, we describe how to run metadata indexing 
asynchronously."
 toc: true
 last_modified_at:
diff --git a/website/docs/overview.mdx b/website/docs/overview.mdx
index df56c304b39..013ecc6dc4b 100644
--- a/website/docs/overview.mdx
+++ b/website/docs/overview.mdx
@@ -13,7 +13,7 @@ how to learn more to get started.
 
 ## What is Apache Hudi
 Apache Hudi (pronounced “hoodie”) is the next generation [streaming data lake 
platform](/blog/2021/07/21/streaming-data-lake-platform).
-Apache Hudi brings core warehouse and database functionality directly to a 
data lake. Hudi provides [tables](/docs/next/sql_ddl),
+ Hudi brings core warehouse and database functionality directly to a data 
lake. Hudi provides [tables](/docs/next/sql_ddl),
 [transactions](/docs/next/timeline), [efficient 
upserts/deletes](/docs/next/write_operations), [advanced 
indexes](/docs/next/indexing),
 [ingestion services](/docs/hoodie_streaming_ingestion), data 
[clustering](/docs/next/clustering)/[compaction](/docs/next/compaction) 
optimizations,
 and [concurrency](/docs/next/concurrency_control) all while keeping your data 
in open source file formats.
diff --git a/website/docs/rollbacks.md b/website/docs/rollbacks.md
index 7b311742fdb..794f27c6d68 100644
--- a/website/docs/rollbacks.md
+++ b/website/docs/rollbacks.md
@@ -1,5 +1,5 @@
 ---
-title: Rollback Mechanism
+title: Auto Rollbacks
 toc: true
 toc_min_heading_level: 2
 toc_max_heading_level: 4
diff --git a/website/docs/use_cases.md b/website/docs/use_cases.md
index 4efb3bc4736..6a2c9d43a43 100644
--- a/website/docs/use_cases.md
+++ b/website/docs/use_cases.md
@@ -6,133 +6,98 @@ toc: true
 last_modified_at: 2019-12-30T15:59:57-04:00
 ---
 
-Apache Hudi provides the foundational features required to build a 
state-of-the-art Lakehouse. 
-The following are examples of use cases for why many choose to use Apache Hudi:
-
-## A Streaming Data Lake
-Apache Hudi is a Streaming Data Lake Platform that unlocks near real-time data 
ingestion and incremental processing pipelines with ease.
-This blog post outlines this use case in more depth - 
https://hudi.apache.org/blog/2021/07/21/streaming-data-lake-platform
-
-### Near Real-Time Ingestion
-
-Ingesting data from OLTP sources like (event logs, databases, external 
sources) into a [Data Lake](http://martinfowler.com/bliki/DataLake.html) is a 
common problem,
-that is unfortunately solved in a piecemeal fashion, using a medley of 
ingestion tools. This "raw data" layer of the data lake often forms the bedrock 
on which
-more value is created.
-
-For RDBMS ingestion, Hudi provides __faster loads via Upserts__, as opposed 
costly & inefficient bulk loads. It's very common to use a change capture 
solution like
-[Debezium](http://debezium.io/) or [Kafka 
Connect](https://docs.confluent.io/platform/current/connect/index) or 
-[Sqoop Incremental 
Import](https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide#_incremental_imports)
 and apply them to an
-equivalent Hudi table on DFS. For NoSQL datastores like 
[Cassandra](http://cassandra.apache.org/) / 
[Voldemort](http://www.project-voldemort.com/voldemort/) / 
[HBase](https://hbase.apache.org/), 
-even moderately big installations store billions of rows. It goes without 
saying that __full bulk loads are simply infeasible__ and more efficient 
approaches 
-are needed if ingestion is to keep up with the typically high update volumes.
-
-Even for immutable data sources like [Kafka](https://kafka.apache.org), there 
is often a need to de-duplicate the incoming events against what's stored on 
DFS.
-Hudi achieves this by [employing 
indexes](http://hudi.apache.org/blog/2020/11/11/hudi-indexing-mechanisms/) of 
different kinds, quickly and efficiently.
-
-All of this is seamlessly achieved by the Hudi Streamer tool, which is 
maintained in tight integration with rest of the code 
-and we are always trying to add more capture sources, to make this easier for 
the users. The tool also has a continuous mode, where it
-can self-manage clustering/compaction asynchronously, without blocking 
ingestion, significantly improving data freshness.
-
-### Incremental Processing Pipelines
-
-Data Lake ETL typically involves building a chain of tables derived from each 
other via DAGs expressed as workflows. Workflows often depend on new data being 
output by
-multiple upstream workflows and traditionally, availability of new data is 
indicated by a new DFS Folder/Hive Partition.
-Let's take a concrete example to illustrate this. An upstream workflow `U` can 
create a Hive partition for every hour, with data for that hour (event_time) at 
the end of each hour (processing_time), providing effective freshness of 1 hour.
-Then, a downstream workflow `D`, kicks off immediately after `U` finishes, and 
does its own processing for the next hour, increasing the effective latency to 
2 hours.
-
-The above paradigm simply ignores late arriving data i.e when 
`processing_time` and `event_time` drift apart.
-Unfortunately, in today's post-mobile & pre-IoT world, __late data from 
intermittently connected mobile devices & sensors are the norm, not an 
anomaly__.
-In such cases, the only remedy to guarantee correctness is to reprocess the 
last few hours worth of data, over and over again each hour,
-which can significantly hurt the efficiency across the entire ecosystem. For 
e.g; imagine reprocessing TBs worth of data every hour across hundreds of 
workflows.
-
-Hudi comes to the rescue again, by providing a way to consume new data 
(including late data) from an upstream Hudi table `HU` at a record granularity 
(not folders/partitions),
-apply the processing logic, and efficiently update/reconcile late data with a 
downstream Hudi table `HD`. Here, `HU` and `HD` can be continuously scheduled 
at a much more frequent schedule
-like 15 mins, and providing an end-end latency of 30 mins at `HD`.
-
-To achieve this, Hudi has embraced similar concepts from stream processing 
frameworks like [Spark 
Streaming](https://spark.apache.org/docs/latest/streaming-programming-guide#join-operations)
 , Pub/Sub systems like 
[Kafka](http://kafka.apache.org/documentation/#theconsumer)
-[Flink](https://flink.apache.org) or database replication technologies like 
[Oracle 
XStream](https://docs.oracle.com/cd/E11882_01/server.112/e16545/xstrm_cncpt.htm#XSTRM187).
-For the more curious, a more detailed explanation of the benefits of 
Incremental Processing can be found 
[here](https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop)
-
-### Unified Batch and Streaming
-
-The world we live in is polarized - even on data analytics storage - into 
real-time and offline/batch storage. Typically, real-time 
[datamarts](https://en.wikipedia.org/wiki/Data_mart)
-are powered by specialized analytical stores such as [Druid](http://druid.io/) 
or [Memsql](http://www.memsql.com/) or [Clickhouse](https://clickhouse.tech/), 
fed by event buses like
-[Kafka](https://kafka.apache.org) or [Pulsar](https://pulsar.apache.org). This 
model is prohibitively expensive, unless a small fraction of your data lake data
-needs sub-second query responses such as system monitoring or interactive 
real-time analysis.
-
-The same data gets ingested into data lake storage much later (say every few 
hours or so) and then runs through batch ETL pipelines, with intolerable data 
freshness
-to do any kind of near-realtime analytics. On the other hand, the data lakes 
provide access to interactive SQL engines like Presto/SparkSQL, which can 
horizontally scale
-easily and provide return even more complex queries, within few seconds.
-
-By bringing streaming primitives to data lake storage, Hudi opens up new 
possibilities by being able to ingest data within few minutes and also author 
incremental data
-pipelines that are orders of magnitude faster than traditional batch 
processing. By bringing __data freshness to a few minutes__, Hudi can provide a 
much efficient alternative,
-for a large class of data applications, compared to real-time datamarts. Also, 
Hudi has no upfront server infrastructure investments
-and thus enables faster analytics on much fresher analytics, without 
increasing the operational overhead. This external 
[article](https://www.analyticsinsight.net/can-big-data-solutions-be-affordable/)
-further validates this newer model.
-
-## Cloud-Native Tables
-Apache Hudi makes it easy to define tables, manage schema, metadata, and bring 
SQL semantics to cloud file storage.
-Some may first hear about Hudi as an "open table format". While this is true, 
it is just one layer the full Hudi stack.
-The term “table format” is new and still means many things to many people. 
Drawing an analogy to file formats, a table 
-format simply consists of : the file layout of the table, table’s schema and 
metadata tracking changes to the table. 
-Hudi is not a table format alone, but it does implement one internally. 
-
-### Schema Management
-A key component of a table is the schema of that table. Apache Hudi provides 
flexibility to enforce schemas, but also allow 
-schema evolution to ensure pipeline resilience to changes. Hudi uses Avro 
schemas to store, manage and evolve a table’s 
-schema. Currently, Hudi enforces schema-on-write, which although stricter than 
schema-on-read, is adopted widely in the 
-stream processing world to ensure pipelines don't break from non backwards 
compatible changes.
-
-### ACID Transactions
-Along with a table, Apache Hudi brings ACID transactional guarantees to a data 
lake.
-Hudi ensures atomic writes, by way of publishing commits atomically to a 
[timeline](/docs/next/timeline), stamped with an 
-instant time that denotes the time at which the action 
-is deemed to have occurred. Unlike general purpose file version control, Hudi 
draws clear distinction between writer processes 
-(that issue user’s upserts/deletes), table services (that write data/metadata 
to optimize/perform bookkeeping) and readers 
-(that execute queries and read data). Hudi provides snapshot isolation between 
all three types of processes, meaning they 
-all operate on a consistent snapshot of the table. Hudi provides [optimistic 
concurrency 
control](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+22+%3A+Snapshot+Isolation+using+Optimistic+Concurrency+Control+for+multi-writers)
 
-(OCC) between writers, while providing lock-free, non-blocking MVCC based 
concurrency control between writers and 
-table-services and between different table services.
-
-Projects that solely rely on OCC deal with competing operations, by either 
implementing a lock or relying on atomic renames. 
-Such approaches are optimistic that real contention never happens and resort 
to failing one of the writer operations if 
-conflicts occur, which can cause significant resource wastage or operational 
overhead. Imagine a scenario of two writer 
-processes : an ingest writer job producing new data every 30 minutes and a 
deletion writer job that is enforcing GDPR 
-taking 2 hours to issue deletes. If there were to overlap on the same files 
(very likely to happen in real situations 
-with random deletes), the deletion job is almost guaranteed to starve and fail 
to commit each time, wasting tons of 
-cluster resources. Hudi takes a very different approach that we believe is 
more apt for lake transactions, which are 
-typically long-running. For e.g async compaction that can keep deleting 
records in the background without blocking the ingest job. 
-This is implemented via a file level, log based concurrency control protocol 
which orders actions based on their start instant times on the timeline.
-
-### Efficient Upserts and Deletes
-While ACID transactions opens the door for Upserts and Deletes, Hudi also 
unlocks special capabilities like clustering, 
-indexing, and z-ordering which allows users to optimize for efficiency in 
Deletions and Upserts. Specifically, users can 
-cluster older event log data based on user_id, such that, queries that 
evaluate candidates for data deletion can do so, while
-more recent partitions are optimized for query performance and clustered on 
say timestamp. 
-
-Hudi also offers efficient ways of dealing with large write amplification, 
resulting from random deletes based on user_id
-(or any secondary key), by way of the `Merge On Read` table types. Hudi's 
elegant log based concurrency control, ensures 
-that the ingestion/writing can continue happening, as a background compaction 
job amortizes the cost of rewriting data to enforce deletes.
-
-### Time-Travel
-Apache Hudi unlocks the ability to write time travel queries, which means you 
can query the previous state of the data. 
-This is particularly useful for a few use cases. 
-- Rollbacks - Easily revert back to a previous version of the table.
-- Debugging - Inspect previous versions of data to understand how it has 
changed over time.
-- Audit History - Have a trail of commits that helps you see how, who, and 
when altered the data over time.
-
-## Data Lake Performance Optimizations
-Apache Hudi offers several cutting edge services which help you achieve 
industry leading performance and significant 
-cost savings for your data lake.
-
-Some examples of the Apache Hudi services that make this performance 
optimization easy include: 
-
-- [Auto File Sizing](/docs/next/file_sizing) - to solve the "small files" 
problem.
-- [Clustering](/docs/next/clustering) - to co-locate data next to each other.
-- [Compaction](/docs/next/compaction) - to allow tuning of low latency 
ingestion and fast read queries. 
-- [Indexing](/docs/next/indexing) - for efficient upserts and deletes.
-- Multi-Dimensional Partitioning (Z-Ordering) - Traditional folder style 
partitioning on low-cardinality, while also 
-Z-Ordering data within files based on high-cardinality
-- Metadata Table - No more slow S3 file listings or throttling.
-- [Auto Cleaning](/docs/next/hoodie_cleaner) - Keeps your storage costs in 
check by automatically removing old versions of files.
+Apache Hudi is a powerful [data lakehouse 
platform](https://hudi.apache.org/blog/2021/07/21/streaming-data-lake-platform) 
that shines in a variety of use cases due to its high-performance design, rich 
feature set, and 
+unique strengths tailored to modern data engineering needs. This document 
explores its key use cases and differentiation, to help you understand when and 
why Hudi is an excellent choice for your data lakehouse.
+
+## Streaming/CDC data ingestion to Data Lakehouse
+
+Hudi excels at handling incremental data updates, making it a perfect fit for CDC pipelines that replicate frequent updates, inserts, and deletes from an upstream database
+ like MySQL or PostgreSQL to a downstream data lakehouse table. This "raw data" layer of the data lake often forms the foundation on which all subsequent data workloads
+from BI to AI are built. Though ingesting data from OLTP sources (event logs, databases, external sources) into a [Data Lake](http://martinfowler.com/bliki/DataLake.html) is an important problem,
+it is unfortunately often solved in a piecemeal fashion, using a medley of ingestion tools.
+
+### Why Hudi? 
+
+- Unique design choices like Merge-On-Read tables, record-level indexes and 
asynchronous compaction, approach theoretical optimality for absorbing changes 
to tables quickly and efficiently.
+- Built-in ingestion tools on [Spark](/docs/hoodie_streaming_ingestion), 
[Flink](/docs/ingestion_flink) and [Kafka 
Connect](/docs/ingestion_kafka_connect), that let you ingest data with a single 
command.
+- Support for incremental ingestion with automatic checkpoint management from 
streaming sources (Kafka, Pulsar, ...), Cloud storage (S3, GCS, ADLS, etc.) and 
even JDBC.
+- Support for widely used data formats (Protobuf, Avro, JSON), file formats 
(parquet, orc, avro, etc.) and change log formats like 
[Debezium](http://debezium.io/).
+- Scalable de-duplication for high-volume append-only streaming data, by employing bloom filter indexes and advanced data structures like interval trees for efficient range pruning.
+- Integration with popular schema registries, to automatically and safely 
evolve tables to new schemas on-the-fly as they change in the source system.
+- Hudi supports event time ordering and late data handling for streaming workloads: RecordPayload/RecordMerger APIs let you merge updates in database LSN order, in addition to latest-writer-wins semantics. Without this capability, the table can go back in (event) time if the input records are out-of-order/late-arriving (which will inevitably happen in real life).
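The event-time ordering idea above can be pictured in a few lines. This is a conceptual sketch only, not Hudi's actual RecordPayload/RecordMerger API; the `lsn` field and `merge_by_event_time` helper are hypothetical names invented for illustration:

```python
# Conceptual sketch of event-time (LSN-based) merging, as opposed to
# arrival-order (latest-writer-wins) merging. Names are illustrative, not Hudi APIs.

def merge_by_event_time(current, incoming):
    """Keep whichever version has the newer ordering field (e.g. database LSN)."""
    return incoming if incoming["lsn"] >= current["lsn"] else current

table = {"user-1": {"lsn": 5, "email": "[email protected]"}}

# A late-arriving update (older LSN) shows up after a newer one was applied.
late_update = {"lsn": 3, "email": "[email protected]"}
table["user-1"] = merge_by_event_time(table["user-1"], late_update)

# With event-time ordering the table does not "go back in time".
assert table["user-1"]["email"] == "[email protected]"
```

With latest-writer-wins semantics instead, the late record would have overwritten the newer one, which is exactly the regression the text describes.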
+
+## Offloading from expensive Data Warehouses
+
+As organizations scale, traditional ETL operations and data storage in data 
warehouses become prohibitively expensive. Hudi offers an efficient way to 
migrate these workloads 
+to a data lakehouse, significantly reducing costs without compromising on 
performance. 
+
+### Why Hudi?
+
+ - Hudi lets you store data in your own cloud accounts or storage systems in 
open data formats, away from vendor lock-in and avoiding additional storage 
costs from vendors. This also lets you open up data to other compute engines, 
including a plethora of open-source query engines like Presto, Trino, Starrocks.
+- Tools like the [hudi-dbt](https://docs.getdbt.com/reference/resource-configs/spark-configs#incremental-models) adapter plugin make it easy to migrate existing SQL ETL pipelines over to Apache Spark SQL. Users can then take advantage of Hudi's fast/efficient write performance to cut down the cost of the '_L_' in ETL pipelines.
+- Hudi's storage format is optimized to efficiently compute "diffs" between two points in time on a table, allowing large SQL joins to be re-written efficiently by eliminating costly scans of large fact tables. This cuts down the cost of the '_E_' in ETL pipelines.
+- Additionally, Hudi offers a fully-fledged set of table services, that can 
automatically optimize, cluster, and compact data in the background, resulting 
in significant cost savings over using proprietary compute services from a data 
warehouse.
+- Hudi, combined with a stream processor like Flink and its Dynamic Tables, can help replace slow, expensive warehouse ETLs, while also dramatically improving data freshness.
+
+## High Performance Open Table Format
+
+Over the past couple of years, there has been a growing trend of data warehouses supporting reads/writes on top of an "open table format" layer. A table format consists of one or more open
+file formats, metadata describing how the files constitute the table, and a protocol for concurrently reading from/writing to such tables. Though Hudi offers more than such a table format layer,
+it packs a powerful native open table format designed for high performance even on the largest tables in the world.
+
+### Why Hudi?
+
+- The Hudi format stores metadata in both an event log (the timeline) and a snapshot representation (the metadata table), allowing for minimal storage overhead when keeping many versions of a table, while still offering fast access for planning snapshot queries.
+- Metadata about the table is also stored in an indexed fashion, conducive to efficient query processing. For example, statistics about columns and partitions are stored in an SSTable-like file format, ensuring that only the small amount of metadata relevant to the columns in a query is read.
+- Hudi is designed from ground up with an indexing component that improves 
write/query performance, at the cost of relatively small increased storage 
overhead. Various indexes like hash-based record indexes, bloom filter indexes 
are available, with more on the way.
+- When it comes to concurrency control (CC), Hudi judiciously treats writers, readers and the table services maintaining the table as separate entities. This design enables multi-version concurrency control (MVCC) between writers and compaction/indexing, allowing writers to safely write without getting blocked or retrying on conflicts, which wastes a lot of compute resources in other approaches.
+- Between two writers, Hudi uses Optimistic Concurrency Control (OCC) to 
provide serializability on write completion time (commit time ordering) and a 
novel non-blocking concurrency control (NBCC) with record merging based on 
event-time (event-time processing).
+- With these design choices and interoperability provided with [Apache 
XTable](https://xtable.apache.org/) to other table formats, Hudi tables are 
quite often the fastest backing tables for other table formats like Delta Lake 
or Apache Iceberg.
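The indexed, column-keyed statistics described above can be illustrated with a toy sketch. The SSTable analogy refers to Hudi's metadata table layout; the data structures and helper below are made up for this example and are not Hudi's file format:

```python
import bisect

# Toy column-stats index: entries sorted by column name, so only the stats
# for queried columns are touched, and files are pruned by min/max ranges.

stats = sorted([
    ("price", "f1.parquet", 10, 50),
    ("price", "f2.parquet", 60, 90),
    ("ts", "f1.parquet", 100, 200),
    ("ts", "f2.parquet", 150, 250),
])
keys = [s[0] for s in stats]

def prune_files(column, lo, hi):
    """Return files whose [min, max] range for `column` overlaps [lo, hi]."""
    start = bisect.bisect_left(keys, column)
    end = bisect.bisect_right(keys, column)
    return [f for (_, f, mn, mx) in stats[start:end] if mn <= hi and mx >= lo]

# A predicate on `price` between 0 and 55 only needs to read f1.parquet.
assert prune_files("price", 0, 55) == ["f1.parquet"]
```

The binary search means the amount of metadata read scales with the columns in the query, not with the total number of columns or files in the table.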
+
+## Open Data Platform
+
+Many organizations seek to build a data platform that is open, future-proof and extensible. This requires open-source components that provide data formats, APIs and data compute services that can be mixed and matched
+together to build out the platform. Such an open platform is also essential for organizations to take advantage of the latest technologies and tools, without being beholden to a single vendor's roadmap.
+
+### Why Hudi?
+
+- Hudi only operates on data in open data, file and table formats. Hudi is not 
locked to any particular data format or storage system.
+- While open data formats help, Hudi unlocks complete freedom by also providing open compute services for ingesting, optimizing, indexing and querying data. For example, Hudi's writers come with
+  a self-managing table service runtime that can maintain tables automatically in the background on each write. Often, Hudi and your favorite open query engine are all
+  you need to get an open data platform up and running.
+- Examples of open services that make performance optimization or management easy include: [auto file sizing](/docs/next/file_sizing) to solve the "small files" problem,
+  [clustering](/docs/next/clustering) to co-locate data next to each other, [compaction](/docs/next/compaction) to allow tuning of low latency ingestion + fast read queries,
+  [indexing](/docs/next/indexing) for faster writes/queries, multi-dimensional partitioning (Z-ordering), automatic cleanup of uncommitted data with the marker mechanism,
+  and [auto cleaning](/docs/next/cleaning) to automatically remove old versions of files.
+- Hudi provides rich options for pre-sorting/loading data efficiently, and then following up with a rich set of data clustering techniques to manage file sizes and data distribution within a table. In each case, Hudi provides a high degree of configurability in terms of when/how often these services are scheduled, planned and executed. For example, Hudi ships with a handful of common planning strategies for compaction and clustering.
+- Along with compatibility with other open table formats like [Apache 
Iceberg](https://iceberg.apache.org/)/[Delta Lake](https://delta.io/), and 
catalog sync services to various data catalogs, Hudi is one of the most open 
choices for your data foundation.
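The "small files" auto-sizing service mentioned above boils down to a bin-packing pass over candidate files. The sketch below is purely illustrative; the 120 MB target and the `plan_bin_packing` helper are hypothetical, not Hudi's actual planner or its defaults:

```python
# Toy bin-packing plan for auto file sizing: group small files so each
# rewrite target approaches (but does not exceed) a configured max size.

def plan_bin_packing(file_sizes_mb, target_mb=120):
    groups, current, total = [], [], 0
    for size in sorted(file_sizes_mb):
        # Start a new group once adding this file would exceed the target.
        if total + size > target_mb and current:
            groups.append(current)
            current, total = [], 0
        current.append(size)
        total += size
    if current:
        groups.append(current)
    return groups

# Six small files get packed into two ~120 MB rewrite groups.
assert plan_bin_packing([10, 20, 30, 40, 50, 60]) == [[10, 20, 30, 40], [50, 60]]
```

Because Hudi files are immutable, each group would be rewritten as a new, right-sized file and the old small files later reclaimed by cleaning.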
+
+
+## Efficient Data lakes with Incremental Processing
+
+Organizations spend close to 50% of their budgets on data pipelines that transform and prepare data for consumption. As data volumes increase, so does the cost of running these pipelines.
+Hudi has a unique combination of features that make it a very efficient choice for data pipelines, by introducing a new paradigm for incremental processing of data. The current state-of-the-art
+prescribes two completely different data stacks for data processing. The batch processing stack stores data as files/objects on on-prem or cloud storage, processed by engines such as Spark, Hive and so on. On the other hand, the
+stream processing stack stores data as events in independent storage systems like Kafka, processed by engines such as Flink. Even as processing engines provide unified APIs for these two styles of data processing,
+the underlying storage differences make it impossible to use one stack for the other. Hudi offers a unified data lakehouse stack that can be used for both batch and stream processing models.
+
+Hudi introduces "incremental processing" to bring the stream processing model (i.e. processing only newly added or changed data every X seconds/minutes) on top of batch storage (i.e. a data lakehouse built on open data formats
+on the cloud), combining the best of both worlds. Incremental processing requires the ability to write changes quickly into tables using indexes, while also making the data available for efficient querying.
+Another requirement is the ability to efficiently compute the exact set of changes to a table between two points in time, so that pipelines process only new data on each run, without having to scan the entire table.
+For the more curious, a more detailed explanation of the benefits of 
_incremental processing_ can be found 
[here](https://www.oreilly.com/ideas/ubers-case-for-incremental-processing-on-hadoop).
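The "changes between two points in time" idea can be mimicked with a tiny model. This is purely illustrative; real Hudi incremental queries operate on commit instants on the timeline, not on a Python list:

```python
# Toy model of an incremental pull: each record carries the commit time at
# which it was last written; a pipeline reads only records committed after
# its last checkpoint instead of re-scanning the whole table.

records = [
    {"key": "a", "commit_time": "20240101080000", "val": 1},
    {"key": "b", "commit_time": "20240101090000", "val": 2},
    {"key": "a", "commit_time": "20240101100000", "val": 3},  # update to "a"
]

def incremental_pull(records, begin_time):
    """Return only records committed strictly after begin_time."""
    return [r for r in records if r["commit_time"] > begin_time]

# A downstream job checkpointed at 08:00 sees only the two later changes.
changed = incremental_pull(records, "20240101080000")
assert [r["key"] for r in changed] == ["b", "a"]
```

Each run's cost is then proportional to the volume of change, not the size of the table, which is the efficiency argument made above.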
+
+### Why Hudi?
+
+- By bringing streaming primitives to data lake storage, Hudi opens up new possibilities by being able to ingest/process data within a few minutes, eliminating the need for specialized real-time analytics systems.
+- Hudi groups records into file groups, with updates being tied to the same file group, limiting the amount of data scanned for a query, i.e. only log files within the same file group need to be scanned for a given base file.
+- Hudi adds low-overhead record-level metadata and supplemental logging of metadata to compute CDC streams, tracking how a given record changes/moves within the table in the face of writes and background table services. For example, Hudi is able to preserve change history even if many small files are combined into another file due to clustering,
+  and does not have any dependency on how table snapshots are maintained. In snapshot-based approaches to tracking metadata, expiring a single snapshot can lead to loss of change history.
+- Hudi can encode updates natively without being forced to turn them into deletes and inserts, which tends to continuously redistribute records randomly across files, reducing data skipping efficiency. Hudi associates a given delete or update with the original file group that the record was inserted into (or latest clustered to), which preserves the spatial locality of clustered data or the temporal order in which records were inserted. As a result, queries that filter on time (e.g querying event [...]
+- Building on top of this, Hudi also supports partial update encoding, writing partial updates efficiently into delta logs. For columnar data, this means write/merge costs are proportional to the number of columns in a merge/update statement.
+- The idea with MoR is to reduce write costs/latencies by writing delta logs (Hudi) or positional delete files (Iceberg). Hudi employs several types of indexing to quickly locate the files that the updated records belong to. Formats relying on a scan of the table can quickly bottleneck on write performance, e.g. updating 1GB into a 1TB table every 5-10 mins.
+- Hudi is the only lakehouse storage system that natively supports event time 
ordering and late data handling for streaming workloads where MoR is employed 
heavily. 
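A record-level index of the kind referenced above can be pictured as a key-to-file-group map. This is a toy sketch, not Hudi's implementation; the `on_insert` and `route_update` helpers are invented for illustration:

```python
# Toy record-level index: record key -> file group id. An update is routed
# straight to the file group holding the record, avoiding a table scan.

index = {}

def on_insert(key, file_group):
    """Record where a key was first written."""
    index[key] = file_group

def route_update(key):
    """Return the target file group, or None if the key is a new insert."""
    return index.get(key)

on_insert("order-1", "fg-0")
on_insert("order-2", "fg-1")
assert route_update("order-1") == "fg-0"   # update goes straight to fg-0
assert route_update("order-99") is None    # unseen key => treat as insert
```

The lookup cost stays constant per record regardless of table size, which is why index-based routing avoids the scan bottleneck described in the bullet above.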
+
+
+
+
 
diff --git a/website/docs/write_operations.md b/website/docs/write_operations.md
index 04a7a8b63a8..21eeb774bc9 100644
--- a/website/docs/write_operations.md
+++ b/website/docs/write_operations.md
@@ -100,7 +100,7 @@ The following is an inside look on the Hudi write path and 
the sequence of event
    1. Now that the write is performed, we will go back and update the index.
 7. Commit
    1. Finally we commit all of these changes atomically. ([Post-commit 
callback](/docs/next/platform_services_post_commit_callback) can be configured.)
-8. [Clean](/docs/next/hoodie_cleaner) (if needed)
+8. [Clean](/docs/next/cleaning) (if needed)
    1. Following the commit, cleaning is invoked if needed.
 9. [Compaction](/docs/next/compaction)
    1. If you are using MOR tables, compaction will either run inline, or be 
scheduled asynchronously
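The clean (step 8) and compaction (step 9) stages of this write path are
controlled by writer configs. A hedged sketch of the relevant options follows —
key names are taken from Hudi's configuration reference, but values and exact
keys should be verified against the Hudi version in use, and the table name is
hypothetical:

```python
# Sketch of writer options governing the clean and compaction steps.
# Verify option names/defaults against your Hudi version's config reference.
hudi_write_options = {
    "hoodie.table.name": "events",             # hypothetical table name
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.clean.automatic": "true",          # step 8: clean after commit
    "hoodie.cleaner.commits.retained": "10",   # keep file slices for last 10 commits
    "hoodie.compact.inline": "false",          # step 9 (MOR): don't compact inline...
    "hoodie.compact.schedule.inline": "true",  # ...but schedule compaction inline
}
print(len(hudi_write_options))
```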
diff --git a/website/docusaurus.config.js b/website/docusaurus.config.js
index 6014ffb873a..8e53ff0f916 100644
--- a/website/docusaurus.config.js
+++ b/website/docusaurus.config.js
@@ -31,8 +31,8 @@ module.exports = {
       suffix: " to data lakes!",
       content: [
         "transactions",
-        "record-level updates/deletes",
-        "change streams",
+        "row-level updates/deletes",
+        "CDC and indexes"
       ],
     },
     slackUrl: slackUrl,
@@ -128,6 +128,10 @@ module.exports = {
             from: ["/contribute/team"],
             to: "/community/team",
           },
+          {
+            from: ["/docs/next/hoodie_cleaner"],
+            to: "/docs/next/cleaning",
+          },
           {
             from: ["/docs/releases", "/docs/next/releases"],
             to: "/releases/release-0.15.0",
diff --git a/website/sidebars.js b/website/sidebars.js
index a3a43514312..56d3bbe05fd 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -68,40 +68,40 @@ module.exports = {
             type: 'category',
             label: 'Table Services',
             items: [
-                'migration_guide',
+                'cleaning',
                 'compaction',
                 'clustering',
                 'metadata_indexing',
-                'hoodie_cleaner',
                 'rollbacks',
                 'markers',
                 'file_sizing',
-                'disaster_recovery',
+                {
+                    type: 'category',
+                    label: 'Syncing to Catalogs',
+                    items: [
+                         'syncing_aws_glue_data_catalog',
+                         'syncing_datahub',
+                         'syncing_metastore',
+                         'gcp_bigquery',
+                         'syncing_xtable'
+                    ],
+                }
             ],
         },
         {
             type: 'category',
-            label: 'Platform Services',
+            label: 'Platform & Tools',
             items: [
                 'snapshot_exporter',
                 'precommit_validator',
                 'platform_services_post_commit_callback',
-                {
-                    type: 'category',
-                    label: 'Syncing to Catalogs',
-                    items: [
-                        'syncing_aws_glue_data_catalog',
-                        'syncing_datahub',
-                        'syncing_metastore',
-                        'gcp_bigquery',
-                        'syncing_xtable'
-                    ],
-                }
+                'disaster_recovery',
+                'migration_guide',
             ],
         },
         {
             type: 'category',
-            label: 'Operations',
+            label: 'Operating Hudi',
             items: [
                 'performance',
                 'deployment',
diff --git a/website/src/components/DataLakes/index.js 
b/website/src/components/DataLakes/index.js
index 44393f6aec8..a56890e566e 100644
--- a/website/src/components/DataLakes/index.js
+++ b/website/src/components/DataLakes/index.js
@@ -7,11 +7,10 @@ const DataLake = () => {
             <div className="container">
                 <Title primaryText="What is" secondaryText="Hudi" />
                 <div className="sub-title text--center text--semibold 
margin-bottom--md">
-                    Apache Hudi is a transactional data lake platform that
-                    brings database and data warehouse capabilities to the
-                    data lake. Hudi reimagines slow old-school batch data
-                    processing with a powerful new incremental processing
-                    framework for low latency minute-level analytics.
+                    Apache Hudi is an open data lakehouse platform, built on a 
high-performance open table format
+                    to bring database functionality to your data lakes. <br/>
+                    Hudi reimagines slow old-school batch data processing with 
a
+                    powerful new incremental processing framework for low 
latency minute-level analytics.
                 </div>
 
                 <img
diff --git a/website/src/components/HomepageFeatures/index.js 
b/website/src/components/HomepageFeatures/index.js
index a8b5eb70fa4..83ca35a8591 100644
--- a/website/src/components/HomepageFeatures/index.js
+++ b/website/src/components/HomepageFeatures/index.js
@@ -18,28 +18,28 @@ const HomepageFeatures = () => {
   const features = [
     {
       icon: MutabilitySupport,
-      title: "Mutability support for all data lake workloads",
+      title: "Mutability support for all workload shapes & sizes",
       description:
-        "Quickly update & delete data with Hudi’s fast, pluggable indexing. 
This includes streaming workloads, with full support for out-of-order data, 
bursty traffic & data deduplication.",
+        "Quickly update & delete data with fast, pluggable indexing. This 
includes database CDC and high-scale streaming data, with best-in-class support 
for out-of-order records, bursty traffic & data deduplication.",
       link: "/docs/indexing",
     },
     {
       icon: IncrementalProcessing,
-      title: "Improved efficiency by incrementally processing new data",
+      title: "Unlock 10x efficiency by incrementally processing new data",
       description:
-        "Replace old-school batch pipelines with incremental streaming on your 
data lake. Experience faster ingestion and lower processing times for 
analytical workloads.",
+        "Replace old-school batch pipelines with incremental streaming on your 
data lake. Experience faster ingestion and lower processing times for your data 
pipelines.",
       link: "/blog/2020/08/18/hudi-incremental-processing-on-data-lakes",
     },
     {
       icon: ACIDTransactions,
-      title: "ACID Transactional guarantees to your data lake",
+      title: "ACID Transactional guarantees for your data lake",
       description:
-        "Bring transactional guarantees to your data lake, with consistent, 
atomic writes and concurrency controls tailored for longer-running lake 
transactions.",
+        "Atomic writes, with relational/streaming data consistency models, 
snapshot isolation and non-blocking concurrency controls tailored for 
longer-running lake transactions.",
       link: "/docs/use_cases/#acid-transactions",
     },
     {
       icon: HistoricalTimeTravel,
-      title: "Unlock historical data with time travel",
+      title: "Analyze historical data with time travel",
       description:
         "Query historical data with the ability to roll back to a table 
version; debug data versions to understand what changed over time; audit data 
changes by viewing the commit history.",
       link: "/docs/use_cases/#time-travel",
@@ -48,21 +48,21 @@ const HomepageFeatures = () => {
       icon: Interoperable,
       title: "Interoperable multi-cloud ecosystem support",
       description:
-        "Extensive ecosystem support with plug-and-play options for popular 
data sources & query engines. Build future-proof architectures interoperable 
with your vendor of choice.",
+        "Built on open data formats with extensive ecosystem support across 
cloud vendor ecosystem, with plug-and-play options for popular data sources & 
query engines.",
       link: "/docs/cloud",
     },
     {
       icon: TableServices,
-      title: "Comprehensive table services for high-performance analytics",
+      title: "Automatic table services for a high-performance lakehouse",
       description:
-        "Fully automated table services that continuously schedule & 
orchestrate clustering, compaction, cleaning, file sizing & indexing to ensure 
tables are always ready.",
+        "Fully automated table services that continuously schedule & 
orchestrate clustering, compaction, cleaning, file sizing & indexing to ensure 
tables are always optimized.",
       link: "/blog/2021/07/21/streaming-data-lake-platform/#table-services",
     },
     {
       icon: RichPlatform,
-      title: "A rich platform to build your lakehouse faster",
+      title: "Open Data Lakehouse platform to get you going faster",
       description:
-        "Effortlessly build your lakehouse with built-in tools for auto 
ingestion from services like Debezium and Kafka and auto catalog sync for easy 
discoverability & more.",
+        "Effortlessly build your lakehouse with built-in tools for auto 
ingestion from services like Debezium and Kafka and auto catalog sync to major 
cloud engines & more.",
       link: 
"/blog/2022/01/14/change-data-capture-with-debezium-and-apache-hudi",
     },
     {
diff --git a/website/src/components/WhyHudi/index.js 
b/website/src/components/WhyHudi/index.js
index 46bbb1ec76c..1c0341475ff 100644
--- a/website/src/components/WhyHudi/index.js
+++ b/website/src/components/WhyHudi/index.js
@@ -24,9 +24,9 @@ const WhyHudi = () => {
     },
     {
       icon: DerivedTablesIcon,
-      title: "Derived tables",
+      title: "High Performance",
       subtitle:
-        "Seamlessly create and manage SQL tables on your data lake to build 
multi-stage incremental pipelines.",
+        "Hudi's storage format is purpose-built to continuously deliver 
performance as data scales.",
     },
     {
       icon: DataStreamIcon,
@@ -45,9 +45,7 @@ const WhyHudi = () => {
           </div>
           <div className={styles.textWrapper}>
             <div className="text--center text--semibold">
-              Take advantage of Hudi’s platform with rich services and tools to
-              make your data lake actionable for applications like 
personalization,
-              machine learning, customer 360 and more!
+              The most innovative and completely open data lakehouse platform 
in the industry!
             </div>
           </div>
         </div>
diff --git a/website/src/pages/roadmap.md b/website/src/pages/roadmap.md
index 19dabef81ad..7fd850cf286 100644
--- a/website/src/pages/roadmap.md
+++ b/website/src/pages/roadmap.md
@@ -4,75 +4,76 @@ last_modified_at: 2019-12-30T15:59:57-04:00
 ---
 # Roadmap
 
-Hudi community strives to deliver major releases every 3-4 months, while 
offering minor releases every other month!
+The Hudi community strives to deliver major releases every 3 months, while 
offering minor releases every 1-2 months!
 This page captures the forward-looking roadmap of ongoing & upcoming projects 
and when they are expected to land, broken
-down by areas on our 
[stack](blog/2021/07/21/streaming-data-lake-platform/#hudi-stack).
+down by areas on our [stack](/docs/hudi_stack).
 
 ## Recent Release
 [0.15.0](https://hudi.apache.org/releases/release-0.15.0) (June 2024)
 
 ## Future Releases
 
-| Release                                                                    | 
Timeline  |
-|----------------------------------------------------------------------------|-----------|
-| 1.0.0-beta2                                                                | 
July 2024 |
-| 0.16.0 (Bridge release supporting reads of both 1.x and 0.x Hudi versions) | 
Q3, 2024  |
-| 1.0.0                                                                      | 
Q3, 2024  |
-
-
-
-## Transactional Database Layer
-
-| Feature                                                        | Target 
Release | Tracking                                                              
                                                                                
                         |
-|----------------------------------------------------------------|----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| 1.x Storage format                                             | 1.0.0       
   | [HUDI-6242](https://issues.apache.org/jira/browse/HUDI-6242)               
                                                                                
                    |
-| Writer performance improvements                                | 1.0.0       
   | [HUDI-3249](https://issues.apache.org/jira/browse/HUDI-3249)               
                                                                                
                    |
-| Non-blocking concurrency control                               | 1.0.0       
   | [HUDI-3187](https://issues.apache.org/jira/browse/HUDI-3187), 
[HUDI-1042](https://issues.apache.org/jira/browse/HUDI-1042), 
[RFC-66](https://github.com/apache/hudi/pull/7907) |
-| General purpose support for multi-table transactions           | 1.0.0       
   ||
-| Secondary indexes to improve query performance                 | 1.0.0       
   | [RFC-52](https://github.com/apache/hudi/pull/5370), 
[HUDI-3907](https://issues.apache.org/jira/browse/HUDI-3907)                    
                                           |
-| Index Function for Optimizing Query Performance                | 1.0.0       
   | [RFC-63](https://github.com/apache/hudi/pull/7235), 
[HUDI-512](https://issues.apache.org/jira/browse/HUDI-512)                      
                                           |
-| Logical partitioning via indexing                              | 1.0.0       
   | [HUDI-512](https://issues.apache.org/jira/browse/HUDI-512)                 
                                                                                
                    |
-| Track schema in metadata table                                 | 1.0.0       
   | [HUDI-6778](https://issues.apache.org/jira/browse/HUDI-6778)               
                                                                                
                    |
-| Storage partition stats index                                  | 1.0.0       
   | [HUDI-7144](https://issues.apache.org/jira/browse/HUDI-7144)               
                                                                                
                    |
-| Support update during clustering                               | 1.0.0       
   | [HUDI-1045](https://issues.apache.org/jira/browse/HUDI-1045)               
                                                                                
                    |
-| Time Travel updates, deletes                                   | 1.1.0       
   ||
-| A more effective HoodieMergeHandler for COW table with parquet | 1.1.0       
   | 
[RFC-68](https://github.com/apache/hudi/blob/f1afb1bf04abdc94a26d61dc302f36ec2bbeb15b/rfc/rfc-68/rfc-68.md)
                                                                    |
-| Streaming CDC/Incremental read improvement                     | 1.1.0       
   | [HUDI-2749](https://issues.apache.org/jira/browse/HUDI-2749)               
                                                                                
                    |
-| Supervised table service planning and execution                | 1.1.0       
   | [RFC-43](https://github.com/apache/hudi/pull/4309), 
[HUDI-4147](https://issues.apache.org/jira/browse/HUDI-4147)                    
                                           |
-| Enable partial updates for CDC work payload                    | 1.1.0       
   | [HUDI-7229](https://issues.apache.org/jira/browse/HUDI-7229)               
                                                                                
                    |
-
+| Release     | Timeline |
+|-------------|----------|
+| 1.0.0-beta2 | July 2024 |
+| 1.0.0 (GA)  | Q4, 2024 |
+| 1.0.1       | Jan 2025 |
+| 1.1.0       | Mar 2025 |
+| 1.2.0       | May 2025 |
+| 1.3.0       | July 2025 |
+| 2.0.0       | Dec 2025 |
+
+
+## Transactions Database Layer
+
+| Feature                                              | Target Release | 
Tracking                                                                        
                                                                                
               |
+|------------------------------------------------------|----------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| 1.x Storage format                                   | 1.0.0          | 
[HUDI-6242](https://issues.apache.org/jira/browse/HUDI-6242)                    
                                                                                
               |
+| Writer performance improvements                      | 1.0.0          | 
[HUDI-3249](https://issues.apache.org/jira/browse/HUDI-3249)                    
                                                                                
               |
+| Non-blocking concurrency control                     | 1.0.0          | 
[HUDI-3187](https://issues.apache.org/jira/browse/HUDI-3187), 
[HUDI-1042](https://issues.apache.org/jira/browse/HUDI-1042), 
[RFC-66](https://github.com/apache/hudi/pull/7907) |
+| Secondary indexes to improve query performance       | 1.0.0          | 
[RFC-52](https://github.com/apache/hudi/pull/5370), 
[HUDI-3907](https://issues.apache.org/jira/browse/HUDI-3907)                    
                                           |
+| Index Function for Optimizing Query Performance      | 1.0.0          | 
[RFC-63](https://github.com/apache/hudi/pull/7235), 
[HUDI-512](https://issues.apache.org/jira/browse/HUDI-512)                      
                                           |
+| Logical partitioning via indexing                    | 1.0.0          | 
[HUDI-512](https://issues.apache.org/jira/browse/HUDI-512)                      
                                                                                
               |
+| Storage partition stats index                        | 1.0.0          | 
[HUDI-7144](https://issues.apache.org/jira/browse/HUDI-7144)                    
                                                                                
               |
+| Non-blocking updates during clustering               | 1.1.0          | 
[HUDI-1045](https://issues.apache.org/jira/browse/HUDI-1045)                    
                                                                                
               |
+| Track schema in metadata table                       | 1.1.0          | 
[HUDI-6778](https://issues.apache.org/jira/browse/HUDI-6778)                    
                                                                                
               |
+| Streaming CDC/Incremental read improvement           | 1.1.0          | 
[HUDI-2749](https://issues.apache.org/jira/browse/HUDI-2749)                    
                                                                                
               |
+| Supervised table service planning and execution      | 1.1.0          | 
[RFC-43](https://github.com/apache/hudi/pull/4309), 
[HUDI-4147](https://issues.apache.org/jira/browse/HUDI-4147)                    
                                           |
+| Enable partial updates for CDC workload payload      | 1.1.0          | 
[HUDI-7229](https://issues.apache.org/jira/browse/HUDI-7229)                    
                                                                                
               |
+| Vector search indexes                                | 1.1.0          |      
                                                                                
                           |
+| General purpose support for multi-table transactions | 1.2.0          ||
+| Time Travel updates, deletes                         | 1.3.0          ||
+| Unstructured data storage and management             | 1.3.0          ||
 
 
 ## Programming APIs
 
-| Feature                                                 | Target Release  | 
Tracking                                                                        
                                           |
-|---------------------------------------------------------|-----------------|----------------------------------------------------------------------------------------------------------------------------|
-| APIs/Abstractions, Record mergers                       | 1.0.0           | 
[HUDI-6243](https://issues.apache.org/jira/browse/HUDI-6243), 
[HUDI-3217](https://issues.apache.org/jira/browse/HUDI-3217) |
-| New Hudi Table Format APIs for Query Integrations       | 1.0.0           | 
[RFC-64](https://github.com/apache/hudi/pull/7080), 
[HUDI-4141](https://issues.apache.org/jira/browse/HUDI-4141)           |
-| Snapshot view management                                | 1.1.0           | 
[RFC-61](https://github.com/apache/hudi/pull/6576), 
[HUDI-4677](https://issues.apache.org/jira/browse/HUDI-4677)           |
-| Optimized storage layout for cloud object stores        | 1.1.0           | 
[RFC-60](https://github.com/apache/hudi/pull/5113), 
[HUDI-3625](https://issues.apache.org/jira/browse/HUDI-3625)           |
-| Support of verification with multiple event_time fields | 1.1.0           | 
[RFC-59](https://github.com/apache/hudi/pull/6382), 
[HUDI-4569](https://issues.apache.org/jira/browse/HUDI-4569)           |
+| Feature                                                 | Target Release | 
Tracking                                                                        
                                           |
+|---------------------------------------------------------|----------------|----------------------------------------------------------------------------------------------------------------------------|
+| APIs/Abstractions, Record mergers                       | 1.0.0          | 
[HUDI-6243](https://issues.apache.org/jira/browse/HUDI-6243), 
[HUDI-3217](https://issues.apache.org/jira/browse/HUDI-3217) |
+| New Hudi Table Format APIs for Query Integrations       | 1.1.0          | 
[RFC-64](https://github.com/apache/hudi/pull/7080), 
[HUDI-4141](https://issues.apache.org/jira/browse/HUDI-4141)           |
+| Snapshot view management                                | 1.2.0          | 
[RFC-61](https://github.com/apache/hudi/pull/6576), 
[HUDI-4677](https://issues.apache.org/jira/browse/HUDI-4677)           |
+| Support of verification with multiple event_time fields | 1.2.0          | 
[RFC-59](https://github.com/apache/hudi/pull/6382), 
[HUDI-4569](https://issues.apache.org/jira/browse/HUDI-4569)           |
 
 
-## Execution Engine Integration
+## Query Engine Integration
 
 | Feature                                                 | Target Release | 
Tracking                                                                        
                                                                                
                         |
 
|---------------------------------------------------------|----------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Presto/Trino queries with new format                    | 1.0.0          | 
[HUDI-3210](https://issues.apache.org/jira/browse/HUDI-4394), 
[HUDI-4394](https://issues.apache.org/jira/browse/HUDI-4394), 
[HUDI-4552](https://issues.apache.org/jira/browse/HUDI-4552) |
-| Table Valued Function to query Hudi timeline            | 1.0.0              
| [HUDI-7243](https://issues.apache.org/jira/browse/HUDI-7243)                  
                                                                                
                           |
-| Default Java 17 support                                     | 1.0.0          
| [HUDI-6506](https://issues.apache.org/jira/browse/HUDI-6506)                  
                                                                                
                           |
-| Spark 4 Support                                             | 1.0.0          
| [HUDI-7915](https://issues.apache.org/jira/browse/HUDI-7915)                  
                                                                                
                           |
-| Materialized Views with incremental updates using Flink | 1.1.0          ||
+| Presto/Trino queries with new format                    | 1.1.0          | 
[HUDI-3210](https://issues.apache.org/jira/browse/HUDI-4394), 
[HUDI-4394](https://issues.apache.org/jira/browse/HUDI-4394), 
[HUDI-4552](https://issues.apache.org/jira/browse/HUDI-4552) |
+| Table Valued Function to query Hudi timeline            | 1.0.0              
 | [HUDI-7243](https://issues.apache.org/jira/browse/HUDI-7243)                 
                                                                                
                            |
+| Default Java 17 support                                     | 1.1.0          
 | [HUDI-6506](https://issues.apache.org/jira/browse/HUDI-6506)                 
                                                                                
                            |
+| Spark 4 Support                                             | 1.1.0          
 | [HUDI-7915](https://issues.apache.org/jira/browse/HUDI-7915)                 
                                                                                
                            |
 | Spark datasource V2 read                                | 1.1.0          | 
[HUDI-4449](https://issues.apache.org/jira/browse/HUDI-4449)                    
                                                                                
                         |
-| Replace Dataframe write path for Spark                  | 1.1.0              
| [HUDI-4857](https://issues.apache.org/jira/browse/HUDI-4857)                  
                                                                                
                           |
+| Replace Dataframe write path for Spark                  | 1.1.0              
 | [HUDI-4857](https://issues.apache.org/jira/browse/HUDI-4857)                 
                                                                                
                            |
 
 
 ## Platform Services
 
-| Feature                                                                      
                       | Target Release | Tracking                              
                                                                                
                  |
-|-----------------------------------------------------------------------------------------------------|----------------|-----------------------------------------------------------------------------------------------------------------------------------------|
-| Hudi Reverse streamer                                                        
                       | 1.1.0          | 
[RFC-70](https://github.com/apache/hudi/pull/9040)                              
                                                         |
-| Diagnostic Reporter                                                          
                       | 1.1.0          | 
[RFC-62](https://github.com/apache/hudi/pull/6600)                              
                                           |
-| Hudi integration with Snowflake                                              
                       | 1.1.0          | 
[RFC-41](https://github.com/apache/hudi/pull/4074), 
[HUDI-2832](https://issues.apache.org/jira/browse/HUDI-2832)                    
    |
-| Support for reliable, event based ingestion from cloud stores - GCS, Azure 
and the others           | 1.1.0          | 
[HUDI-1896](https://issues.apache.org/jira/browse/HUDI-1896)                    
                                                        |
-| Mutable, Transactional caching for Hudi Tables (could be accelerated based 
on community feedback)   | 1.1.0          | [Strawman 
design](https://docs.google.com/presentation/d/1QBgLw11TM2Qf1KUESofGrQDb63EuggNCpPaxc82Kldo/edit#slide=id.gf7e0551254_0_5),
 [HUDI-6489](https://issues.apache.org/jira/browse/HUDI-6489)  |
+| Feature                                                                      
                     | Target Release | Tracking                                
                                                                                
               |
+|---------------------------------------------------------------------------------------------------|----------------|----------------------------------------------------------------------------------------------------------------------------------------|
+| Support for reliable, event based ingestion from cloud stores - GCS, Azure 
and the others         | 1.0.0          | 
[HUDI-1896](https://issues.apache.org/jira/browse/HUDI-1896)                    
                                                       |
+| Hudi Reverse streamer                                                        
                     | 1.2.0          | 
[RFC-70](https://github.com/apache/hudi/pull/9040)                              
                                                        |
+| Diagnostic Reporter                                                          
                     | 1.2.0          | 
[RFC-62](https://github.com/apache/hudi/pull/6600)                              
                                          |
+| Mutable, Transactional caching for Hudi Tables (could be accelerated based 
on community feedback) | 2.0.0          | [Strawman 
design](https://docs.google.com/presentation/d/1QBgLw11TM2Qf1KUESofGrQDb63EuggNCpPaxc82Kldo/edit#slide=id.gf7e0551254_0_5),
 [HUDI-6489](https://issues.apache.org/jira/browse/HUDI-6489) |
+| Hudi Metaserver (could be accelerated based on community feedback)           
                     | 2.0.0          |   |
diff --git a/website/versioned_docs/version-0.14.1/file_layouts.md 
b/website/versioned_docs/version-0.14.1/file_layouts.md
index 67af341bd69..71ee6d56307 100644
--- a/website/versioned_docs/version-0.14.1/file_layouts.md
+++ b/website/versioned_docs/version-0.14.1/file_layouts.md
@@ -11,7 +11,7 @@ The following describes the general file layout structure for 
Apache Hudi. Pleas
 * Each slice contains a base file (*.parquet/*.orc) (defined by the config - 
[hoodie.table.base.file.format](https://hudi.apache.org/docs/next/configurations/#hoodietablebasefileformat)
 ) produced at a certain commit/compaction instant time, along with set of log 
files (*.log.*) that contain inserts/updates to the base file since the base 
file was produced. 
 
 Hudi adopts Multiversion Concurrency Control (MVCC), where 
[compaction](/docs/next/compaction) action merges logs and base files to 
produce new 
-file slices and [cleaning](/docs/next/hoodie_cleaner) action gets rid of 
unused/older file slices to reclaim space on the file system.
+file slices and [cleaning](/docs/next/cleaning) action gets rid of 
unused/older file slices to reclaim space on the file system.
 
 ![Partition On HDFS](/assets/images/hudi_partitions_HDFS.png)
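The MVCC behavior above can be modeled as a short sketch — a simplified
stand-in (assumed names, not Hudi code) for a cleaner retaining the latest N
file slices per file group, similar in spirit to the KEEP_LATEST_FILE_VERSIONS
cleaning policy:

```python
# Simplified model of MVCC cleaning: each write/compaction appends a new
# file slice; the cleaner reclaims space by dropping all but the newest N.
def clean_file_group(slices, versions_retained=2):
    """slices: list of (instant_time, base_file); returns retained slices."""
    newest_first = sorted(slices, key=lambda s: s[0], reverse=True)
    return sorted(newest_first[:versions_retained])

slices = [
    ("20240101", "base_20240101.parquet"),
    ("20240102", "base_20240102.parquet"),
    ("20240103", "base_20240103.parquet"),
]
retained = clean_file_group(slices)
print(retained)  # the two newest slices survive; 20240101 is reclaimed
```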
 
diff --git a/website/versioned_docs/version-0.14.1/file_sizing.md 
b/website/versioned_docs/version-0.14.1/file_sizing.md
index 157190005f3..c637a5a630c 100644
--- a/website/versioned_docs/version-0.14.1/file_sizing.md
+++ b/website/versioned_docs/version-0.14.1/file_sizing.md
@@ -148,7 +148,7 @@ while the clustering service runs.
 
 :::note
 Hudi always creates immutable files on storage. To be able to do auto-sizing or clustering, Hudi will always create a
-newer version of the smaller file, resulting in 2 versions of the same file. The [cleaner service](/docs/next/hoodie_cleaner)
+newer version of the smaller file, resulting in 2 versions of the same file. The [cleaner service](/docs/next/cleaning)
 will later kick in and delete the older version small file and keep the latest one.
 :::
 
diff --git a/website/versioned_docs/version-0.14.1/use_cases.md b/website/versioned_docs/version-0.14.1/use_cases.md
index 4efb3bc4736..1781ffdbcae 100644
--- a/website/versioned_docs/version-0.14.1/use_cases.md
+++ b/website/versioned_docs/version-0.14.1/use_cases.md
@@ -134,5 +134,5 @@ Some examples of the Apache Hudi services that make this performance optimizatio
 - Multi-Dimensional Partitioning (Z-Ordering) - Traditional folder style partitioning on low-cardinality, while also 
 Z-Ordering data within files based on high-cardinality
 - Metadata Table - No more slow S3 file listings or throttling.
-- [Auto Cleaning](/docs/next/hoodie_cleaner) - Keeps your storage costs in check by automatically removing old versions of files.
+- [Auto Cleaning](/docs/next/cleaning) - Keeps your storage costs in check by automatically removing old versions of files.
 
diff --git a/website/versioned_docs/version-0.14.1/write_operations.md b/website/versioned_docs/version-0.14.1/write_operations.md
index 90b87499fe0..cbf4304eab5 100644
--- a/website/versioned_docs/version-0.14.1/write_operations.md
+++ b/website/versioned_docs/version-0.14.1/write_operations.md
@@ -101,7 +101,7 @@ The following is an inside look on the Hudi write path and the sequence of event
    1. Now that the write is performed, we will go back and update the index.
 7. Commit
    1. Finally we commit all of these changes atomically. (A [callback notification](/docs/next/writing_data#commit-notifications) is exposed)
-8. [Clean](/docs/next/hoodie_cleaner) (if needed)
+8. [Clean](/docs/next/cleaning) (if needed)
    1. Following the commit, cleaning is invoked if needed.
 9. [Compaction](/docs/next/compaction)
    1. If you are using MOR tables, compaction will either run inline, or be scheduled asynchronously
diff --git a/website/versioned_docs/version-0.15.0/file_layouts.md b/website/versioned_docs/version-0.15.0/file_layouts.md
index 3cfb8a7d837..478130fbd7e 100644
--- a/website/versioned_docs/version-0.15.0/file_layouts.md
+++ b/website/versioned_docs/version-0.15.0/file_layouts.md
@@ -11,7 +11,7 @@ The following describes the general file layout structure for Apache Hudi. Pleas
 * Each slice contains a base file (*.parquet/*.orc) (defined by the config - [hoodie.table.base.file.format](https://hudi.apache.org/docs/next/configurations/#hoodietablebasefileformat)
 ) produced at a certain commit/compaction instant time, along with set of log files (*.log.*) that contain inserts/updates to the base file since the base file was produced. 
 
 Hudi adopts Multiversion Concurrency Control (MVCC), where [compaction](/docs/next/compaction) action merges logs and base files to produce new 
-file slices and [cleaning](/docs/next/hoodie_cleaner) action gets rid of unused/older file slices to reclaim space on the file system.
+file slices and [cleaning](/docs/next/cleaning) action gets rid of unused/older file slices to reclaim space on the file system.
 
 ![Partition On HDFS](/assets/images/MOR_new.png)
 
diff --git a/website/versioned_docs/version-0.15.0/file_sizing.md b/website/versioned_docs/version-0.15.0/file_sizing.md
index 157190005f3..c637a5a630c 100644
--- a/website/versioned_docs/version-0.15.0/file_sizing.md
+++ b/website/versioned_docs/version-0.15.0/file_sizing.md
@@ -148,7 +148,7 @@ while the clustering service runs.
 
 :::note
 Hudi always creates immutable files on storage. To be able to do auto-sizing or clustering, Hudi will always create a
-newer version of the smaller file, resulting in 2 versions of the same file. The [cleaner service](/docs/next/hoodie_cleaner)
+newer version of the smaller file, resulting in 2 versions of the same file. The [cleaner service](/docs/next/cleaning)
 will later kick in and delete the older version small file and keep the latest one.
 :::
 
diff --git a/website/versioned_docs/version-0.15.0/use_cases.md b/website/versioned_docs/version-0.15.0/use_cases.md
index 4efb3bc4736..1781ffdbcae 100644
--- a/website/versioned_docs/version-0.15.0/use_cases.md
+++ b/website/versioned_docs/version-0.15.0/use_cases.md
@@ -134,5 +134,5 @@ Some examples of the Apache Hudi services that make this performance optimizatio
 - Multi-Dimensional Partitioning (Z-Ordering) - Traditional folder style partitioning on low-cardinality, while also 
 Z-Ordering data within files based on high-cardinality
 - Metadata Table - No more slow S3 file listings or throttling.
-- [Auto Cleaning](/docs/next/hoodie_cleaner) - Keeps your storage costs in check by automatically removing old versions of files.
+- [Auto Cleaning](/docs/next/cleaning) - Keeps your storage costs in check by automatically removing old versions of files.
 
diff --git a/website/versioned_docs/version-0.15.0/write_operations.md b/website/versioned_docs/version-0.15.0/write_operations.md
index 04a7a8b63a8..21eeb774bc9 100644
--- a/website/versioned_docs/version-0.15.0/write_operations.md
+++ b/website/versioned_docs/version-0.15.0/write_operations.md
@@ -100,7 +100,7 @@ The following is an inside look on the Hudi write path and the sequence of event
    1. Now that the write is performed, we will go back and update the index.
 7. Commit
    1. Finally we commit all of these changes atomically. ([Post-commit callback](/docs/next/platform_services_post_commit_callback) can be configured.)
-8. [Clean](/docs/next/hoodie_cleaner) (if needed)
+8. [Clean](/docs/next/cleaning) (if needed)
    1. Following the commit, cleaning is invoked if needed.
 9. [Compaction](/docs/next/compaction)
    1. If you are using MOR tables, compaction will either run inline, or be scheduled asynchronously
