This is an automated email from the ASF dual-hosted git repository.
xushiyan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 16cce89c43ae docs: update hudi stack page with pluggable TF (#14330)
16cce89c43ae is described below
commit 16cce89c43aed514f34fd206d3f679101cbddcad
Author: Shiyan Xu <[email protected]>
AuthorDate: Sun Nov 23 22:31:06 2025 -0600
docs: update hudi stack page with pluggable TF (#14330)
---
website/docs/hudi_stack.md | 165 +++++++++++++-----------
website/static/assets/images/hudi-stack-1-x.png | Bin 427421 -> 162585 bytes
2 files changed, 92 insertions(+), 73 deletions(-)
diff --git a/website/docs/hudi_stack.md b/website/docs/hudi_stack.md
index 3218592d36f4..e36deb5c173b 100644
--- a/website/docs/hudi_stack.md
+++ b/website/docs/hudi_stack.md
@@ -1,5 +1,5 @@
---
-title: Apache Hudi Stack
+title: The Hudi Stack
summary: "Explains about the various layers of software components that make
up Hudi"
toc: true
toc_min_heading_level: 2
@@ -7,166 +7,185 @@ toc_max_heading_level: 3
last_modified_at:
---
-Apache Hudi adds core warehouse and database functionality directly to a data
lake (more recently known as the data lakehouse architecture) elevating it from
a collection of
-objects/files to well-managed tables. Hudi adds table abstraction over open
file formats like Apache Parquet/ORC using a table format layer, that is
optimized for frequent writes,
-large-scale queries on a table snapshot as well efficient incremental scans.
To understand the Hudi stack, we can simply translate the components to the
seminal paper
+Hudi adds core warehouse and database functionality directly to a data lake
(more recently known as the data lakehouse architecture) elevating it from a
collection of
+objects/files to well-managed tables. Hudi adds a table abstraction over open file formats like Apache Parquet/ORC using a table format layer that is optimized for frequent writes,
+large-scale queries on a table snapshot as well as efficient incremental scans. To understand the Hudi stack, we can simply translate the components to the seminal paper
on "[Architecture of a Database
System](https://dsf.berkeley.edu/papers/fntdb07-architecture.pdf)", with
modernized names.
-On top of this foundation, Hudi adds [storage
engine](https://en.wikipedia.org/wiki/Database_engine) functionality found in
many databases ("transactional storage manager" in the paper),
-enabling transactional capabilities such as concurrency control, indexing,
change capture and updates/deletes. The storage engine also consists of
essential table services
-to manage/maintain the tables, that are tightly integrated with the underlying
storage layer and executed automatically by upper-layer writers or platform
components
-like an independent table management service.
+On top of this foundation, Hudi adds [storage
engine](https://en.wikipedia.org/wiki/Database_engine) functionality found in
many databases ("transactional storage manager" in the paper),
+enabling transactional capabilities such as concurrency control, indexing, change capture and updates/deletes. The storage engine also includes essential table services
+to manage and maintain tables, which are tightly integrated with the underlying storage layer and executed automatically by upper-layer writers or platform components
+like an independent table management service.
Hudi then defines clear read/write APIs for interacting with tables from a variety of SQL engines and from code written in many programming languages using popular data
processing frameworks. Hudi also comes with several platform services that
help tune performance, operate tables, monitor tables, ingest data,
import/export data, and more.
-Thus, when all things considered, the Hudi stack expands out of being just a
'table format' to a comprehensive and robust [data
lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
platform. In this section,
-we will explore the Hudi stack and deconstruct the layers of software
components that constitute Hudi. The features marked with an asterisk (*)
represent work in progress, and
-the dotted boxes indicate planned future work. These components collectively
aim to fulfill the
[vision](https://github.com/apache/hudi/blob/master/rfc/rfc-69/rfc-69.md) for
the project.
-

-<p align = "center">Figure: Apache Hudi Database Architecture</p>
+<p align = "center">The Hudi stack</p>
+
+Thus, all things considered, the Hudi stack expands beyond just a 'table format' into a comprehensive and robust [data lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/) platform. In this section,
+we will explore the Hudi stack and deconstruct the layers of software
components that constitute Hudi. The features marked with an asterisk (*)
represent work in progress, and
+the dotted boxes indicate planned future work. These components collectively
aim to fulfill the
[vision](https://github.com/apache/hudi/blob/master/rfc/rfc-69/rfc-69.md) for
the project.
## Lake Storage
-The storage layer is where the data files/objects (such as Parquet) as well as
all table format metadata are stored. Hudi interacts with the storage layer
through Cloud native and [Hadoop FileSystem
API](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html),
enabling compatibility
-with various systems including HDFS for fast appends, and various cloud stores
such as Amazon S3, Google Cloud Storage (GCS), and Azure Blob Storage.
Additionally, Hudi offers its own storage APIs that can rely on
Hadoop-independent file system
+
+The storage layer is where the data files/objects (such as Parquet) as well as all table format metadata are stored. Hudi interacts with the storage layer through cloud-native APIs and the [Hadoop FileSystem API](https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html), enabling compatibility
+with various systems including HDFS for fast appends, and various cloud stores such as Amazon S3, Google Cloud Storage (GCS), and Azure Blob Storage. Additionally, Hudi offers its own storage APIs that can rely on a Hadoop-independent file system
implementation to simplify the integration of various file systems. Hudi adds
a custom wrapper filesystem that lays out the foundation for improved storage
optimizations.
## File Formats
+

-<p align = "center">Figure: File format structure in Hudi</p>
+<p align = "center">File format structure in Hudi</p>
-File formats hold the actual data and are physically stored on the lake
storage. Hudi operates on logical structures of File Groups and File Slices,
which consist of Base File and Log Files.
-Log files store updates/deletes/inserts on top of records stored in base
files, and periodically log files are compacted into small set of log files
(log compaction) or base files (compaction).
+File formats hold the actual data and are physically stored on the lake
storage. Hudi operates on logical structures of File Groups and File Slices,
which consist of Base File and Log Files.
+Log files store updates/deletes/inserts on top of records stored in base files, and periodically log files are compacted into a smaller set of log files (log compaction) or base files (compaction).
Future updates aim to integrate diverse formats like unstructured data (e.g.,
JSON, images), and compatibility with different storage layers in
event-streaming, OLAP engines, and warehouses.
-Hudi's layout scheme encodes all changes to a Log File as a sequence of blocks
(data, delete, rollback). By making data available in open file formats (such
as Parquet/Avro), Hudi enables users to
+Hudi's layout scheme encodes all changes to a Log File as a sequence of blocks
(data, delete, rollback). By making data available in open file formats (such
as Parquet/Avro), Hudi enables users to
bring any compute engine for specific workloads.
## Table Format
+

-<p align = "center">Figure: Apache Hudi's Table format</p>
+<p align = "center">Hudi's Table format</p>
-Drawing an analogy to file formats, a table format simply concerns with how
files are distributed with the table, partitioning schemes, schema and metadata
tracking changes. Hudi organizes files within a table or partition into
+Drawing an analogy to file formats, a table format simply concerns how files are distributed within the table, partitioning schemes, schema, and metadata tracking changes. Hudi organizes files within a table or partition into
File Groups. Updates are captured in log files tied to these File Groups,
ensuring efficient merges. There are three major components related to Hudi’s
table format.
-- **Timeline** : Hudi's [timeline](timeline), stored in the
`/.hoodie/timeline` folder, is a crucial event log recording all table actions
in an ordered manner,
+- **Timeline** : Hudi's [timeline](timeline), stored in the
`/.hoodie/timeline` folder, is a crucial event log recording all table actions
in an ordered manner,
with events kept for a specified period. Hudi uniquely designs each File
Group as a self-contained log, enabling record state reconstruction through
delta logs, even after archival of historical actions. This approach
effectively limits metadata size based on table activity frequency, essential
for managing tables with frequent updates.
-- **File Group and File Slice** : Within each partition the data is physically
stored as base and Log Files and organized into logical concepts as [File
groups](https://hudi.apache.org/tech-specs-1point0/#storage-layout) and
-File Slices. File groups contain multiple versions of File Slices and are
split into multiple File Slices. A File Slice comprises the Base and Log File.
Each File Slice within
+- **File Group and File Slice** : Within each partition the data is physically
stored as base and Log Files and organized into logical concepts as [File
groups](https://hudi.apache.org/tech-specs-1point0/#storage-layout) and
+File Slices. A File Group contains multiple File Slices, each representing a version of the data. A File Slice comprises a Base File and its associated Log Files. Each File Slice within
the file-group is uniquely identified by the write that created its base file
or the first log file, which helps order the File Slices.
-- **Metadata Table** : Implemented as another merge-on-read Hudi table, the
[metadata table](metadata) efficiently handles quick updates with low write
amplification.
-It leverages a
[SSTable](https://cassandra.apache.org/doc/stable/cassandra/architecture/storage-engine.html#sstables)
based file format for quick, indexed key lookups,
-storing vital information like file paths, column statistics and schema. This
approach streamlines operations by reducing the necessity for expensive cloud
file listings.
+- **Metadata Table** : Implemented as another merge-on-read Hudi table, the
[metadata table](metadata) efficiently handles quick updates with low write
amplification.
+It leverages a
[SSTable](https://cassandra.apache.org/doc/stable/cassandra/architecture/storage-engine.html#sstables)
based file format for quick, indexed key lookups,
+storing vital information like file paths, column statistics and schema. This
approach streamlines operations by reducing the necessity for expensive cloud
file listings.
+
+### Pluggable Table Format
-Hudi’s approach of recording updates into Log Files is more efficient and
involves low merge overhead than systems like Hive ACID, where merging all
delta records against
-all Base Files is required. Read more about the various table types in Hudi
[here](table_types).
+Starting with Hudi 1.1, Hudi introduces a pluggable table format framework
that extends Hudi's powerful storage engine capabilities beyond its native
format to other table formats like Apache Iceberg and Delta Lake. This
framework decouples Hudi's core capabilities—transaction management, indexing,
concurrency control, and table services—from the specific storage format used
for data files. Hudi provides native format support (configured via
`hoodie.table.format=native` by default), whil [...]
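As a minimal sketch, the table format is selected through table configuration; `native` is the default named above, and the pluggable framework is what allows other formats to sit behind the same storage engine:

```properties
# Hudi's own table format (the default); the pluggable framework lets
# other table formats be wired in behind the same transaction management,
# indexing, concurrency control and table services.
hoodie.table.format=native
```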
+Hudi's approach of recording updates into Log Files is more efficient and involves lower merge overhead than systems like Hive ACID, where merging all delta records against
+all Base Files is required. Read more about the various table types in the Hudi [table types documentation](table_types).
## Storage Engine
-The storage layer of Hudi comprises the core components that are responsible
for the fundamental operations and services that enable Hudi to store,
retrieve, and manage data
-efficiently on [data
lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/)
storages. This functionality is comparable to that of roles play by storage
engines in popular databases like PostgreSQL, MySQL, MongoDB,
-Cassandra and Clickhouse.
+The storage layer of Hudi comprises the core components that are responsible for the fundamental operations and services that enable Hudi to store, retrieve, and manage data
+efficiently on [data lakehouse](https://hudi.apache.org/blog/2024/07/11/what-is-a-data-lakehouse/) storages. This functionality is comparable to the role played by storage engines in popular databases like PostgreSQL, MySQL, MongoDB,
+Cassandra and ClickHouse.
### Indexes
+

-<p align = "center">Figure: Indexes in Hudi</p>
+<p align = "center">Indexes in Hudi</p>
-[Indexes](indexes) in Hudi enhance query planning, minimizing I/O, speeding up
response times and providing faster writes with low merge costs. The [metadata
table](metadata/#metadata-table-indices) acts
+[Indexes](indexes) in Hudi enhance query planning, minimizing I/O, speeding up
response times and providing faster writes with low merge costs. The [metadata
table](metadata/#metadata-table-indices) acts
as an additional [indexing
system](metadata#supporting-multi-modal-index-in-hudi) and brings the benefits
of indexes generally to both the readers and writers. Compute engines can
leverage various indexes in the metadata
-table, like file listings, column statistics, bloom filters, record-level
indexes, and [expression
indexes](https://github.com/apache/hudi/blob/master/rfc/rfc-63/rfc-63.md) to
quickly generate optimized query plans and improve read
-performance. In addition to the metadata table indexes, Hudi supports simple
join based indexing, bloom filters stored in base file footers, external
key-value stores like HBase,
-and optimized storage techniques like bucketing , to efficiently locate File
Groups containing specific record keys. Hudi also provides reader indexes such
as
[expression](https://github.com/apache/hudi/blob/master/rfc/rfc-63/rfc-63.md)
and
+table, like file listings, column statistics, bloom filters, record-level
indexes, and [expression
indexes](https://github.com/apache/hudi/blob/master/rfc/rfc-63/rfc-63.md) to
quickly generate optimized query plans and improve read
+performance. In addition to the metadata table indexes, Hudi supports simple join-based indexing, bloom filters stored in base file footers, external key-value stores like HBase,
+and optimized storage techniques like bucketing, to efficiently locate File Groups containing specific record keys. Hudi also provides reader indexes such as [expression](https://github.com/apache/hudi/blob/master/rfc/rfc-63/rfc-63.md) and
secondary indexes to boost reads. The table partitioning scheme in Hudi is consciously exploited for implementing global and non-global indexing strategies that limit the scope of a record's
uniqueness to a given partition or globally across all partitions.
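As an illustrative (not exhaustive) sketch, the index used on the write path is chosen via configuration; available values vary by Hudi version:

```properties
# Pick the indexing mechanism used to map record keys to File Groups.
# Other supported values include SIMPLE, GLOBAL_BLOOM and BUCKET.
hoodie.index.type=BLOOM
# Enable the metadata table so readers/writers can use its indexes
# (file listings, column stats, bloom filters) during planning.
hoodie.metadata.enable=true
```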
### Table Services
+

-<p align = "center">Figure: Table services in Hudi</p>
+<p align = "center">Table services in Hudi</p>
-Apache Hudi offers various table services to help keep the table storage
layout and metadata management performant. Hudi was designed with built-in
table services that enables
-running them in inline, semi-asynchronous or full-asynchronous modes.
Furthermore, Spark and Flink streaming writers can run in continuous mode, and
invoke table services
+Hudi offers various table services to help keep the table storage layout and metadata management performant. Hudi was designed with built-in table services that enable
+running them in inline, semi-asynchronous or fully-asynchronous modes. Furthermore, Spark and Flink streaming writers can run in continuous mode, and invoke table services
asynchronously sharing the underlying executors intelligently with writers.
Let’s take a look at these services.
#### Clustering
-The [clustering](clustering) service, akin to features in cloud data
warehouses, allows users to group frequently queried records using sort keys or
merge smaller Base Files into
-larger ones for optimal file size management. It's fully integrated with other
timeline actions like cleaning and compaction, enabling smart optimizations
such as avoiding
+
+The [clustering](clustering) service, akin to features in cloud data
warehouses, allows users to group frequently queried records using sort keys or
merge smaller Base Files into
+larger ones for optimal file size management. It's fully integrated with other
timeline actions like cleaning and compaction, enabling smart optimizations
such as avoiding
compaction for File Groups undergoing clustering, thereby saving on I/O.
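A hedged example of enabling inline clustering; the commit threshold and sort columns are illustrative, not recommendations:

```properties
# Schedule and run clustering inline after every 4 commits
# (asynchronous scheduling/execution is also supported).
hoodie.clustering.inline=true
hoodie.clustering.inline.max.commits=4
# Sort records by commonly filtered columns (hypothetical names).
hoodie.clustering.plan.strategy.sort.columns=city,event_date
```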
#### Compaction
-Hudi's [compaction](compaction) service, featuring strategies like date
partitioning and I/O bounding, merges Base Files with delta logs to create
updated Base Files. It allows
-concurrent writes to the same File Froup, enabled by Hudi's file grouping and
flexible log merging. This facilitates non-blocking execution of deletes even
during concurrent
+
+Hudi's [compaction](compaction) service, featuring strategies like date
partitioning and I/O bounding, merges Base Files with delta logs to create
updated Base Files. It allows
+concurrent writes to the same File Group, enabled by Hudi's file grouping and flexible log merging. This facilitates non-blocking execution of deletes even during concurrent
record updates.
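A minimal sketch of enabling inline compaction; the delta-commit threshold is an illustrative value:

```properties
# Merge delta logs into new Base Files inline once 5 delta commits
# accumulate on a File Group (asynchronous compaction is also supported).
hoodie.compact.inline=true
hoodie.compact.inline.max.delta.commits=5
```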
#### Cleaning
-[Cleaner](http://hudi.apache.org/blog/2021/06/10/employing-right-configurations-for-hudi-cleaner)
service works off the timeline incrementally, removing File Slices that are
past the configured retention period for incremental queries,
+
+The [Cleaner](http://hudi.apache.org/blog/2021/06/10/employing-right-configurations-for-hudi-cleaner) service works off the timeline incrementally, removing File Slices that are past the configured retention period for incremental queries,
while also allowing sufficient time for long-running batch jobs (e.g., Hive ETLs) to finish. This allows users to reclaim storage space, thereby saving on costs.
#### Indexing
-Hudi's scalable metadata table contains auxiliary data about the table. This
subsystem encompasses various indices, including files, column_stats, and
bloom_filters,
-facilitating efficient record location and data skipping. Balancing write
throughput with index updates presents a fundamental challenge, as traditional
indexing methods,
-like locking out writes during indexing, are impractical for large tables due
to lengthy processing times. Hudi addresses this with its innovative
asynchronous [metadata indexing](metadata_indexing),
+
+Hudi's scalable metadata table contains auxiliary data about the table. This
subsystem encompasses various indices, including files, column_stats, and
bloom_filters,
+facilitating efficient record location and data skipping. Balancing write
throughput with index updates presents a fundamental challenge, as traditional
indexing methods,
+like locking out writes during indexing, are impractical for large tables due
to lengthy processing times. Hudi addresses this with its innovative
asynchronous [metadata indexing](metadata_indexing),
enabling the creation of various indices without impeding writes. This
approach not only improves write latency but also minimizes resource waste by
reducing contention between writing and indexing activities.
### Concurrency Control
-[Concurrency control](concurrency_control) defines how different
writers/readers/table services coordinate access to the table. Hudi uses
monotonically increasing time to sequence and order various
-changes to table state. Much like databases, Hudi take an approach of clearly
differentiating between writers (responsible for upserts/deletes), table
services
-(focusing on storage optimization and bookkeeping), and readers (for query
execution). Hudi provides snapshot isolation, offering a consistent view of the
table across
-these different operations. It employs lock-free, non-blocking MVCC for
concurrency between writers and table-services, as well as between different
table services, and
-optimistic concurrency control (OCC) for multi-writers with early conflict
detection. With [Hudi
1.0](https://github.com/apache/hudi/blob/master/rfc/rfc-69/rfc-69.md),
non-blocking concurrency control
([NBCC](https://github.com/apache/hudi/blob/master/rfc/rfc-66/rfc-66.md))
+
+[Concurrency control](concurrency_control) defines how different
writers/readers/table services coordinate access to the table. Hudi uses
monotonically increasing time to sequence and order various
+changes to table state. Much like databases, Hudi takes the approach of clearly differentiating between writers (responsible for upserts/deletes), table services
+(focusing on storage optimization and bookkeeping), and readers (for query
execution). Hudi provides snapshot isolation, offering a consistent view of the
table across
+these different operations. It employs lock-free, non-blocking MVCC for
concurrency between writers and table-services, as well as between different
table services, and
+optimistic concurrency control (OCC) for multi-writers with early conflict
detection. With [Hudi
1.0](https://github.com/apache/hudi/blob/master/rfc/rfc-69/rfc-69.md),
non-blocking concurrency control
([NBCC](https://github.com/apache/hudi/blob/master/rfc/rfc-66/rfc-66.md))
is introduced, allowing multiple writers to concurrently operate on the table
with non-blocking conflict resolution.
### Lake Cache*
+

-<p align = "center">Figure: Proposed Lake Cache in Hudi</p>
+<p align = "center">Proposed Lake Cache in Hudi</p>
Data lakes today face a tradeoff between fast data writing and optimal query
performance. Writing smaller files or logging deltas enhances writing speed,
but superior query performance typically requires opening fewer files and
pre-materializing merges. Most databases use a buffer pool to reduce storage
access costs. Hudi’s design supports creating a multi-tenant caching tier that
can store pre-merged File Slices. Hudi’s timeline can then be used to simply
communicate caching policies. T [...]
-
## Programming APIs
### Writers
+
Hudi tables can be used as sinks for Spark/Flink pipelines and the Hudi
writing path provides several enhanced capabilities over file writing done by
vanilla parquet/avro sinks. It categorizes write operations into incremental
(`insert`, `upsert`, `delete`) and batch/bulk (`insert_overwrite`,
`delete_partition`, `bulk_insert`) with specific functionalities. `upsert` and
`delete` efficiently merge records with identical keys and integrate with the
file sizing mechanism, while `insert` ope [...]
### Readers
+
Hudi provides snapshot isolation for writers and readers, enabling consistent
table snapshot queries across major query engines (Spark, Hive, Flink, Presto,
Trino, Impala) and cloud warehouses. It optimizes query performance by
utilizing lightweight processes, especially for base columnar file reads, and
integrates engine-specific vectorized readers like in Presto and Trino. This
scalable model surpasses the need for separate readers and taps into each
engine's unique optimizations, such [...]
## User Access
### SQL Engines
-Apache Hudi is compatible with a wide array of SQL query engines, catering to
various analytical needs. For distributed ETL batch processing, Apache Spark is
frequently utilized,
-leveraging its efficient handling of large-scale data. In the realm of
streaming use cases, compute engines such as Apache Flink and Apache Spark's
Structured Streaming provide
-robust support when paired with Hudi. Moreover, Hudi supports modern data lake
query engines such as Trino and Presto, as well as modern analytical databases
such as ClickHouse
-and StarRocks. This diverse support of compute engines positions Apache Hudi
as a flexible and adaptable platform for a broad spectrum of use cases.
+
+Hudi is compatible with a wide array of SQL query engines, catering to various
analytical needs. For distributed ETL batch processing, Apache Spark is
frequently utilized,
+leveraging its efficient handling of large-scale data. In the realm of
streaming use cases, compute engines such as Apache Flink and Apache Spark's
Structured Streaming provide
+robust support when paired with Hudi. Moreover, Hudi supports modern data lake
query engines such as Trino and Presto, as well as modern analytical databases
such as ClickHouse
+and StarRocks. This diverse support of compute engines positions Hudi as a
flexible and adaptable platform for a broad spectrum of use cases.
### Code Frameworks
+
While SQL still rules the roost when it comes to data engineering, an equally
important and widespread data engineering/data science practice is to write
code in different
-languages like Java, Scala, Python and R, to analyze data using sophisticated
algorithms with full expressiveness of the language. To this end, Hudi supports
several
+languages like Java, Scala, Python and R, to analyze data using sophisticated
algorithms with full expressiveness of the language. To this end, Hudi supports
several
popular data processing frameworks like Apache Spark and Apache Flink, as well as Python-based distributed frameworks like Daft and Ray, and native bindings in Rust for easy
integration with engines written in C/C++.
...
## Platform Services
+

-<p align = "center">Figure: Various platform services in Hudi</p>
-
-Platform services offer functionality that is specific to data and workloads,
and they sit directly on top of the table services, interfacing with writers
and readers.
-Services, like [Hudi Streamer](hoodie_streaming_ingestion#hudi-streamer) (or
its Flink counterpart), are specialized in handling data and workloads,
seamlessly integrating with Kafka streams and various
-formats to build data lakes. They support functionalities like automatic
checkpoint management, integration with major schema registries (including
Confluent), and
-deduplication of data. Hudi Streamer also offers features for backfills,
one-off runs, and continuous mode operation with Spark/Flink streaming writers.
Additionally,
-Hudi provides tools for [snapshotting](snapshot_exporter) and incrementally
[exporting](snapshot_exporter#examples) Hudi tables, importing new tables, and
[post-commit callback](platform_services_post_commit_callback) for analytics or
-workflow management, enhancing the deployment of production-grade incremental
pipelines. Apart from these services, Hudi also provides broad support for
different
-catalogs such as [Hive Metastore](syncing_metastore), [AWS
Glue](syncing_aws_glue_data_catalog/), [Google BigQuery](gcp_bigquery),
[DataHub](syncing_datahub), etc. that allows syncing of Hudi tables to be
queried by
+<p align = "center">Various platform services in Hudi</p>
+
+Platform services offer functionality that is specific to data and workloads,
and they sit directly on top of the table services, interfacing with writers
and readers.
+Services, like [Hudi Streamer](hoodie_streaming_ingestion#hudi-streamer) (or
its Flink counterpart), are specialized in handling data and workloads,
seamlessly integrating with Kafka streams and various
+formats to build data lakes. They support functionalities like automatic
checkpoint management, integration with major schema registries (including
Confluent), and
+deduplication of data. Hudi Streamer also offers features for backfills,
one-off runs, and continuous mode operation with Spark/Flink streaming writers.
Additionally,
+Hudi provides tools for [snapshotting](snapshot_exporter) and incrementally
[exporting](snapshot_exporter#examples) Hudi tables, importing new tables, and
[post-commit callback](platform_services_post_commit_callback) for analytics or
+workflow management, enhancing the deployment of production-grade incremental
pipelines. Apart from these services, Hudi also provides broad support for
different
+catalogs such as [Hive Metastore](syncing_metastore), [AWS Glue](syncing_aws_glue_data_catalog/), [Google BigQuery](gcp_bigquery), [DataHub](syncing_datahub), etc., that allow Hudi tables to be synced and queried by
interactive engines such as Trino and Presto.
### Metaserver*
+

-<p align = "center">Figure: Proposed Metaserver in Hudi</p>
+<p align = "center">Proposed Metaserver in Hudi</p>
-Storing table metadata on lake storage, while scalable, is less efficient than
RPCs to a scalable meta server. Hudi addresses this with its metadata server,
called "metaserver,"
-an efficient alternative for managing table metadata for a large number of
tables. Currently, the timeline server, embedded in Hudi's writer processes,
uses a local rocksDB store and [Javalin](https://javalin.io/) REST API to serve
file listings, reducing cloud storage listings.
-Since version 0.6.0, there's a trend towards standalone timeline servers,
aimed at horizontal scaling and improved security. These developments are set
to create a more efficient lake
[metastore](https://issues.apache.org/jira/browse/HUDI-3345)
+Storing table metadata on lake storage, while scalable, is less efficient than
RPCs to a scalable meta server. Hudi addresses this with its metadata server,
called "metaserver,"
+an efficient alternative for managing table metadata for a large number of tables. Currently, the timeline server, embedded in Hudi's writer processes, uses a local RocksDB store and a [Javalin](https://javalin.io/) REST API to serve file listings, reducing cloud storage listings.
+Since version 0.6.0, there's a trend towards standalone timeline servers,
aimed at horizontal scaling and improved security. These developments are set
to create a more efficient lake
[metastore](https://issues.apache.org/jira/browse/HUDI-3345)
for future needs.
diff --git a/website/static/assets/images/hudi-stack-1-x.png
b/website/static/assets/images/hudi-stack-1-x.png
index ba61758c4b8e..b933d06c7f38 100644
Binary files a/website/static/assets/images/hudi-stack-1-x.png and
b/website/static/assets/images/hudi-stack-1-x.png differ