bhasudha commented on code in PR #9406:
URL: https://github.com/apache/hudi/pull/9406#discussion_r1306219437


##########
website/docs/metadata.md:
##########
@@ -3,80 +3,173 @@ title: Metadata Table
 keywords: [ hudi, metadata, S3 file listings]
 ---
 
-## Motivation for a Metadata Table
+## Metadata Table
+
+Database indices contain auxiliary data structures to quickly locate records 
needed, without reading unnecessary data 
+from storage. Given that Hudi’s design has been heavily optimized for handling 
mutable change streams, with different 
+write patterns, Hudi considers [indexing](#indexing) as an integral part of 
its design and has uniquely supported 
+[indexing 
capabilities](https://hudi.apache.org/blog/2020/11/11/hudi-indexing-mechanisms/)
 from its inception, to speed 
+up upserts on the Data Lakehouse. While Hudi's indices have benefited writers 
for fast upserts and deletes, Hudi's metadata table 
+aims to tap these benefits more generally for both the readers and writers. 
The metadata table, implemented as a single 
+internal Hudi Merge-On-Read table, hosts different types of indices containing 
table metadata and is designed to be
+serverless and independent of compute and query engines. This is similar to 
common practices in databases where metadata
+is stored as internal views.
+
+The metadata table aims to significantly improve read/write performance of the 
queries by addressing the following key challenges:
+- **Eliminate the requirement of `list files` operation**:<br />
+  When reading and writing data, file listing operations are performed to get 
the current view of the file system.
+  When data sets are large, listing all the files may be a performance 
bottleneck, but more importantly in the case of cloud storage systems
+  like AWS S3, the large number of file listing requests sometimes causes 
throttling due to certain request limits.
+  The metadata table will instead proactively maintain the list of files and 
remove the need for recursive file listing operations.
+- **Expose column stats through indices for better query planning and faster 
lookups by readers**:<br />
+  Query engines rely on techniques such as partitioning and file pruning to 
cut down on the amount of irrelevant data 
+  scanned for query planning and execution. During the query planning phase, all 
data files are read for metadata on range 
+  information of columns for further pruning data files based on query 
predicates and available range information. This
+  approach is expensive and does not scale if there are a large number of 
partitions and data files to be scanned. In
+  addition to storage optimizations such as automatic file sizing, clustering, 
etc. that help organize data in a query
+  optimized way, Hudi's metadata table improves query planning further by 
supporting multiple types of indices that aid 
+  in efficiently looking up data files based on relevant query predicates 
instead of reading the column stats from every 
+  individual data file and then pruning. 
+   
+## Supporting Multi-Modal Index in Hudi
+
+[Multi-modal 
indexing](https://www.onehouse.ai/blog/introducing-multi-modal-index-for-the-lakehouse-in-apache-hudi),
 
+introduced in [0.11.0 Hudi 
release](https://hudi.apache.org/releases/release-0.11.0/#multi-modal-index), 
+is a re-imagination of what a general purpose indexing subsystem should look 
like for the lake. Multi-modal indexing is 
+implemented by enhancing Hudi's metadata table with the flexibility to extend 
to new index types as new partitions,
+along with an [asynchronous 
index](https://hudi.apache.org/docs/metadata_indexing/#setup-async-indexing) 
building 
+mechanism and is built on the following core principles:
+- **Scalable metadata**: The table metadata, i.e., the auxiliary data about 
the table, must be scalable to extremely 
+  large size, e.g., Terabytes (TB).  Different types of indices should be 
easily integrated to support various use cases 
+  without having to worry about managing the same. To realize this, all 
indices in Hudi's metadata table are stored as 
+  partitions in a single internal MOR table. The MOR table layout enables 
lightning-fast writes by avoiding synchronous 
+  merge of data with reduced write amplification. This is extremely important 
for large datasets as the size of updates to the 
+  metadata table can grow to be unmanageable otherwise. This helps Hudi to 
scale metadata to TBs in size. The 
+  foundational framework for multi-modal indexing is built to enable and 
disable new indices as needed. The 
+  [async 
indexing](https://www.onehouse.ai/blog/asynchronous-indexing-using-hudi) 
supports index building alongside 
+  regular writers without impacting the write latency.
+- **ACID transactional updates**: The index and table metadata must always be 
up-to-date and in sync with the data table. 
+  This is achieved via a multi-table transaction within Hudi and ensures 
atomicity of writes and resiliency to failures so that 
+  partial writes to either the data or metadata table are never exposed to 
other read or write transactions. The metadata 
+  table is built to be self-managed so users don’t need to spend operational 
cycles on any table services including 
+  compaction and cleaning.
+- **Fast lookup**: The needle-in-a-haystack type of lookups must be fast and 
efficient without having to scan the entire 
+  index, as index size can be TBs for large datasets. Since most accesses to the 
metadata table are point and range lookups,
+  the HFile format is chosen as the base file format for the internal metadata 
table. Since the metadata table stores 
+  the auxiliary data at the partition level (files index) or the file level 
(column_stats index), the lookup based on a 
+  single partition path and a file group is going to be very efficient with 
the HFile format. Both the base and log files 
+  in Hudi’s metadata table use the HFile format and are meticulously designed 
to reduce remote GET calls on cloud storage.
+  Further, these metadata table indices are served via a centralized timeline 
server which caches the metadata, further 
+  reducing the latency of the lookup from executors.
+
+### Metadata table indices
+
+The following are the different indices currently available under the metadata 
table.
+
+- ***[files 
index](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+15%3A+HUDI+File+Listing+Improvements)***:
 

Review Comment:
   Thanks!


