This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 7e5daaa  [HUDI-2821] - Docs for Metadata Table (#4226)
7e5daaa is described below

commit 7e5daaabe23c02a04d8bb02f57ca3aa57ea6dd7a
Author: Kyle Weller <[email protected]>
AuthorDate: Wed Dec 8 11:21:46 2021 -0800

    [HUDI-2821] - Docs for Metadata Table (#4226)
---
 website/docs/metadata.md | 30 ++++++++++++++++++++++++++++++
 website/sidebars.js      |  1 +
 2 files changed, 31 insertions(+)

diff --git a/website/docs/metadata.md b/website/docs/metadata.md
new file mode 100644
index 0000000..13cf669
--- /dev/null
+++ b/website/docs/metadata.md
@@ -0,0 +1,30 @@
+---
+title: Metadata Table
+keywords: [ hudi, metadata, S3 file listings]
+---
+
+## Motivation for a Metadata Table
+
+The Apache Hudi Metadata Table can significantly improve read/write performance of your queries. The main purpose of the
+Metadata Table is:
+
+1. **Eliminate the requirement for the "list files" operation:**
+   1. When reading and writing data, file listing operations are performed to get the current view of the file system.
+      When data sets are large, listing all the files becomes a performance bottleneck and in the case of cloud storage systems
+      like AWS S3, sometimes causes throttling due to list operation request limits. The Metadata Table will instead
+      proactively maintain the list of files and remove the need for recursive file listing operations.
+
+## Enable Hudi Metadata Table
+The Hudi Metadata Table is not enabled by default. If you wish to turn it on you need to enable the following configuration:
+
+[`hoodie.metadata.enable`](/docs/configurations#hoodiemetadataenable)
+
+## Deployment considerations
+Once you turn on the Hudi Metadata Table, ensure that all write and read operations enable the configuration above to
+ensure the Metadata Table stays up to date.
+
+:::note
+If your current deployment model is single writer along with async table services (such as cleaning, clustering, compaction)
+configured, then it is a must to have [lock providers configured](/docs/next/concurrency_control#enabling-multi-writing)
+before turning on the metadata table.
+:::
\ No newline at end of file
diff --git a/website/sidebars.js b/website/sidebars.js
index c0ae050..23b9227 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -54,6 +54,7 @@ module.exports = {
                 'transforms',
                 'markers',
                 'file_sizing',
+                'metadata',
                 'snapshot_exporter',
                 'precommit_validator'
             ],
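
For readers of this commit, here is a minimal sketch of how the setting documented in the new `metadata.md` might be applied consistently to both writers and readers. Only the `hoodie.metadata.enable` key comes from the commit. The helper function `with_metadata_table` and the table name `example_table` are hypothetical and used purely for illustration.

```python
# Hypothetical helper (not part of Hudi): returns a copy of the given
# options with the Metadata Table flag set, so writers and readers can
# share the same setting.
def with_metadata_table(options, enabled=True):
    merged = dict(options)  # copy so the caller's dict is not mutated
    merged["hoodie.metadata.enable"] = "true" if enabled else "false"
    return merged


# "example_table" is an illustrative name, not taken from the commit.
writer_options = with_metadata_table({"hoodie.table.name": "example_table"})
reader_options = with_metadata_table({})

print(writer_options)
print(reader_options)
```

Assuming a PySpark job, dictionaries like these could be passed to a Hudi read or write via `.options(**writer_options)`. Building both from one helper is just one way to keep the flag in sync across all read and write operations, as the deployment note in the doc asks.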
