This is an automated email from the ASF dual-hosted git repository.
sivabalan pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 7e5daaa [HUDI-2821] - Docs for Metadata Table (#4226)
7e5daaa is described below
commit 7e5daaabe23c02a04d8bb02f57ca3aa57ea6dd7a
Author: Kyle Weller <[email protected]>
AuthorDate: Wed Dec 8 11:21:46 2021 -0800
[HUDI-2821] - Docs for Metadata Table (#4226)
---
website/docs/metadata.md | 30 ++++++++++++++++++++++++++++++
website/sidebars.js | 1 +
2 files changed, 31 insertions(+)
diff --git a/website/docs/metadata.md b/website/docs/metadata.md
new file mode 100644
index 0000000..13cf669
--- /dev/null
+++ b/website/docs/metadata.md
@@ -0,0 +1,30 @@
+---
+title: Metadata Table
+keywords: [ hudi, metadata, S3 file listings]
+---
+
+## Motivation for a Metadata Table
+
+The Apache Hudi Metadata Table can significantly improve the read and write performance of your queries. The main purpose of the
+Metadata Table is to:
+
+1. **Eliminate the requirement for the "list files" operation:**
+ 1. When reading and writing data, file listing operations are performed to get the current view of the file system.
+ When data sets are large, listing all the files becomes a performance bottleneck and, on cloud storage systems
+ like AWS S3, can cause throttling due to list operation request limits. The Metadata Table instead
+ proactively maintains the list of files, removing the need for recursive file listing operations.
+
+## Enable Hudi Metadata Table
+The Hudi Metadata Table is not enabled by default. To turn it on, set the following configuration to `true`:
+
+[`hoodie.metadata.enable`](/docs/configurations#hoodiemetadataenable)
+
+## Deployment considerations
+Once you turn on the Hudi Metadata Table, enable the configuration above for all write and read operations so that
+the Metadata Table stays up to date.
+
+:::note
+If your current deployment model is a single writer with async table services (such as cleaning, clustering, compaction)
+configured, you must have [lock providers configured](/docs/next/concurrency_control#enabling-multi-writing)
+before turning on the Metadata Table.
+:::
\ No newline at end of file
diff --git a/website/sidebars.js b/website/sidebars.js
index c0ae050..23b9227 100644
--- a/website/sidebars.js
+++ b/website/sidebars.js
@@ -54,6 +54,7 @@ module.exports = {
'transforms',
'markers',
'file_sizing',
+ 'metadata',
'snapshot_exporter',
'precommit_validator'
],
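
The new `metadata.md` page asks that `hoodie.metadata.enable` be set on every writer and reader. One way to keep that consistent might be to merge the flag into each job's option map through a small helper. The sketch below is illustrative only and is not part of this commit: `with_metadata_table` and the table name `trips` are hypothetical, and only the `hoodie.metadata.enable` key comes from the page itself.

```python
from typing import Dict

# Configuration key named in the new metadata.md page.
METADATA_ENABLE_KEY = "hoodie.metadata.enable"


def with_metadata_table(options: Dict[str, str], enabled: bool = True) -> Dict[str, str]:
    """Return a copy of `options` with the Metadata Table flag set.

    Hypothetical helper: it only builds a plain dict of string options
    and does not talk to Hudi or Spark itself.
    """
    merged = dict(options)  # copy so the caller's dict is not mutated
    merged[METADATA_ENABLE_KEY] = "true" if enabled else "false"
    return merged


# Writers and readers share the same helper so the flag stays consistent.
writer_options = with_metadata_table({
    "hoodie.table.name": "trips",  # hypothetical table name
})
reader_options = with_metadata_table({})
```

Assuming a PySpark job, the resulting dictionaries could then be passed along the lines of `df.write.format("hudi").options(**writer_options)` on the write path and `spark.read.format("hudi").options(**reader_options)` on the read path.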