This is an automated email from the ASF dual-hosted git repository.
yihua pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new f1955e15eae [DOCS] Update hive metastore sync docs (#7968)
f1955e15eae is described below
commit f1955e15eae42365e5c06c412e7160f1993f6e94
Author: Sagar Sumit <[email protected]>
AuthorDate: Mon Sep 25 02:05:15 2023 +0530
[DOCS] Update hive metastore sync docs (#7968)
- Added a brief intro about Hive metastore.
- Removed deprecated config.
- Added default values and better explanation for rest of the configs.
---------
Co-authored-by: Y Ethan Guo <[email protected]>
---
website/docs/syncing_metastore.md | 32 +++++++++++++++++++++-----------
1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/website/docs/syncing_metastore.md b/website/docs/syncing_metastore.md
index d1600be3967..9dc5d419b3d 100644
--- a/website/docs/syncing_metastore.md
+++ b/website/docs/syncing_metastore.md
@@ -3,6 +3,13 @@ title: Hive Metastore
keywords: [hudi, hive, sync]
---
+[Hive Metastore](https://cwiki.apache.org/confluence/display/Hive/AdminManual+Metastore+Administration) is an
+RDBMS-backed service from Apache Hive that acts as a catalog for your data warehouse or data lake. It can store all the
+metadata about the tables, such as partitions, columns, column types, etc. One can sync the Hudi table metadata to the
+Hive metastore as well. This unlocks the capability to query Hudi tables not only through Hive but also using
+interactive query engines such as Presto and Trino. In this document, we will go through different ways to sync the Hudi
+table to Hive metastore.
+
## Hive Sync Tool
Writing data with [DataSource](/docs/writing_data) writer or [HoodieStreamer](/docs/hoodie_deltastreamer) supports syncing of the table's latest schema to Hive metastore, such that queries can pick up new columns and partitions.
@@ -33,17 +40,20 @@ Among them, following are the required arguments:
```
Corresponding datasource options for the most commonly used hive sync configs are as follows:
-| HiveSyncConfig | DataSourceWriteOption | Description |
-| ----------- | ----------- | ----------- |
-| --database | hoodie.datasource.hive_sync.database | name of the target database in Hive |
-| --table | hoodie.datasource.hive_sync.table | name of the target table in Hive |
-| --user | hoodie.datasource.hive_sync.username | username for hive metastore |
-| --pass | hoodie.datasource.hive_sync.password | password for hive metastore |
-| --use-jdbc | hoodie.datasource.hive_sync.use_jdbc | use JDBC to connect to metastore |
-| --jdbc-url | hoodie.datasource.hive_sync.jdbcurl | Hive metastore url |
-| --sync-mode | hoodie.datasource.hive_sync.mode | Mode to choose for Hive ops. Valid values are hms, jdbc and hiveql. |
-| --partitioned-by | hoodie.datasource.hive_sync.partition_fields | Comma-separated column names in the table to use for determining hive partition. |
-| --partition-value-extractor | hoodie.datasource.hive_sync.partition_extractor_class | Class which implements PartitionValueExtractor to extract the partition values. `SlashEncodedDayPartitionValueExtractor` by default. |
+:::note
+In the table below **(N/A)** means there is no default value set.
+:::
+
+| HiveSyncConfig | DataSourceWriteOption | Default Value | Description |
+| ----------- | ----------- | ----------- | ----------- |
+| --database | hoodie.datasource.hive_sync.database | default | Name of the target database in Hive |
+| --table | hoodie.datasource.hive_sync.table | (N/A) | Name of the target table in Hive. Inferred from the table name in Hudi table config if not specified. |
+| --user | hoodie.datasource.hive_sync.username | hive | Username for hive metastore |
+| --pass | hoodie.datasource.hive_sync.password | hive | Password for hive metastore |
+| --jdbc-url | hoodie.datasource.hive_sync.jdbcurl | jdbc:hive2://localhost:10000 | Hive server url if using `jdbc` mode to sync |
+| --sync-mode | hoodie.datasource.hive_sync.mode | (N/A) | Mode to choose for Hive ops. Valid values are `hms`, `jdbc` and `hiveql`. More details in the following section. |
+| --partitioned-by | hoodie.datasource.hive_sync.partition_fields | (N/A) | Comma-separated column names in the table to use for determining hive partition. |
+| --partition-value-extractor | hoodie.datasource.hive_sync.partition_extractor_class | `org.apache.hudi.hive.MultiPartKeysValueExtractor` | Class which implements `PartitionValueExtractor` to extract the partition values. Inferred automatically depending on the partition fields specified. |
### Sync modes