This is an automated email from the ASF dual-hosted git repository.
uditme pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 06b0098 [HUDI-2775] Add doc for external configuration and dynamodb based lock (#4087)
06b0098 is described below
commit 06b0098f4f978d65dcd9894877aea2c6319a2123
Author: wenningd <[email protected]>
AuthorDate: Mon Nov 29 13:37:59 2021 -0500
[HUDI-2775] Add doc for external configuration and dynamodb based lock (#4087)
Co-authored-by: Wenning Ding <[email protected]>
---
website/docs/concurrency_control.md | 20 +++++++++++++++++++-
website/docs/configurations.md | 12 ++++++++++++
2 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/website/docs/concurrency_control.md b/website/docs/concurrency_control.md
index 96ff6eb..a9a0d58 100644
--- a/website/docs/concurrency_control.md
+++ b/website/docs/concurrency_control.md
@@ -47,7 +47,7 @@ hoodie.cleaner.policy.failed.writes=LAZY
hoodie.write.lock.provider=<lock-provider-classname>
```
-There are 2 different server based lock providers that require different configuration to be set.
+There are 3 different server-based lock providers, each requiring its own configuration.
**`Zookeeper`** based lock provider
@@ -69,6 +69,24 @@ hoodie.write.lock.hivemetastore.table
`The HiveMetastore URI's are picked up from the hadoop configuration file loaded during runtime.`
+**`Amazon DynamoDB`** based lock provider
+
+The Amazon DynamoDB based lock provider offers a simple way to support multiple writers across different clusters.
+
+```
+hoodie.write.lock.provider=org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
+hoodie.write.lock.dynamodb.table
+hoodie.write.lock.dynamodb.partition_key
+hoodie.write.lock.dynamodb.region
+```
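+For illustration only, these configs could be set with placeholder values like the ones below (the table name, partition key, and region are examples, not defaults):
+```
+hoodie.write.lock.dynamodb.table=my-hudi-locks
+hoodie.write.lock.dynamodb.partition_key=my_hudi_table
+hoodie.write.lock.dynamodb.region=us-east-1
+```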
+In addition, to set up the credentials for accessing AWS resources, users can pass the following props to Hudi jobs:
+```
+hoodie.aws.access.key
+hoodie.aws.secret.key
+hoodie.aws.session.token
+```
+If not configured, Hudi falls back to using [DefaultAWSCredentialsProviderChain](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html).
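+For example, static credentials could be supplied as plain key-value props (the values below are placeholders, not real credentials):
+```
+hoodie.aws.access.key=<your-access-key-id>
+hoodie.aws.secret.key=<your-secret-access-key>
+hoodie.aws.session.token=<your-session-token>
+```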
+
## Datasource Writer
The `hudi-spark` module offers the DataSource API to write (and read) a Spark DataFrame into a Hudi table.
diff --git a/website/docs/configurations.md b/website/docs/configurations.md
index 97b51a1..b3ee3fc 100644
--- a/website/docs/configurations.md
+++ b/website/docs/configurations.md
@@ -14,6 +14,7 @@ This page covers the different ways of configuring your job to write/read Hudi t
- [**Write Client Configs**](#WRITE_CLIENT): Internally, the Hudi datasource uses a RDD based HoodieWriteClient API to actually perform writes to storage. These configs provide deep control over lower level aspects like file sizing, compression, parallelism, compaction, write schema, cleaning etc. Although Hudi provides sane defaults, from time-time these configs may need to be tweaked to optimize for specific workloads.
- [**Metrics Configs**](#METRICS): These set of configs are used to enable monitoring and reporting of key Hudi stats and metrics.
- [**Record Payload Config**](#RECORD_PAYLOAD): This is the lowest level of customization offered by Hudi. Record payloads define how to produce new values to upsert based on incoming new record and stored old record. Hudi provides default implementations such as OverwriteWithLatestAvroPayload which simply update table with the latest/last-written record. This can be overridden to a custom class extending HoodieRecordPayload class, on both datasource and WriteClient levels.
+- [**Environment Config**](#ENVIRONMENT_CONFIG): Since 0.10.0, instead of directly passing configurations to Hudi jobs, Hudi also supports a configuration file `hudi-default.conf`, in which each line consists of a key and a value separated by whitespace or an `=` sign.
## Spark Datasource Configs {#SPARK_DATASOURCE}
These configs control the Hudi Spark Datasource, providing ability to define keys/partitioning, pick out the write operation, specify how to merge records or choosing query type to read.
@@ -3244,3 +3245,14 @@ Payload related configs, that can be leveraged to control merges based on specif
---
+## Environment Config {#ENVIRONMENT_CONFIG}
+Hudi supports passing configurations via a configuration file `hudi-default.conf`, in which each line consists of a key and a value separated by whitespace or an `=` sign. For example:
+```
+hoodie.datasource.hive_sync.mode jdbc
+hoodie.datasource.hive_sync.jdbcurl jdbc:hive2://localhost:10000
+hoodie.datasource.hive_sync.support_timestamp false
+```
+It helps to have a central configuration file for your common, cross-job configurations and tunings, so that all jobs on your cluster can utilize it. It also works with Spark SQL DML/DDL, and helps avoid having to pass configs inside the SQL statements.
+
+By default, Hudi loads the configuration file from the `/etc/hudi/conf` directory. You can specify a different configuration directory by setting the `HUDI_CONF_DIR` environment variable.
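+For example, to point Hudi at a custom configuration directory (the path below is only illustrative):
+```
+# Hudi will then look for hudi-default.conf under /opt/hudi/conf instead of /etc/hudi/conf
+export HUDI_CONF_DIR=/opt/hudi/conf
+```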
+