This is an automated email from the ASF dual-hosted git repository.
uditme pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 06b0098 [HUDI-2775] Add doc for external configuration and dynamodb based lock (#4087)
06b0098 is described below
commit 06b0098f4f978d65dcd9894877aea2c6319a2123
Author: wenningd <[email protected]>
AuthorDate: Mon Nov 29 13:37:59 2021 -0500
[HUDI-2775] Add doc for external configuration and dynamodb based lock (#4087)
Co-authored-by: Wenning Ding <[email protected]>
---
website/docs/concurrency_control.md | 20 +++++++++++++++++++-
website/docs/configurations.md | 12 ++++++++++++
2 files changed, 31 insertions(+), 1 deletion(-)
diff --git a/website/docs/concurrency_control.md b/website/docs/concurrency_control.md
index 96ff6eb..a9a0d58 100644
--- a/website/docs/concurrency_control.md
+++ b/website/docs/concurrency_control.md
@@ -47,7 +47,7 @@ hoodie.cleaner.policy.failed.writes=LAZY
hoodie.write.lock.provider=<lock-provider-classname>
```
-There are 2 different server based lock providers that require different configuration to be set.
+There are 3 different server-based lock providers, each requiring its own configuration.
**`Zookeeper`** based lock provider
@@ -69,6 +69,24 @@ hoodie.write.lock.hivemetastore.table
`The HiveMetastore URI's are picked up from the hadoop configuration file loaded during runtime.`
+**`Amazon DynamoDB`** based lock provider
+
+The Amazon DynamoDB based lock provider offers a simple way to support multiple writers across different clusters.
+
+```
+hoodie.write.lock.provider=org.apache.hudi.aws.transaction.lock.DynamoDBBasedLockProvider
+hoodie.write.lock.dynamodb.table
+hoodie.write.lock.dynamodb.partition_key
+hoodie.write.lock.dynamodb.region
+```
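+For illustration only, these configs could be set with placeholder values like the ones below (the table name, partition key, and region are examples, not defaults):
+```
+hoodie.write.lock.dynamodb.table=my-hudi-locks
+hoodie.write.lock.dynamodb.partition_key=my_hudi_table
+hoodie.write.lock.dynamodb.region=us-east-1
+```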
+In addition, to set up the credentials for accessing AWS resources, users can pass the following props to Hudi jobs:
+```
+hoodie.aws.access.key
+hoodie.aws.secret.key
+hoodie.aws.session.token
+```
+If not configured, Hudi falls back to using [DefaultAWSCredentialsProviderChain](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/auth/DefaultAWSCredentialsProviderChain.html).
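+For example, static credentials could be supplied as plain key-value props (the values below are placeholders, not real credentials):
+```
+hoodie.aws.access.key=<your-access-key-id>
+hoodie.aws.secret.key=<your-secret-access-key>
+hoodie.aws.session.token=<your-session-token>
+```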
+
## Datasource Writer
The `hudi-spark` module offers the DataSource API to write (and read) a Spark DataFrame into a Hudi table.
diff --git a/website/docs/configurations.md b/website/docs/configurations.md
index 97b51a1..b3ee3fc 100644
--- a/website/docs/configurations.md
+++ b/website/docs/configurations.md
@@ -14,6 +14,7 @@ This page covers the different ways of configuring your job to write/read Hudi t
- [**Write Client Configs**](#WRITE_CLIENT): Internally, the Hudi datasource uses a RDD based HoodieWriteClient API to actually perform writes to storage. These configs provide deep control over lower level aspects like file sizing, compression, parallelism, compaction, write schema, cleaning etc. Although Hudi provides sane defaults, from time-time these configs may need to be tweaked to optimize for specific workloads.
- [**Metrics Configs**](#METRICS): These set of configs are used to enable monitoring and reporting of key Hudi stats and metrics.
- [**Record Payload Config**](#RECORD_PAYLOAD): This is the lowest level of customization offered by Hudi. Record payloads define how to produce new values to upsert based on incoming new record and stored old record. Hudi provides default implementations such as OverwriteWithLatestAvroPayload which simply update table with the latest/last-written record. This can be overridden to a custom class extending HoodieRecordPayload class, on both datasource and WriteClient levels.
+- [**Environment Config**](#ENVIRONMENT_CONFIG): Since 0.10.0, instead of directly passing configurations to Hudi jobs, Hudi also supports a configuration file `hudi-default.conf`, in which each line consists of a key and a value separated by whitespace or an `=` sign.
## Spark Datasource Configs {#SPARK_DATASOURCE}
These configs control the Hudi Spark Datasource, providing ability to define keys/partitioning, pick out the write operation, specify how to merge records or choosing query type to read.
@@ -3244,3 +3245,14 @@ Payload related configs, that can be leveraged to control merges based on specif
---
+## Environment Config {#ENVIRONMENT_CONFIG}
+Hudi supports passing configurations via a configuration file `hudi-default.conf`, in which each line consists of a key and a value separated by whitespace or an `=` sign. For example:
+```
+hoodie.datasource.hive_sync.mode jdbc
+hoodie.datasource.hive_sync.jdbcurl jdbc:hive2://localhost:10000
+hoodie.datasource.hive_sync.support_timestamp false
+```
+It helps to have a central configuration file for your common, cross-job configurations and tunings, so that all jobs on your cluster can utilize it. It also works with Spark SQL DML/DDL, and helps avoid having to pass configs inside the SQL statements.
+
+By default, Hudi loads the configuration file from the `/etc/hudi/conf` directory. You can specify a different configuration directory by setting the `HUDI_CONF_DIR` environment variable.
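+For example, to point Hudi at a custom configuration directory (the path below is only illustrative):
+```
+# Hudi will then look for hudi-default.conf under /opt/hudi/conf instead of /etc/hudi/conf
+export HUDI_CONF_DIR=/opt/hudi/conf
+```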
+