This is an automated email from the ASF dual-hosted git repository.
wombatukun pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 8d88e78636f [HUDI-8223][DOCS] Docs update for the behavior change of
config loading (#12074)
8d88e78636f is described below
commit 8d88e78636fa169f1d5c62b28bf99c35583326f4
Author: Vova Kolmakov <[email protected]>
AuthorDate: Wed Oct 9 10:49:18 2024 +0700
[HUDI-8223][DOCS] Docs update for the behavior change of config loading
(#12074)
Co-authored-by: Vova Kolmakov <[email protected]>
---
website/docs/configurations.md | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/website/docs/configurations.md b/website/docs/configurations.md
index aac045eac1c..87f545d951d 100644
--- a/website/docs/configurations.md
+++ b/website/docs/configurations.md
@@ -20,7 +20,7 @@ hoodie.datasource.hive_sync.support_timestamp false
```
It helps to have a central configuration file for your common cross job
configurations/tunings, so all the jobs on your cluster can utilize it. It also
works with Spark SQL DML/DDL, and helps avoid having to pass configs inside the
SQL statements.
-By default, Hudi would load the configuration file under `/etc/hudi/conf`
directory. You can specify a different configuration directory location by
setting the `HUDI_CONF_DIR` environment variable.
+Hudi always loads the configuration file under the default directory
`file:/etc/hudi/conf`, if it exists, to set the default configs. You can specify
a different configuration directory location by setting the `HUDI_CONF_DIR`
environment variable.
- [**Spark Datasource Configs**](#SPARK_DATASOURCE): These configs control the
Hudi Spark Datasource, providing ability to define keys/partitioning, pick out
the write operation, specify how to merge records or choosing query type to
read.
- [**Flink Sql Configs**](#FLINK_SQL): These configs control the Hudi Flink
SQL source/sink connectors, providing ability to define record keys, pick out
the write operation, specify how to merge records, enable/disable asynchronous
compaction or choosing query type to read.
- [**Write Client Configs**](#WRITE_CLIENT): Internally, the Hudi datasource
uses a RDD based HoodieWriteClient API to actually perform writes to storage.
These configs provide deep control over lower level aspects like file sizing,
compression, parallelism, compaction, write schema, cleaning etc. Although Hudi
provides sane defaults, from time-time these configs may need to be tweaked to
optimize for specific workloads.
@@ -38,9 +38,10 @@ In the tables below **(N/A)** means there is no default
value set
## Externalized Config File
Instead of directly passing configuration settings to every Hudi job, you can
also centrally set them in a configuration
-file `hudi-defaults.conf`. By default, Hudi would load the configuration file
under `/etc/hudi/conf` directory. You can
-specify a different configuration directory location by setting the
`HUDI_CONF_DIR` environment variable. This can be
-useful for uniformly enforcing repeated configs (like Hive sync or write/index
tuning), across your entire data lake.
+file `hudi-defaults.conf`. Hudi always loads the configuration file under the
default directory `file:/etc/hudi/conf`, if it exists,
+to set the default configs. In addition, you can specify a different
configuration directory location by setting the `HUDI_CONF_DIR`
+environment variable. The configs stored in `HUDI_CONF_DIR/hudi-defaults.conf`
are loaded, overriding any configs already set
+by the config file in the default directory.
## Hudi Table Config {#TABLE_CONFIG}
Basic Hudi Table configuration parameters.
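For readers of this notification, here is a minimal sketch of the loading behavior described by the updated paragraph; the directory path and the config values below are illustrative only and are not part of the commit:
```
# Hypothetical hudi-defaults.conf under a custom directory
# (same key/value format as the snippet at the top of configurations.md).
$ cat /path/to/conf/hudi-defaults.conf
hoodie.datasource.hive_sync.mode               hms
hoodie.datasource.hive_sync.support_timestamp  false

# Point Hudi at that directory; configs found here override any already
# set by the file under the default directory file:/etc/hudi/conf.
$ export HUDI_CONF_DIR=/path/to/conf
$ spark-sql ...   # subsequent Hudi jobs on this node pick up these defaults
```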