This is an automated email from the ASF dual-hosted git repository.
wombatukun pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 8d88e78636f [HUDI-8223][DOCS] Docs update for the behavior change of
config loading (#12074)
8d88e78636f is described below
commit 8d88e78636fa169f1d5c62b28bf99c35583326f4
Author: Vova Kolmakov <[email protected]>
AuthorDate: Wed Oct 9 10:49:18 2024 +0700
[HUDI-8223][DOCS] Docs update for the behavior change of config loading
(#12074)
Co-authored-by: Vova Kolmakov <[email protected]>
---
website/docs/configurations.md | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/website/docs/configurations.md b/website/docs/configurations.md
index aac045eac1c..87f545d951d 100644
--- a/website/docs/configurations.md
+++ b/website/docs/configurations.md
@@ -20,7 +20,7 @@ hoodie.datasource.hive_sync.support_timestamp false
```
It helps to have a central configuration file for your common cross job
configurations/tunings, so all the jobs on your cluster can utilize it. It also
works with Spark SQL DML/DDL, and helps avoid having to pass configs inside the
SQL statements.
-By default, Hudi would load the configuration file under `/etc/hudi/conf`
directory. You can specify a different configuration directory location by
setting the `HUDI_CONF_DIR` environment variable.
+Hudi always loads the configuration file under the default directory
`file:/etc/hudi/conf`, if it exists, to set the default configs. You can specify
a different configuration directory location by setting the `HUDI_CONF_DIR`
environment variable.
- [**Spark Datasource Configs**](#SPARK_DATASOURCE): These configs control the
Hudi Spark Datasource, providing ability to define keys/partitioning, pick out
the write operation, specify how to merge records or choosing query type to
read.
- [**Flink Sql Configs**](#FLINK_SQL): These configs control the Hudi Flink
SQL source/sink connectors, providing ability to define record keys, pick out
the write operation, specify how to merge records, enable/disable asynchronous
compaction or choosing query type to read.
- [**Write Client Configs**](#WRITE_CLIENT): Internally, the Hudi datasource
uses a RDD based HoodieWriteClient API to actually perform writes to storage.
These configs provide deep control over lower level aspects like file sizing,
compression, parallelism, compaction, write schema, cleaning etc. Although Hudi
provides sane defaults, from time-time these configs may need to be tweaked to
optimize for specific workloads.
@@ -38,9 +38,10 @@ In the tables below **(N/A)** means there is no default
value set
## Externalized Config File
Instead of directly passing configuration settings to every Hudi job, you can
also centrally set them in a configuration
-file `hudi-defaults.conf`. By default, Hudi would load the configuration file
under `/etc/hudi/conf` directory. You can
-specify a different configuration directory location by setting the
`HUDI_CONF_DIR` environment variable. This can be
-useful for uniformly enforcing repeated configs (like Hive sync or write/index
tuning), across your entire data lake.
+file `hudi-defaults.conf`. Hudi always loads the configuration file under the
default directory `file:/etc/hudi/conf`, if it exists,
+to set the default configs. In addition, you can specify a different
configuration directory location by setting the `HUDI_CONF_DIR`
+environment variable. The configs stored in `HUDI_CONF_DIR/hudi-defaults.conf`
are loaded, overriding any configs already set
+by the config file in the default directory.
## Hudi Table Config {#TABLE_CONFIG}
Basic Hudi Table configuration parameters.
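For readers of this notification, here is a minimal sketch of the loading behavior described by the updated paragraph; the directory path and the config values below are illustrative only and are not part of the commit:
```
# Hypothetical hudi-defaults.conf under a custom directory
# (same key/value format as the snippet at the top of configurations.md).
$ cat /path/to/conf/hudi-defaults.conf
hoodie.datasource.hive_sync.mode               hms
hoodie.datasource.hive_sync.support_timestamp  false

# Point Hudi at that directory; configs found here override any already
# set by the file under the default directory file:/etc/hudi/conf.
$ export HUDI_CONF_DIR=/path/to/conf
$ spark-sql ...   # subsequent Hudi jobs on this node pick up these defaults
```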