[ 
https://issues.apache.org/jira/browse/HADOOP-19838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhaobo Huang updated HADOOP-19838:
----------------------------------
    Description: 
#### Problem
Currently, the Hadoop shell startup scripts pass **"HADOOP_CLIENT_OPTS"** only 
as JVM arguments placed before the main class. As a result, users cannot 
transparently set generic client-side configuration (such as fs.defaultFS, 
dfs.client.socket-timeout, dfs.replication) through environment variables.

The examples given in the documentation 
(https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/UnixShellGuide.html#HADOOP_CLIENT_OPTS)
 do not take effect in practice.

These generic options must be parsed by the GenericOptionsParser class, which 
only processes arguments passed after the main class. Users therefore have to 
add -D/-conf parameters manually to every command they run, which is 
inconvenient and breaks the expected transparent configuration experience. For 
example:

```shell 
hadoop fs -Dfs.defaultFS=hdfs://127.0.0.1:8020/ -ls -d /
```

#### Solution
Add support in the Configuration class for parsing configuration items from 
**"HADOOP_CLIENT_OPTS"**. The keys are exactly the same as the standard Hadoop 
configuration keys.
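
The issue text does not describe how the value of HADOOP_CLIENT_OPTS is tokenized, so the following is only a minimal sketch of the idea, not the patch itself. It assumes whitespace-separated `-Dkey=value` tokens with no quoting; the class `ClientOptsSketch` and the method `parseClientOpts` are hypothetical names and are not part of Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Hadoop. */
public class ClientOptsSketch {

    /**
     * Extracts key/value pairs from -Dkey=value tokens in an options string.
     * Assumption: tokens are separated by whitespace and contain no quoting.
     */
    public static Map<String, String> parseClientOpts(String opts) {
        Map<String, String> result = new LinkedHashMap<>();
        if (opts == null || opts.trim().isEmpty()) {
            return result;
        }
        for (String token : opts.trim().split("\\s+")) {
            if (!token.startsWith("-D")) {
                continue; // e.g. -Xmx512m is a plain JVM flag, not a config item
            }
            int eq = token.indexOf('=');
            if (eq <= 2) {
                continue; // skip tokens with no '=' or with an empty key
            }
            result.put(token.substring(2, eq), token.substring(eq + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        // A real client would read System.getenv("HADOOP_CLIENT_OPTS") instead.
        String opts = "-Xmx512m -Dfs.defaultFS=hdfs://127.0.0.1:8020/ -Ddfs.replication=2";
        parseClientOpts(opts).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

In this sketch the extracted pairs would then be applied to a configuration object under the same keys that a `-D` generic option would use.
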
This patch also adds support for Java system properties that carry a specific 
prefix: **"hadoop.property."**
Only system properties starting with hadoop.property. are scanned, and the 
substring after the prefix is used as the actual configuration key (for 
example, hadoop.property.dfs.replication maps to dfs.replication).
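
The prefix rule can be sketched as follows. This is an illustration under stated assumptions rather than the code in the patch: the class `PrefixedPropertiesSketch` and the method `extract` are hypothetical names, and the result is returned as a plain map instead of being written into a Hadoop Configuration.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical helper for illustration only; not part of Hadoop. */
public class PrefixedPropertiesSketch {

    // Prefix named in the issue description.
    static final String PREFIX = "hadoop.property.";

    /** Returns the prefixed entries with the prefix stripped from each key. */
    public static Map<String, String> extract(Properties props) {
        Map<String, String> result = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            // Ignore unrelated properties and a bare prefix with no key after it.
            if (name.startsWith(PREFIX) && name.length() > PREFIX.length()) {
                result.put(name.substring(PREFIX.length()), props.getProperty(name));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Scan the properties of the running JVM, as set with -D on the command line.
        extract(System.getProperties())
            .forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Running the sketch with `java -Dhadoop.property.dfs.replication=2 PrefixedPropertiesSketch` should list `dfs.replication` with the value `2`, while unprefixed properties such as `user.name` are ignored.
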

#### Benefits
In AI algorithm training scenarios, client parameters often need tuning to 
improve training efficiency. With this feature enabled, users can apply such 
tuning without modifying any related code.

> Support parsing environment variables and system properties in the 
> Configuration class.
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-19838
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19838
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Zhaobo Huang
>            Assignee: Zhaobo Huang
>            Priority: Major
>              Labels: pull-request-available


