HADOOP-13360. Documentation for HADOOP_subcommand_OPTS

Signed-off-by: Allen Wittenauer <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/hadoop/repo
Commit: http://git-wip-us.apache.org/repos/asf/hadoop/commit/5b7a3df7
Tree: http://git-wip-us.apache.org/repos/asf/hadoop/tree/5b7a3df7
Diff: http://git-wip-us.apache.org/repos/asf/hadoop/diff/5b7a3df7

Branch: refs/heads/HADOOP-13341
Commit: 5b7a3df75c6fc78793bf8638f3ddaa11f7e0658e
Parents: dc6d490
Author: Allen Wittenauer <[email protected]>
Authored: Wed Aug 31 07:39:34 2016 -0700
Committer: Allen Wittenauer <[email protected]>
Committed: Thu Sep 8 07:57:19 2016 -0700

----------------------------------------------------------------------
 .../src/site/markdown/ClusterSetup.md           | 19 ++++-------
 .../src/site/markdown/UnixShellGuide.md         | 34 +++++++++++++++++---
 .../src/site/markdown/HdfsNfsGateway.md         |  2 +-
 3 files changed, 37 insertions(+), 18 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/hadoop/blob/5b7a3df7/hadoop-common-project/hadoop-common/src/site/markdown/ClusterSetup.md
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/ClusterSetup.md b/hadoop-common-project/hadoop-common/src/site/markdown/ClusterSetup.md
index 0d551b1..f222769 100644
--- a/hadoop-common-project/hadoop-common/src/site/markdown/ClusterSetup.md
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/ClusterSetup.md
@@ -64,17 +64,17 @@ Administrators can configure individual daemons using the 
configuration options
 
 | Daemon | Environment Variable |
 |:---- |:---- |
-| NameNode | HADOOP\_NAMENODE\_OPTS |
-| DataNode | HADOOP\_DATANODE\_OPTS |
-| Secondary NameNode | HADOOP\_SECONDARYNAMENODE\_OPTS |
+| NameNode | HDFS\_NAMENODE\_OPTS |
+| DataNode | HDFS\_DATANODE\_OPTS |
+| Secondary NameNode | HDFS\_SECONDARYNAMENODE\_OPTS |
 | ResourceManager | YARN\_RESOURCEMANAGER\_OPTS |
 | NodeManager | YARN\_NODEMANAGER\_OPTS |
 | WebAppProxy | YARN\_PROXYSERVER\_OPTS |
-| Map Reduce Job History Server | HADOOP\_JOB\_HISTORYSERVER\_OPTS |
+| Map Reduce Job History Server | MAPRED\_HISTORYSERVER\_OPTS |
 
-For example, To configure Namenode to use parallelGC, the following statement 
should be added in hadoop-env.sh :
+For example, to configure the NameNode to use parallelGC and a 4GB Java heap, the 
following statement should be added to hadoop-env.sh:
 
-      export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC"
+      export HDFS_NAMENODE_OPTS="-XX:+UseParallelGC -Xmx4g"
 
 See `etc/hadoop/hadoop-env.sh` for other examples.
 
@@ -91,13 +91,6 @@ It is also traditional to configure `HADOOP_HOME` in the 
system-wide shell envir
       HADOOP_HOME=/path/to/hadoop
       export HADOOP_HOME
 
-| Daemon | Environment Variable |
-|:---- |:---- |
-| ResourceManager | YARN\_RESOURCEMANAGER\_HEAPSIZE |
-| NodeManager | YARN\_NODEMANAGER\_HEAPSIZE |
-| WebAppProxy | YARN\_PROXYSERVER\_HEAPSIZE |
-| Map Reduce Job History Server | HADOOP\_JOB\_HISTORYSERVER\_HEAPSIZE |
-
 ### Configuring the Hadoop Daemons
 
 This section deals with important parameters to be specified in the given 
configuration files:

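The renames in the table above can be summarized as a lookup. The following is a minimal sketch only: `suggest_new_name` is a hypothetical helper written for this note, not a function from the Hadoop shell scripts, and it covers only the four names changed in the ClusterSetup.md table.

```bash
# Hypothetical helper (not part of Hadoop): print the documented replacement
# for a variable name that the ClusterSetup.md table no longer lists.
# Names that were not renamed are printed unchanged.
suggest_new_name() {
  case "$1" in
    HADOOP_NAMENODE_OPTS)          echo "HDFS_NAMENODE_OPTS" ;;
    HADOOP_DATANODE_OPTS)          echo "HDFS_DATANODE_OPTS" ;;
    HADOOP_SECONDARYNAMENODE_OPTS) echo "HDFS_SECONDARYNAMENODE_OPTS" ;;
    HADOOP_JOB_HISTORYSERVER_OPTS) echo "MAPRED_HISTORYSERVER_OPTS" ;;
    *)                             echo "$1" ;;
  esac
}
```

For instance, `suggest_new_name HADOOP_DATANODE_OPTS` prints the `HDFS_`-prefixed name, which may help when reviewing an existing `hadoop-env.sh` by hand.
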
http://git-wip-us.apache.org/repos/asf/hadoop/blob/5b7a3df7/hadoop-common-project/hadoop-common/src/site/markdown/UnixShellGuide.md
----------------------------------------------------------------------
diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/UnixShellGuide.md b/hadoop-common-project/hadoop-common/src/site/markdown/UnixShellGuide.md
index 940627d..b130f0f 100644
--- a/hadoop-common-project/hadoop-common/src/site/markdown/UnixShellGuide.md
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/UnixShellGuide.md
@@ -24,7 +24,7 @@ Apache Hadoop has many environment variables that control 
various aspects of the
 
 ### `HADOOP_CLIENT_OPTS`
 
-This environment variable is used for almost all end-user operations.  It can 
be used to set any Java options as well as any Apache Hadoop options via a 
system property definition. For example:
+This environment variable is used for all end-user, non-daemon operations.  It 
can be used to set any Java options as well as any Apache Hadoop options via a 
system property definition. For example:
 
 ```bash
 HADOOP_CLIENT_OPTS="-Xmx1g -Dhadoop.socks.server=localhost:4000" hadoop fs -ls 
/tmp
@@ -32,6 +32,18 @@ HADOOP_CLIENT_OPTS="-Xmx1g 
-Dhadoop.socks.server=localhost:4000" hadoop fs -ls /
 
 will increase the memory and send this command via a SOCKS proxy server.
 
+### `(command)_(subcommand)_OPTS`
+
+It is also possible to set options on a per subcommand basis.  This allows for 
one to create special options for particular cases.  The first part of the 
pattern is the command being used, but all uppercase.  The second part of the 
pattern is the subcommand being used, also in uppercase.  The pattern ends 
with the string `_OPTS`.
+
+For example, to configure `mapred distcp` to use a 2GB heap, one would use:
+
+```bash
+MAPRED_DISTCP_OPTS="-Xmx2g"
+```
+
+These options will appear *after* `HADOOP_CLIENT_OPTS` during execution and 
will generally take precedence.
+
 ### `HADOOP_CLASSPATH`
 
   NOTE: Site-wide settings should be configured via a shellprofile entry and 
permanent user-wide settings should be configured via ${HOME}/.hadooprc using 
the `hadoop_add_classpath` function. See below for more information.
@@ -56,6 +68,8 @@ For example:
 #
 
 HADOOP_CLIENT_OPTS="-Xmx1g"
+MAPRED_DISTCP_OPTS="-Xmx2g"
+HADOOP_DISTCP_OPTS="-Xmx2g"
 ```
 
 The `.hadoop-env` file can also be used to extend functionality and teach 
Apache Hadoop new tricks.  For example, to run hadoop commands accessing the 
server referenced in the environment variable `${HADOOP_SERVER}`, the following 
in the `.hadoop-env` will do just that:
@@ -71,11 +85,23 @@ One word of warning:  not all of Unix Shell API routines 
are available or work c
 
 ## Administrator Environment
 
-There are many environment variables that impact how the system operates.  By 
far, the most important are the series of `_OPTS` variables that control how 
daemons work.  These variables should contain all of the relevant settings for 
those daemons.
+In addition to the various XML files, there are two key capabilities for 
administrators to configure Apache Hadoop when using the Unix Shell:
+
+  * Many environment variables that impact how the system operates.  This 
guide will only highlight some key ones.  There is generally more information 
in the various `*-env.sh` files.
+
+  * Supplement or do some platform-specific changes to the existing scripts.  
Apache Hadoop provides the capabilities to do function overrides so that the 
existing code base may be changed in place without all of that work.  Replacing 
functions is covered later under the Shell API documentation.
+
+### `(command)_(subcommand)_OPTS`
+
+By far, the most important are the series of `_OPTS` variables that control 
how daemons work.  These variables should contain all of the relevant settings 
for those daemons.
+
+Similar to the user commands above, all daemons will honor the 
`(command)_(subcommand)_OPTS` pattern.  It is generally recommended that these 
be set in `hadoop-env.sh` to guarantee that the system will know which settings 
it should use on restart.  Unlike user-facing subcommands, daemons will *NOT* 
honor `HADOOP_CLIENT_OPTS`.
+
+In addition, daemons that run in an extra security mode also support 
`(command)_(subcommand)_SECURE_EXTRA_OPTS`.  These options are *supplemental* 
to the generic `*_OPTS` and will appear after, therefore generally taking 
precedence.
 
-More, detailed information is contained in `hadoop-env.sh` and the other 
env.sh files.
+### `(command)_(subcommand)_USER`
 
-Advanced administrators may wish to supplement or do some platform-specific 
fixes to the existing scripts.  In some systems, this means copying the errant 
script or creating a custom build with these changes.  Apache Hadoop provides 
the capabilities to do function overrides so that the existing code base may be 
changed in place without all of that work.  Replacing functions is covered 
later under the Shell API documentation.
+Apache Hadoop provides a way to do a user check per-subcommand.  While this 
method is easily circumvented and should not be considered a security feature, 
it does provide a mechanism by which to prevent accidents.  For example, 
setting `HDFS_NAMENODE_USER=hdfs` will make the `hdfs namenode` and `hdfs 
--daemon start namenode` commands verify that the user running the commands is 
the hdfs user by checking the `USER` environment variable.  This also works for 
non-daemons.  Setting `HADOOP_DISTCP_USER=jane` will verify that `USER` is set 
to `jane` before being allowed to execute the `hadoop distcp` command.
 
 ## Developer and Advanced Administrator Environment
 

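The `(command)_(subcommand)_OPTS` and `(command)_(subcommand)_USER` patterns documented in UnixShellGuide.md can be illustrated with a short sketch. This is not the code the Hadoop scripts use: `subcommand_var`, `effective_opts`, and `check_user` are hypothetical names invented here, and the sketch assumes that command and subcommand contain only letters, digits, and underscores.

```bash
# Hypothetical helpers for illustration only; not part of the Hadoop shell API.

# Build a variable name from command, subcommand, and suffix, e.g.
# "mapred" "distcp" "OPTS" -> MAPRED_DISTCP_OPTS.
subcommand_var() {
  printf '%s_%s_%s\n' "$1" "$2" "$3" | tr '[:lower:]' '[:upper:]'
}

# Print the client options followed by the per-subcommand options, so the
# per-subcommand options appear last on the resulting option string.
effective_opts() (
  name=$(subcommand_var "$1" "$2" OPTS)
  # Refuse anything that is not a plain variable name before using eval.
  case $name in *[!A-Z0-9_]*) exit 1 ;; esac
  eval "sub_opts=\${$name:-}"
  printf '%s\n' "${HADOOP_CLIENT_OPTS:-} ${sub_opts}"
)

# Succeed when no (command)_(subcommand)_USER is set, or when it matches USER.
# Like the documented check, this only guards against accidents.
check_user() (
  name=$(subcommand_var "$1" "$2" USER)
  case $name in *[!A-Z0-9_]*) exit 1 ;; esac
  eval "required=\${$name:-}"
  [ -z "$required" ] || [ "${USER:-}" = "$required" ]
)
```

With `HADOOP_CLIENT_OPTS="-Xmx1g"` and `MAPRED_DISTCP_OPTS="-Xmx2g"` set, `effective_opts mapred distcp` places `-Xmx2g` after `-Xmx1g`, which mirrors the ordering the guide describes. With `HDFS_NAMENODE_USER=hdfs` set, `check_user hdfs namenode` fails for any other value of `USER`.
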
http://git-wip-us.apache.org/repos/asf/hadoop/blob/5b7a3df7/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsNfsGateway.md
----------------------------------------------------------------------
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsNfsGateway.md b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsNfsGateway.md
index 6731189..4742637 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsNfsGateway.md
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsNfsGateway.md
@@ -183,7 +183,7 @@ It's strongly recommended for the users to update a few 
configuration properties
         </property>
 
 *   JVM and log settings. You can export JVM settings (e.g., heap size and GC 
log) in
-    HADOOP\_NFS3\_OPTS. More NFS related settings can be found in 
hadoop-env.sh.
+    HDFS\_NFS3\_OPTS. More NFS related settings can be found in hadoop-env.sh.
     To get NFS debug trace, you can edit the log4j.property file
     to add the following. Note, debug trace, especially for ONCRPC, can be 
very verbose.
 

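As an illustration of the renamed NFS gateway variable, a `hadoop-env.sh` entry might look like the fragment below. The heap size and the GC logging flag are assumptions chosen for the example, not values recommended by the documentation.

```bash
# Example values only: a 1 GB heap and basic GC logging for the NFS gateway.
export HDFS_NFS3_OPTS="-Xmx1g -verbose:gc"
```
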

