[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer updated HADOOP-9902:
-------------------------------------
    Release Note:
The Hadoop shell scripts have been rewritten to fix many long-standing bugs and include some new features. While an eye has been kept towards compatibility, some changes may break existing installations.

INCOMPATIBLE CHANGES:

* The pid, out, etc. files for secure daemons have been renamed to include the appropriate ${HADOOP_IDENT_STR}. This should allow, with proper configurations in place, for multiple versions of the same secure daemon to run on a host.
* All Hadoop shell script subsystems execute hadoop-env.sh, which allows for all of the environment variables to be in one location. This was not the case previously.
* The default content of *-env.sh has been significantly altered, with the majority of defaults moved into more protected areas.
* All YARN_* and MAPRED_* environment variables act as overrides to their equivalent HADOOP_* environment variables when 'yarn', 'mapred' and related commands are executed. Previously, these were separated out, which meant a significant amount of duplication of common settings.
* hdfs-config.sh and hdfs-config.cmd were inadvertently duplicated into libexec and sbin. The sbin versions have been removed.
* The log4j settings forcibly set by some *-daemon.sh commands have been removed. These settings are now configurable in the *-env.sh files, in particular via *_OPT.
* Support for various undocumented YARN log4j.properties files has been removed.
* Support for $HADOOP_MASTER and the related rsync code have been removed.
* yarn.id.str has been removed.
* We now require bash v3 (released July 27, 2004) or better in order to take advantage of better regex handling and ${BASH_SOURCE}. POSIX sh will not work.
* Support for --script has been removed. We now use ${HADOOP_*_PATH} or ${HADOOP_PREFIX} to find the necessary binaries. (See other note regarding ${HADOOP_PREFIX} auto discovery.)
* Non-existent classpaths, ld.so library paths, JNI library paths, etc., will be ignored and stripped from their respective environment settings.

BUG FIXES:

* ${HADOOP_CONF_DIR} is now properly honored everywhere.
* Documented hadoop-layout.sh with a provided hadoop-layout.sh.example file.
* Shell commands should now work properly when called as a relative path and without HADOOP_PREFIX being defined. If ${HADOOP_PREFIX} is not set, it will be automatically determined based upon the current location of the shell library. Note that other parts of the ecosystem may require this environment variable to be configured.
* Operations which trigger ssh will now limit the number of connections to run in parallel to ${HADOOP_SSH_PARALLEL} to prevent memory and network exhaustion. By default, this is set to 10.
* ${HADOOP_CLIENT_OPTS} support has been added to a few more commands.
* Various options on hadoop command lines were supported inconsistently. These have been unified into hadoop-config.sh. --config still needs to come first, however.
* ulimit logging for secure daemons no longer assumes /bin/bash but does assume bash on the command line path.
* Removed references to some Yahoo! specific paths.
* Removed unused slaves.sh from YARN build tree.

IMPROVEMENTS:

* Significant amounts of redundant code have been moved into a new file called hadoop-functions.sh.
* Improved information in the default *-env.sh on what can be set, ramifications of setting, etc.
* There is an attempt to do some trivial deduplication and sanitization of the classpath and JVM options. This allows, amongst other things, for custom settings in *_OPTS for Hadoop daemons to override defaults and other generic settings (i.e., $HADOOP_OPTS). This is particularly relevant for Xmx settings, as one can now set them in _OPTS and ignore the heap specific options for daemons which force the size in megabytes.
* Operations which trigger ssh connections can now use pdsh if installed. $HADOOP_SSH_OPTS still gets applied.
* Subcommands have been alphabetized in both usage and in the code.
* All/most of the functionality provided by the sbin/* commands has been moved to either their bin/ equivalents or made into functions. The rewritten versions of these commands are now wrappers to maintain backward compatibility. Of particular note is the new --daemon option present in some bin/ commands, which allows certain subcommands to be daemonized.
* It is now possible to override some of the shell code capabilities to provide site specific functionality.
* A new option called --buildpaths will attempt to add developer build directories to the classpath to allow for in source tree testing.
* If a usage function is defined, the following will trigger a help message if it is given in the option path to the shell script: --? -? ? --help -help -h help
* Several generic environment variables have been added to provide a common configuration for pids, logs, and their security equivalents. The older versions still act as overrides to these generic versions.
* Groundwork has been laid to allow for custom secure daemon setup using something other than jsvc.
* Added distch and jnipath subcommands to hadoop command.

  was:
The Hadoop shell scripts have been rewritten to fix many long-standing bugs and include some new features. While an eye has been kept towards compatibility, some changes may break existing installations.

INCOMPATIBLE CHANGES:

* The pid files for secure daemons have been renamed to include the appropriate $HADOOP_IDENT_STR. This should allow, with proper configurations in place, for multiple versions of the same secure daemon to run on a host.
* All YARN_* and MAPRED_* environment variables act as overrides to their equivalent HADOOP_* environment variables when 'yarn', 'mapred' and related commands are executed. Previously, these were separated, which meant duplication of common settings.
* All Hadoop shell script subsystems execute hadoop-env.sh, which allows for all of the environment variables to be in one location. This was not the case previously.
* hdfs-config.sh and hdfs-config.cmd were inadvertently placed into both libexec and sbin during install. The sbin version has been removed.
* The log4j settings forcibly set by some *-daemon.sh commands have been removed. This is now configurable in the *-env.sh files. Users who do not have these set will see logs going in odd places.
* Support for various undocumented YARN log4j.properties files has been removed.
* Support for $HADOOP_MASTER and the related rsync code have been removed.
* yarn.id.str has been removed.
* We now require bash v3 (released July 27, 2004) or better in order to take advantage of better regex handling.
* Support for --script has been removed from the sbin commands. We now use $HADOOP_*_PATH or $HADOOP_PREFIX to find the necessary binaries.

BUG FIXES:

* HADOOP_CONF_DIR is now properly honored everywhere.
* Documented hadoop-layout.sh.
* Added better comments to *-env.sh.
* Shell commands should now work properly when called as a relative path and without HADOOP_PREFIX. If HADOOP_PREFIX is not set, it will be automatically determined based upon the current location of the shell command. Note that other parts of the ecosystem may require this environment variable to be configured.
* Operations which trigger ssh will now limit how many connections run in parallel to 10 to prevent memory and network exhaustion.
* HADOOP_CLIENT_OPTS support has been added to a few more commands.
* Various options on hadoop command lines were supported inconsistently. These have been unified into hadoop-config.sh. --config still needs to come first, however.
* ulimit logging for secure daemons no longer assumes /bin/bash but does assume bash on the command line path.
* Removed references to some Yahoo! specific paths.
* Removed unused slaves.sh from YARN build tree.
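
For illustration only, the HADOOP_PREFIX auto-detection mentioned in the bug fixes above could look roughly like the following. This is a minimal sketch and not the code from the patch: the function name guess_prefix_from is hypothetical, and it assumes the shell code lives in a directory one level below the prefix (such as libexec/ or bin/).

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT the actual code from the HADOOP-9902 patch.
# Given the path of a script, print the parent of the directory it lives in.
guess_prefix_from() {
  local script_path="$1"
  local script_dir

  # cd + pwd -P turns a relative or symlinked directory into an absolute one.
  script_dir="$(cd -P -- "$(dirname -- "${script_path}")" >/dev/null && pwd -P)" || return 1

  # Assumption: the script sits one level below the prefix (e.g. libexec/).
  dirname -- "${script_dir}"
}

# Honor an existing HADOOP_PREFIX; otherwise derive it from this file's location.
HADOOP_PREFIX="${HADOOP_PREFIX:-$(guess_prefix_from "${BASH_SOURCE[0]}")}"
export HADOOP_PREFIX
```

Because the lookup is based on ${BASH_SOURCE} rather than $0, a sketch like this keeps working when the file is sourced by another script, which is one reason plain POSIX sh is not sufficient.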

IMPROVEMENTS:

* Significant amounts of redundant code have been moved into a new file called hadoop-functions.sh.
* Improved information in *-env.sh on what can be set, ramifications of setting, etc.
* There is an attempt to do some trivial deduplication of the classpath and JVM options. This allows, amongst other things, for custom settings in *_OPTS for Hadoop daemons to override defaults and other generic settings (i.e., $HADOOP_OPTS). This is particularly relevant for Xmx settings, as one can now set them in _OPTS and ignore the heap specific options for daemons which force the size in megabytes.
* Operations which trigger ssh connections can now use pdsh if installed. $HADOOP_SSH_OPTS still gets applied.
* Subcommands have been alphabetized in both usage and in the code.
* All/most of the functionality provided by the sbin/* commands has been moved to either their bin/ equivalents or made into functions. The rewritten versions of these commands are now wrappers to maintain backward compatibility. Of particular note is the new --daemon option present in some bin/ commands, which allows certain subcommands to be daemonized.
* It is now possible to override some of the shell code capabilities to provide site specific functionality.
* A new option called --buildpaths will attempt to add developer build directories to the classpath to allow for in source tree testing.
* If a usage function is defined, the following will trigger a help message if it is given in the option path to the shell script: --? -? ? --help -help -h help
* Several generic environment variables have been added to provide a common configuration for pids, logs, and their security equivalents. The older versions still act as overrides to these generic versions.
* Groundwork has been laid to allow for custom secure daemon setup using something other than jsvc.
* Added distch subcommand to hadoop command.
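
As a rough illustration of the classpath deduplication and the stripping of non-existent entries described above, something along these lines would do it. This is a sketch under stated assumptions, not the implementation in hadoop-functions.sh: the function name sanitize_path_list is hypothetical, entries are assumed to be colon-separated, and a trailing /* wildcard is checked against its parent directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT the actual hadoop-functions.sh code.
# Print a colon-separated path list with empty, missing, and duplicate
# entries removed; the first occurrence of each entry keeps its position.
sanitize_path_list() {
  local input="$1"
  local result=""
  local entry check
  local -a entries

  # Split on ':' without triggering glob expansion of entries like /lib/*.
  IFS=':' read -r -a entries <<< "${input}"

  for entry in "${entries[@]}"; do
    [[ -z "${entry}" ]] && continue

    # For wildcard entries such as /lib/*, test the directory itself.
    check="${entry%/\*}"
    [[ -e "${check}" ]] || continue

    # Skip anything already present in the result.
    [[ ":${result}:" == *":${entry}:"* ]] && continue

    result="${result:+${result}:}${entry}"
  done

  printf '%s\n' "${result}"
}
```

A caller might then write CLASSPATH="$(sanitize_path_list "${CLASSPATH}")" before launching the JVM.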

> Shell script rewrite
> --------------------
>
>                 Key: HADOOP-9902
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9902
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: scripts
>    Affects Versions: 3.0.0
>            Reporter: Allen Wittenauer
>            Assignee: Allen Wittenauer
>              Labels: releasenotes
>         Attachments: HADOOP-9902-2.patch, HADOOP-9902-3.patch, HADOOP-9902-4.patch, HADOOP-9902-5.patch, HADOOP-9902.patch, HADOOP-9902.txt, hadoop-9902-1.patch, more-info.txt
>
>
> Umbrella JIRA for shell script rewrite. See more-info.txt for more details.

--
This message was sent by Atlassian JIRA
(v6.2#6252)