[jira] [Updated] (HADOOP-9902) Shell script rewrite

Allen Wittenauer (JIRA) Thu, 03 Sep 2015 16:06:04 -0700

[ https://issues.apache.org/jira/browse/HADOOP-9902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Allen Wittenauer updated HADOOP-9902:
-------------------------------------
    Release Note: 
<!-- markdown -->
The Hadoop shell scripts have been rewritten to fix many long-standing bugs and
include some new features.  While compatibility has been kept in mind, some
changes may break existing installations.

INCOMPATIBLE CHANGES:

* The pid and out files for secure daemons have been renamed to include the
appropriate ${HADOOP_IDENT_STR}.  With the proper configuration in place, this
should allow multiple versions of the same secure daemon to run on one host.
Additionally, pid files are now created when daemons are run in interactive
mode.  This also prevents two daemons with the same configuration from being
started by accident, because the check happens before java is launched (i.e.,
a "fast fail" without having to wait for a socket to open).
* All Hadoop shell script subsystems now execute hadoop-env.sh, which allows
all of the environment variables to be set in one location.  Previously, this
was not the case.
* The default content of *-env.sh has been significantly altered, with the
majority of defaults moved into more protected areas inside the code.
Additionally, these files no longer auto-append: a variable set on the command
line before calling a shell command must contain the entire value, not just
the extra settings.  This brings Hadoop more in line with the vast majority of
other software packages.
* All HDFS_*, YARN_*, and MAPRED_* environment variables act as overrides to
their equivalent HADOOP_* environment variables when 'hdfs', 'yarn', 'mapred',
and related commands are executed.  Previously, these were kept separate, which
meant a significant amount of duplication of common settings.
* hdfs-config.sh and hdfs-config.cmd were inadvertently duplicated into libexec 
and sbin.  The sbin versions have been removed.
* The log4j settings forcibly set by some *-daemon.sh commands have been
removed.  These settings are now configurable in the *-env.sh files via *_OPTS.
* Support for various undocumented YARN log4j.properties files has been removed.
* Support for ${HADOOP_MASTER} and the related rsync code have been removed.
* The undocumented and unused yarn.id.str Java property has been removed.
* The unused yarn.policy.file Java property has been removed.
* We now require bash v3 (released July 27, 2004) or later in order to take
advantage of improved regex handling and ${BASH_SOURCE}.  POSIX sh will not work.
* Support for --script has been removed.  We now use ${HADOOP_*_PATH} or
${HADOOP_PREFIX} to find the necessary binaries.  (See the note under BUG FIXES
regarding ${HADOOP_PREFIX} auto-discovery.)
* Non-existent classpaths, ld.so library paths, JNI library paths, etc., are
now ignored and stripped from their respective environment settings.
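
The pid-file change above can be illustrated with a minimal bash sketch, assuming a plain pid file that holds a single process id. It is not the code in hadoop-functions.sh; demo_pidfile_live and demo_start_daemon are hypothetical names. It shows why recording the pid before java is launched lets a second start fail fast.

```bash
# Succeeds only if the pid file exists and names a process that is alive.
demo_pidfile_live() {
  local pidfile=$1
  local pid

  [[ -f "${pidfile}" ]] || return 1
  pid=$(<"${pidfile}")
  # kill -0 delivers no signal; it only tests that the process exists
  # and that we are permitted to signal it.
  [[ -n "${pid}" ]] && kill -0 "${pid}" 2>/dev/null
}

# Hypothetical launcher: refuses to start over a live pid file.
demo_start_daemon() {
  local pidfile=$1

  if demo_pidfile_live "${pidfile}"; then
    echo "ERROR: already running as process $(<"${pidfile}")" >&2
    return 1
  fi
  # Record our own pid. A real launcher would follow this with
  # `exec java ...`; exec keeps the same pid, so the file stays accurate.
  echo "$$" > "${pidfile}"
}
```

A stale pid file (one naming a dead process) does not block a new start, which is the behavior a fast-fail check needs after a crash.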
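
A minimal sketch of the override rule for HDFS_*, YARN_*, and MAPRED_* variables, assuming the simplest possible behavior: a non-empty subsystem variable wins, otherwise the HADOOP_* value is used. demo_resolve_var is hypothetical and LOG_DIR is only an example suffix; the shipped scripts may resolve individual variables differently.

```bash
# Prints the subsystem-specific value if it is non-empty, else the generic
# HADOOP_* value. ${!name} is bash indirect expansion: the value of the
# variable whose name is stored in $name.
demo_resolve_var() {
  local subsys=$1          # e.g. HDFS, YARN, MAPRED
  local suffix=$2          # e.g. LOG_DIR (example only)
  local override="${subsys}_${suffix}"
  local generic="HADOOP_${suffix}"

  if [[ -n "${!override}" ]]; then
    echo "${!override}"
  else
    echo "${!generic}"
  fi
}

# Example:
#   HADOOP_LOG_DIR=/var/log/hadoop HDFS_LOG_DIR=/var/log/hdfs
#   demo_resolve_var HDFS LOG_DIR   # prints /var/log/hdfs
#   demo_resolve_var YARN LOG_DIR   # prints /var/log/hadoop if YARN_LOG_DIR is unset
```

Because the generic value is the fallback, a common setting only has to be written once, under its HADOOP_* name.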
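
The bash v3 requirement can be sketched as a version guard followed by a =~ match that captures into BASH_REMATCH. The guard uses only [ ] so that a shell without bash syntax still reaches the error message; demo_parse_heap is a hypothetical helper, not part of the shipped scripts.

```bash
# Refuse to run without bash 3 or later. Only [ ] is used here so that a
# shell without BASH_VERSINFO (for example dash running as sh) reaches the
# error and exits instead of failing on bash-only syntax further down.
if [ -z "${BASH_VERSINFO}" ] || [ "${BASH_VERSINFO}" -lt 3 ]; then
  echo "ERROR: bash v3 or later is required." >&2
  exit 1
fi

# Splits a heap option such as -Xmx4g into its size and unit using =~,
# which stores capture groups in the BASH_REMATCH array.
demo_parse_heap() {
  if [[ $1 =~ ^-Xmx([0-9]+)([kKmMgG])$ ]]; then
    echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

# Example:
#   demo_parse_heap -Xmx4g   # prints "4 g"
```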
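
A minimal sketch of stripping non-existent entries from a colon-separated path, assuming that an entry ending in /* (a Java classpath wildcard) should be kept when its directory exists. demo_strip_missing is hypothetical. Splitting with read -a avoids the pathname expansion that an unquoted for-loop would apply to such wildcard entries.

```bash
# Prints the colon-separated input with empty and non-existent entries removed.
demo_strip_missing() {
  local -a parts
  local entry check
  local result=""

  IFS=':' read -r -a parts <<< "$1"
  for entry in "${parts[@]}"; do
    [[ -n "${entry}" ]] || continue
    # For a trailing /* wildcard, test the directory itself.
    check=${entry%/\*}
    if [[ -e "${check}" ]]; then
      result="${result:+${result}:}${entry}"
    fi
  done
  echo "${result}"
}

# Example:
#   demo_strip_missing "/usr:/no/such/dir:/usr/lib/*"   # keeps /usr and /usr/lib/*
```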

NEW FEATURES:

* Daemonization has been moved from *-daemon.sh to the bin commands via the 
--daemon option. Simply use --daemon start to start a daemon, --daemon stop to 
stop a daemon, and --daemon status to set $? to the daemon's status.  The 
return code for status is LSB-compatible.  For example, 'hdfs --daemon start 
namenode'.
* It is now possible to override some of the shell code capabilities to provide
site-specific functionality without replacing the shipped versions.
Replacement functions should go into the new hadoop-user-functions.sh file.
* A new option called --buildpaths will attempt to add developer build
directories to the classpath to allow for in-source-tree testing.
* Operations which trigger ssh connections can now use pdsh if installed.
${HADOOP_SSH_OPTS} is still applied.
* Added distch and jnipath subcommands to the hadoop command.
* Shell scripts now support a --debug option which will report basic 
information on the construction of various environment variables, java options, 
classpath, etc. to help in configuration debugging.
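
The LSB-compatible status return can be sketched as follows, assuming only a pid file is consulted. The codes follow the LSB init-script conventions (0 = running, 1 = dead but pid file exists, 3 = not running); demo_daemon_status is hypothetical and is not the function behind 'hdfs --daemon status'.

```bash
# Returns an LSB-style status for the daemon recorded in a pid file.
demo_daemon_status() {
  local pidfile=$1
  local pid

  if [[ ! -f "${pidfile}" ]]; then
    return 3                # not running
  fi
  pid=$(<"${pidfile}")
  if [[ -n "${pid}" ]] && kill -0 "${pid}" 2>/dev/null; then
    return 0                # running
  fi
  return 1                  # dead, but the pid file still exists
}

# Example: act on $? the way an init system would.
#   demo_daemon_status /tmp/demo-namenode.pid
#   case $? in
#     0) echo "running" ;;
#     1) echo "dead, stale pid file" ;;
#     3) echo "not running" ;;
#   esac
```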
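
A minimal sketch of why replacement functions work: bash keeps only the most recent definition of a function, so sourcing a user file after the shipped library swaps in site-specific behavior. demo_log_banner is hypothetical, and the temporary file merely stands in for hadoop-user-functions.sh, whose real location and load order are not described in this note.

```bash
# Shipped default; stands in for a function from the common shell library.
demo_log_banner() {
  echo "starting $1"
}

# Stand-in for a site's hadoop-user-functions.sh (hypothetical temp file).
demo_user_file=$(mktemp)
cat > "${demo_user_file}" <<'EOF'
demo_log_banner() {
  echo "[site] starting $1"
}
EOF

# Source the user file, if present, after the shipped definitions so that
# its definitions replace them.
if [[ -f "${demo_user_file}" ]]; then
  . "${demo_user_file}"
fi
rm -f "${demo_user_file}"

demo_log_banner namenode   # prints "[site] starting namenode"
```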
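
A minimal sketch of a --debug switch that writes diagnostics to STDERR only when the flag is present. Only --debug itself comes from the note; DEMO_DEBUG, demo_parse_args, and demo_debug are hypothetical, and scanning every argument is a simplification.

```bash
DEMO_DEBUG=false

# Sets DEMO_DEBUG when --debug appears among the arguments. Scanning every
# argument is a simplification; a real parser would stop at the subcommand.
demo_parse_args() {
  local arg
  for arg in "$@"; do
    if [[ "${arg}" == "--debug" ]]; then
      DEMO_DEBUG=true
    fi
  done
}

# Prints a diagnostic line to STDERR, but only in debug mode.
demo_debug() {
  if [[ "${DEMO_DEBUG}" == "true" ]]; then
    echo "DEBUG: $*" >&2
  fi
}

# Example:
#   demo_parse_args "$@"
#   demo_debug "Final CLASSPATH: ${CLASSPATH}"
```

Sending the messages to STDERR keeps them out of any output that callers capture with $( ).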

BUG FIXES:

* ${HADOOP_CONF_DIR} is now properly honored everywhere, without requiring 
symlinking and other such tricks.
* ${HADOOP_CONF_DIR}/hadoop-layout.sh is now documented with a provided 
hadoop-layout.sh.example file.
* Shell commands should now work properly when called as a relative path, 
without ${HADOOP_PREFIX} being defined, and as the target of bash -x for 
debugging. If ${HADOOP_PREFIX} is not set, it will be automatically determined 
based upon the current location of the shell library.  Note that other parts of 
the extended Hadoop ecosystem may still require this environment variable to be 
configured.
* Operations that trigger ssh now limit the number of connections run in
parallel to ${HADOOP_SSH_PARALLEL} to prevent memory and network exhaustion.
By default, this is set to 10.
* ${HADOOP_CLIENT_OPTS} support has been added to a few more commands.
* Some subcommands that were missing from the usage output are now listed.
* Various options on hadoop command lines were supported inconsistently.  These
have been unified into hadoop-config.sh.  --config must still be the first
option, however.
* ulimit logging for secure daemons no longer assumes /bin/bash, but it does
assume bash is on the PATH.
* Removed references to some Yahoo! specific paths.
* Removed unused slaves.sh from YARN build tree.
* Many exit statuses have been corrected to reflect what actually happened.
* Shell-level errors now go to STDERR.  Previously, many of them incorrectly
went to STDOUT.
* CDPATH with a period (.) should no longer break the scripts.
* The scripts no longer try to chown directories.
* If ${JAVA_HOME} is not set on OS X, the scripts now detect it properly
instead of throwing an error.
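
${HADOOP_PREFIX} auto-discovery can be sketched by resolving the directory of the shell library from ${BASH_SOURCE[0]}, assuming the library sits one level below the prefix (for example in libexec/). demo_find_prefix is hypothetical, not the shipped logic.

```bash
# Prints the parent of the directory containing the file that defines this
# function. Inside a function, ${BASH_SOURCE[0]} is the file the function was
# defined in, even when that file was sourced or reached via a relative path.
demo_find_prefix() {
  local libdir
  # CDPATH is emptied for this one cd so a user's CDPATH cannot redirect the
  # lookup or make cd print a path into the captured output.
  libdir=$(CDPATH= cd -P -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd -P) || return 1
  echo "${libdir%/*}"
}

# Only fill in the prefix when the user has not set it.
if [[ -z "${HADOOP_PREFIX}" ]]; then
  HADOOP_PREFIX=$(demo_find_prefix)
fi
```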
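
A minimal sketch of capping parallel work at ${HADOOP_SSH_PARALLEL}, defaulting to 10 as described above. demo_run_batched is hypothetical: it starts hosts in fixed-size batches and waits for each batch to finish, and a local function stands in for ssh so the sketch can run anywhere. The shipped scripts may throttle differently.

```bash
# Runs "$cmd host" for each host, at most ${HADOOP_SSH_PARALLEL} at a time.
demo_run_batched() {
  local cmd=$1
  shift
  local limit=${HADOOP_SSH_PARALLEL:-10}
  local count=0
  local host

  # Guard against a zero or non-numeric limit.
  [[ "${limit}" =~ ^[1-9][0-9]*$ ]] || limit=10

  for host in "$@"; do
    "${cmd}" "${host}" &
    count=$((count + 1))
    # Once a full batch is in flight, wait for all of it before continuing.
    if (( count % limit == 0 )); then
      wait
    fi
  done
  # Wait for the final, possibly partial, batch.
  wait
}

# Example with a local stand-in for ssh:
#   demo_fake_ssh() { echo "ssh $1"; }
#   HADOOP_SSH_PARALLEL=2 demo_run_batched demo_fake_ssh h1 h2 h3 h4 h5
```

Fixed batches keep the sketch compatible with bash 3; a sliding window would need wait -n, which arrived in bash 4.3. The cost is idle slots while the slowest host in a batch finishes.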

IMPROVEMENTS:

* The *.out files are now appended to instead of overwritten, to allow for
external log rotation.
* The style and layout of the scripts are much more consistent across
subprojects.
* More of the shell code is now commented.
* Significant amounts of redundant code have been moved into a new file called 
hadoop-functions.sh.
* The various *-env.sh files have been extensively rewritten to include
documentation and examples of what can be set, the ramifications of setting
it, etc., for all variables that a user is expected to set.
* There is now some trivial de-duplication and sanitization of the classpath
and JVM options.  Among other things, this allows custom settings in *_OPTS
for Hadoop daemons to override defaults and other generic settings (e.g.,
${HADOOP_OPTS}).  This is particularly relevant for Xmx settings, as one can
now set them in *_OPTS and ignore the daemon-specific heap options, which force
the size in megabytes.
* Subcommands have been alphabetized in both usage and in the code.
* Most, if not all, of the functionality provided by the sbin/* commands has
been moved either to their bin/ equivalents or into functions.  The rewritten
versions of these commands are now wrappers to maintain backward compatibility.
* Usage information is given with the following options/subcommands for all 
scripts using the common framework: --? -? ? --help -help -h help 
* Several generic environment variables have been added to provide a common 
configuration for pids, logs, and their security equivalents.  The older 
versions still act as overrides to these generic versions.
* Groundwork has been laid to allow for custom secure daemon setup using 
something other than jsvc (e.g., pfexec on Solaris).
* Scripts now test the state of the log and pid directories on daemon startup
and report clearer error messages.  Previously, raw shell errors would be
displayed to the user.
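
The de-duplication described above can be sketched with two hypothetical helpers: demo_add_cp appends a classpath entry only once, and demo_merge_opts reads daemon options after generic ones, drops exact duplicates, and keeps only the last -Xmx. This is an assumed policy chosen for illustration, not the shipped logic.

```bash
# Appends an entry to DEMO_CLASSPATH unless it is already present.
demo_add_cp() {
  local entry=$1
  # Wrapping both sides in ':' makes the test match whole entries only.
  if [[ ":${DEMO_CLASSPATH}:" != *":${entry}:"* ]]; then
    DEMO_CLASSPATH="${DEMO_CLASSPATH:+${DEMO_CLASSPATH}:}${entry}"
  fi
}

# Merges generic options ($1) with daemon options ($2). Daemon options are
# read last so they take precedence; exact duplicates are dropped, and only
# the last -Xmx seen is kept and emitted at the end.
demo_merge_opts() {
  local -a generic daemon merged
  local opt xmx="" seen=" "

  read -r -a generic <<< "$1"
  read -r -a daemon <<< "$2"

  for opt in "${generic[@]}" "${daemon[@]}"; do
    case ${opt} in
      -Xmx*)
        xmx=${opt}
        ;;
      *)
        if [[ "${seen}" != *" ${opt} "* ]]; then
          merged[${#merged[@]}]=${opt}
          seen="${seen}${opt} "
        fi
        ;;
    esac
  done
  if [[ -n "${xmx}" ]]; then
    merged[${#merged[@]}]=${xmx}
  fi
  echo "${merged[*]}"
}

# Example:
#   demo_merge_opts "-Xmx1g -Dfoo=1" "-Xmx4g -Dfoo=1"   # prints "-Dfoo=1 -Xmx4g"
```

Index-based array appends are used instead of += so the sketch also runs on bash 3.0.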
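
A minimal sketch of matching the help spellings listed above. demo_is_help is hypothetical. The patterns are quoted because an unquoted ? in a case pattern matches any single character, so every one-letter argument would otherwise be treated as a help request.

```bash
# Succeeds if the argument is one of the recognized help spellings.
demo_is_help() {
  case "$1" in
    "--?"|"-?"|"?"|"--help"|"-help"|"-h"|"help")
      return 0
      ;;
  esac
  return 1
}

# Example:
#   if demo_is_help "$1"; then echo "usage: ..."; exit 0; fi
```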


> Shell script rewrite
> --------------------
>
>                 Key: HADOOP-9902
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9902
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: scripts
>    Affects Versions: 3.0.0
>            Reporter: Allen Wittenauer
>            Assignee: Allen Wittenauer
>              Labels: releasenotes
>             Fix For: 3.0.0
>
>         Attachments: HADOOP-9902-10.patch, HADOOP-9902-11.patch, 
> HADOOP-9902-12.patch, HADOOP-9902-13-branch-2.patch, HADOOP-9902-13.patch, 
> HADOOP-9902-14.patch, HADOOP-9902-15.patch, HADOOP-9902-16.patch, 
> HADOOP-9902-2.patch, HADOOP-9902-3.patch, HADOOP-9902-4.patch, 
> HADOOP-9902-5.patch, HADOOP-9902-6.patch, HADOOP-9902-7.patch, 
> HADOOP-9902-8.patch, HADOOP-9902-9.patch, HADOOP-9902.patch, HADOOP-9902.txt, 
> hadoop-9902-1.patch, more-info.txt
>
>
> Umbrella JIRA for shell script rewrite.  See more-info.txt for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
