[ https://issues.apache.org/jira/browse/HADOOP-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178081#comment-13178081 ]

Steve Loughran commented on HADOOP-7939:
----------------------------------------

I'm going to start by saying I couldn't get the tarball to start up. Here are 
some of the problems I hit:

 * HADOOP-7838 - sbin/start-balancer doesn't work
 * MAPREDUCE-3430 - Shell variable expansions in yarn shell scripts should be 
quoted
 * MAPREDUCE-3431 - NPE in Resource Manager shutdown
 * MAPREDUCE-3432 - Yarn doesn't work if JAVA_HOME isn't set

The key problems were the number of env variables to set, something wrong with 
env propagation (MAPREDUCE-3432 shows this), the lack of a "how to get up and 
running in 5 minutes" document, and the fact that some shell scripts contain 
assumptions about code layout that aren't valid; HADOOP-7838 shows this.
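
To illustrate the class of bug, here's a minimal sketch of the kind of 
defensive env handling the launcher scripts need; the checks are illustrative, 
not the actual scripts:

{code}
#!/usr/bin/env bash
# Sketch only: fail early with a clear message instead of an NPE or a
# cryptic "command not found" deep inside the scripts (cf. MAPREDUCE-3432).
if [ -z "$JAVA_HOME" ]; then
  echo "Error: JAVA_HOME is not set." >&2
  exit 1
fi

# Quote every expansion so paths with spaces survive (cf. MAPREDUCE-3430).
JAVA="$JAVA_HOME/bin/java"
if [ ! -x "$JAVA" ]; then
  echo "Error: $JAVA is not executable." >&2
  exit 1
fi
{code}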

There's probably an underlying problem: no testing that the tarball works when 
deployed onto a clean OS into a directory with a space in it somewhere up the 
tree. This isn't that hard to write: a few ant tasks to <scp> the file and 
then <ssh> some commands. Without it you can't be sure such problems have gone 
away and won't come back.
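
The shell equivalent of those <scp>/<ssh> ant tasks is short; the host name 
and tarball name below are placeholders:

{code}
#!/usr/bin/env bash
# Install smoke test (sketch): copy the tarball to a clean host and try to
# run it from a directory with a space in the path. Host and tarball names
# are placeholders.
set -e
TARBALL=hadoop-0.23.0.tar.gz
HOST=clean-test-vm

scp "$TARBALL" "$HOST":/tmp/
ssh "$HOST" '
  mkdir -p "$HOME/hadoop test"
  tar -xzf /tmp/'"$TARBALL"' -C "$HOME/hadoop test"
  cd "$HOME/hadoop test"/hadoop-*
  bin/hadoop version
'
{code}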

If I have that problem, I expect end users will too, and I fear for the 
traffic on hadoop-*-users. That's not just pain and suffering; it will cause 
people not to use Hadoop. Because the download is free, nobody has put enough 
money on the table to justify spending a day getting the thing up and running 
on their desktop. Any bad installation experience will put people off.

Tom White's goal of "one single env variable" is what I'd like. Set that, have 
the others derive from it (unless overridden), and work it out from the 
script's own location under bin/ if it isn't predefined.
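
In shell that derivation is only a few lines; the variable names follow the 
current convention and the defaults are illustrative:

{code}
# Sketch: derive everything from HADOOP_HOME, which is itself worked out
# from the script's own location ($0) when not predefined.
HADOOP_HOME="${HADOOP_HOME:-$(cd "$(dirname "$0")/.." && pwd)}"
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$HADOOP_HOME/conf}"
HADOOP_LOG_DIR="${HADOOP_LOG_DIR:-$HADOOP_HOME/logs}"
export HADOOP_HOME HADOOP_CONF_DIR HADOOP_LOG_DIR
{code}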

Looking at this proposal:

 # I like the idea of a standard layout that can be tuned, so that we have the 
option to point to different versions of things if need be, but you don't need 
to set everything up in advance.
 # You can't rely on symlinks in Windows-land, which, given the recent MS 
support for Hadoop on Azure, may matter in production as well as dev. And 
remember, those Windows desktop installs probably form the majority of 
single-user deployments.
 # Windows also has a hard limit on command-line length (8191 characters in 
cmd.exe); it's the thing that tops out first on long classpaths (forcing you 
to set the CLASSPATH env variable and then call java, but even that has 
limits).
 # We need some tests. I know BigTop does this, but I'd like some pushed 
earlier into the process, so that all HADOOP-, HDFS- and MAPREDUCE- patches 
get regression tested against the scripts in their initial tests.
 # Todd's points about config, tmp &c raise another issue: per-user options 
and temp dirs should be on different paths from the binaries. I don't want the 
temp files on the root disk, and just because Hadoop was installed by root 
doesn't mean I shouldn't be able to run it with my own config.
 # Redirectable config/tmp also makes it trivial to play with different 
installation options without editing conf files; the sketch after this list 
shows what that looks like.
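
A hedged example of that per-user override, using the existing --config switch 
and the HADOOP_CONF_DIR/HADOOP_LOG_DIR variables (the paths are examples only):

{code}
# Run a root-installed Hadoop with my own config and scratch space,
# without touching the install tree. Paths are examples.
export HADOOP_CONF_DIR="$HOME/my-hadoop-conf"
export HADOOP_LOG_DIR="$HOME/hadoop-logs"
hadoop --config "$HADOOP_CONF_DIR" fs -ls /
{code}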

In an ideal world we'd also replace the bash scripts with Python, as it's a 
more readable/editable language, less quirky, and it sets things up for more 
Python-round-the-edges work. I don't know enough about Python on Windows to 
know the consequences of that; I'd expect Python to be native (not Cygwin). 
I'll put that to one side for now.

For me, then:
 * A root hadoop dir with everything laid out underneath it is good.
 * I would like a way to point to my config/tmp dirs without needing to edit 
symlinks.
 * This stuff needs to work on Windows too.
 * The tarball needs installation tests.





                
> Improve Hadoop subcomponent integration in Hadoop 0.23
> ------------------------------------------------------
>
>                 Key: HADOOP-7939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7939
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: build, conf, documentation, scripts
>    Affects Versions: 0.23.0
>            Reporter: Roman Shaposhnik
>            Assignee: Roman Shaposhnik
>             Fix For: 0.23.1
>
>
> h1. Introduction
> For the rest of this proposal it is assumed that the current set
> of Hadoop subcomponents is:
>  * hadoop-common
>  * hadoop-hdfs
>  * hadoop-yarn
>  * hadoop-mapreduce
> It must be noted that this is an open-ended list, though. For example,
> implementations of additional frameworks on top of yarn (e.g. MPI) would
> also be considered a subcomponent.
> h1. Problem statement
> Currently there's an unfortunate coupling and hard-coding present at the
> level of launcher scripts, configuration scripts and Java implementation
> code that prevents us from treating all subcomponents of Hadoop independently
> of each other. In a lot of places it is assumed that bits and pieces
> from individual subcomponents *must* be located at predefined places
> and they can not be dynamically registered/discovered during the runtime.
> This prevents a truly flexible deployment of Hadoop 0.23. 
> h1. Proposal
> NOTE: this is NOT a proposal for redefining the layout from HADOOP-6255. 
> The goal here is to keep as much of that layout in place as possible,
> while permitting different deployment layouts.
> The aim of this proposal is to introduce the needed level of indirection and
> flexibility in order to accommodate the current assumed layout of Hadoop 
> tarball
> deployments and all the other styles of deployments as well. To this end the
> following set of environment variables needs to be uniformly used in all of
> the subcomponent's launcher scripts, configuration scripts and Java code
> (<SC> stands for a literal name of a subcomponent). These variables are
> expected to be defined by <SC>-env.sh scripts and sourcing those files is
> expected to have the desired effect of setting the environment up correctly.
>   # HADOOP_<SC>_HOME
>    ## root of the subtree in a filesystem where a subcomponent is expected to 
> be installed 
>    ## default value: $0/..
>   # HADOOP_<SC>_JARS 
>    ## a subdirectory with all of the jar files comprising subcomponent's 
> implementation 
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)
>   # HADOOP_<SC>_EXT_JARS
>    ## a subdirectory with all of the jar files needed for extended 
> functionality of the subcomponent (nonessential for correct work of the basic 
> functionality)
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/ext
>   # HADOOP_<SC>_NATIVE_LIBS
>    ## a subdirectory with all the native libraries that component requires
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/native
>   # HADOOP_<SC>_BIN
>    ## a subdirectory with all of the launcher scripts specific to the client 
> side of the component
>    ## default value: $(HADOOP_<SC>_HOME)/bin
>   # HADOOP_<SC>_SBIN
>    ## a subdirectory with all of the launcher scripts specific to the 
> server/system side of the component
>    ## default value: $(HADOOP_<SC>_HOME)/sbin
>   # HADOOP_<SC>_LIBEXEC
>    ## a subdirectory with all of the launcher scripts that are internal to 
> the implementation and should *not* be invoked directly
>    ## default value: $(HADOOP_<SC>_HOME)/libexec
>   # HADOOP_<SC>_CONF
>    ## a subdirectory containing configuration files for a subcomponent
>    ## default value: $(HADOOP_<SC>_HOME)/conf
>   # HADOOP_<SC>_DATA
>    ## a subtree in the local filesystem for storing component's persistent 
> state
>    ## default value: $(HADOOP_<SC>_HOME)/data
>   # HADOOP_<SC>_LOG
>    ## a subdirectory where the subcomponent's log files are stored
>    ## default value: $(HADOOP_<SC>_HOME)/log
>   # HADOOP_<SC>_RUN
>    ## a subdirectory with runtime system specific information
>    ## default value: $(HADOOP_<SC>_HOME)/run
>   # HADOOP_<SC>_TMP
>    ## a subdirectory with temporary files
>    ## default value: $(HADOOP_<SC>_HOME)/tmp
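
To make the proposal concrete, here is a minimal sketch of an <SC>-env.sh for 
the hdfs subcomponent, reading the $(VAR) notation above as shell variable 
expansion and treating the derivation of HOME from $0 as one plausible 
rendering:

{code}
# hadoop-hdfs-env.sh (sketch): the proposed defaults for <SC> = hdfs.
# Every variable can be predefined by the caller; otherwise it falls
# back to the layout described in the proposal.
HADOOP_HDFS_HOME="${HADOOP_HDFS_HOME:-$(cd "$(dirname "$0")/.." && pwd)}"
HADOOP_HDFS_JARS="${HADOOP_HDFS_JARS:-$HADOOP_HDFS_HOME/share/hadoop/hdfs}"
HADOOP_HDFS_EXT_JARS="${HADOOP_HDFS_EXT_JARS:-$HADOOP_HDFS_HOME/share/hadoop/hdfs/ext}"
HADOOP_HDFS_NATIVE_LIBS="${HADOOP_HDFS_NATIVE_LIBS:-$HADOOP_HDFS_HOME/share/hadoop/hdfs/native}"
HADOOP_HDFS_BIN="${HADOOP_HDFS_BIN:-$HADOOP_HDFS_HOME/bin}"
HADOOP_HDFS_SBIN="${HADOOP_HDFS_SBIN:-$HADOOP_HDFS_HOME/sbin}"
HADOOP_HDFS_LIBEXEC="${HADOOP_HDFS_LIBEXEC:-$HADOOP_HDFS_HOME/libexec}"
HADOOP_HDFS_CONF="${HADOOP_HDFS_CONF:-$HADOOP_HDFS_HOME/conf}"
HADOOP_HDFS_DATA="${HADOOP_HDFS_DATA:-$HADOOP_HDFS_HOME/data}"
HADOOP_HDFS_LOG="${HADOOP_HDFS_LOG:-$HADOOP_HDFS_HOME/log}"
HADOOP_HDFS_RUN="${HADOOP_HDFS_RUN:-$HADOOP_HDFS_HOME/run}"
HADOOP_HDFS_TMP="${HADOOP_HDFS_TMP:-$HADOOP_HDFS_HOME/tmp}"
export HADOOP_HDFS_HOME HADOOP_HDFS_JARS HADOOP_HDFS_EXT_JARS \
       HADOOP_HDFS_NATIVE_LIBS HADOOP_HDFS_BIN HADOOP_HDFS_SBIN \
       HADOOP_HDFS_LIBEXEC HADOOP_HDFS_CONF HADOOP_HDFS_DATA \
       HADOOP_HDFS_LOG HADOOP_HDFS_RUN HADOOP_HDFS_TMP
{code}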

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
