[ https://issues.apache.org/jira/browse/HADOOP-7939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175487#comment-13175487 ]

Allen Wittenauer commented on HADOOP-7939:
------------------------------------------

In Hadoop 0.20.2 (and earlier), if one changed the shell scripts to use 
BASH_SOURCE instead of just a raw $0, one didn't need to set an environment 
variable to run hadoop at all.  We started to make headway on fixing all these 
scripts...  But I still set HADOOP_HOME just in case something out there didn't 
work.
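
For the record, the BASH_SOURCE idiom is only a few lines.  A minimal sketch 
(the script path is illustrative):

    # bin/hadoop (sketch): find the install root from the script's own
    # location instead of requiring HADOOP_HOME to be pre-set.
    # ${BASH_SOURCE[0]} is this script's path even when it is sourced;
    # a raw $0 would be the caller's name in that case.
    this="${BASH_SOURCE[0]:-$0}"
    bin=$(cd -P -- "$(dirname -- "$this")" && pwd -P)
    HADOOP_HOME=${HADOOP_HOME:-$(dirname -- "$bin")}
    export HADOOP_HOME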

In Hadoop 0.20.205, HADOOP_HOME was deprecated in favor of HADOOP_PREFIX.  I'm 
not sure what the difference is between the two, but I'm sure the extra 
character makes it better somehow.  In the end, I set 
HADOOP_PREFIX=$HADOOP_HOME, filled HADOOP_HOME_WARN_SUPPRESS with various 
social commentary, and was on my way.
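
Concretely, that workaround amounts to two lines in hadoop-env.sh (the 
suppress flag just has to be non-empty; my actual value was more colorful):

    # hadoop-env.sh: carry the old variable forward and silence the
    # 0.20.205 deprecation warning.
    export HADOOP_PREFIX="$HADOOP_HOME"
    export HADOOP_HOME_WARN_SUPPRESS=true   # any non-empty value works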

This jira wants to introduce a cacophony of environment variables, which makes 
me question what the actual value to the end user is going to be.  I suspect 
the answer is zero.  We seem to have this idea, fascination really, that the 
Hadoop components are magically separate from each other and that somehow 
people in the real world deploy different versions of these things on the same 
grid. 

(I know that one of the supposed benefits of yarn is that people could run 
different versions of MapReduce.  This reminds me of the fact that part of my 
family is from Missouri.  For as long as we have unstable and/or private 
interfaces and an RPC system that isn't forward and backward compatible, this 
just isn't a reality.)

But I still don't understand why we need environment variables to do any of 
this at all.  A decade or so ago, some bright folks got the idea that you could 
write an xyz-config program that took command-line arguments and returned 
various configuration and build-time options.  Eventually this germinated into 
pkg-config, which provides a somewhat app-independent way to do the same thing. 
While I'm not necessarily advocating full-blown pkg-config support, I do think 
that adding 12 env vars per component is ... less than ideal.  When it comes 
down to it, everyone is just going to end up back at $HADOOP_HOME with various 
directories appended.
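
To make that concrete: pkg-config answers questions on demand, and a 
hypothetical hadoop-config could do the same (the hadoop-config command and 
its flags below are invented purely for illustration):

    # pkg-config answers build-time questions from one well-known place:
    pkg-config --cflags --libs glib-2.0

    # A hypothetical hadoop-config could answer layout questions the
    # same way, instead of 12 exported variables per component:
    hadoop-config --component hdfs --jars   # print the jar directory
    hadoop-config --component hdfs --conf   # print the config directory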


                
> Improve Hadoop subcomponent integration in Hadoop 0.23
> ------------------------------------------------------
>
>                 Key: HADOOP-7939
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7939
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: build, conf, documentation, scripts
>    Affects Versions: 0.23.0
>            Reporter: Roman Shaposhnik
>            Assignee: Roman Shaposhnik
>             Fix For: 0.23.1
>
>
> h1. Introduction
> For the rest of this proposal it is assumed that the current set
> of Hadoop subcomponents is:
>  * hadoop-common
>  * hadoop-hdfs
>  * hadoop-yarn
>  * hadoop-mapreduce
> It must be noted that this is an open-ended list, though. For example,
> implementations of additional frameworks on top of yarn (e.g. MPI) would
> also be considered a subcomponent.
> h1. Problem statement
> Currently there's an unfortunate coupling and hard-coding present at the
> level of launcher scripts, configuration scripts and Java implementation
> code that prevents us from treating all subcomponents of Hadoop independently
> of each other. In a lot of places it is assumed that bits and pieces
> from individual subcomponents *must* be located at predefined places
> and cannot be dynamically registered/discovered at runtime.
> This prevents a truly flexible deployment of Hadoop 0.23. 
> h1. Proposal
> NOTE: this is NOT a proposal for redefining the layout from HADOOP-6255. 
> The goal here is to keep as much of that layout in place as possible,
> while permitting different deployment layouts.
> The aim of this proposal is to introduce the needed level of indirection and
> flexibility in order to accommodate both the currently assumed layout of
> Hadoop tarball deployments and all other styles of deployment as well. To
> this end, the following set of environment variables needs to be used
> uniformly in all of the subcomponents' launcher scripts, configuration
> scripts and Java code (<SC> stands for the literal name of a subcomponent).
> These variables are expected to be defined by <SC>-env.sh scripts, and
> sourcing those files is expected to set the environment up correctly.
>   # HADOOP_<SC>_HOME
>    ## root of the subtree in a filesystem where a subcomponent is expected to 
> be installed 
>    ## default value: $0/..
>   # HADOOP_<SC>_JARS 
>    ## a subdirectory with all of the jar files comprising the 
> subcomponent's implementation 
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)
>   # HADOOP_<SC>_EXT_JARS
>    ## a subdirectory with all of the jar files needed for extended 
> functionality of the subcomponent (not essential for the basic 
> functionality to work correctly)
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/ext
>   # HADOOP_<SC>_NATIVE_LIBS
>    ## a subdirectory with all the native libraries that the component requires
>    ## default value: $(HADOOP_<SC>_HOME)/share/hadoop/$(<SC>)/native
>   # HADOOP_<SC>_BIN
>    ## a subdirectory with all of the launcher scripts specific to the client 
> side of the component
>    ## default value: $(HADOOP_<SC>_HOME)/bin
>   # HADOOP_<SC>_SBIN
>    ## a subdirectory with all of the launcher scripts specific to the 
> server/system side of the component
>    ## default value: $(HADOOP_<SC>_HOME)/sbin
>   # HADOOP_<SC>_LIBEXEC
>    ## a subdirectory with all of the launcher scripts that are internal to 
> the implementation and should *not* be invoked directly
>    ## default value: $(HADOOP_<SC>_HOME)/libexec
>   # HADOOP_<SC>_CONF
>    ## a subdirectory containing configuration files for a subcomponent
>    ## default value: $(HADOOP_<SC>_HOME)/conf
>   # HADOOP_<SC>_DATA
>    ## a subtree in the local filesystem for storing the subcomponent's 
> persistent state
>    ## default value: $(HADOOP_<SC>_HOME)/data
>   # HADOOP_<SC>_LOG
>    ## a subdirectory where the subcomponent's log files are stored
>    ## default value: $(HADOOP_<SC>_HOME)/log
>   # HADOOP_<SC>_RUN
>    ## a subdirectory with runtime system-specific information
>    ## default value: $(HADOOP_<SC>_HOME)/run
>   # HADOOP_<SC>_TMP
>    ## a subdirectory with temporary files
>    ## default value: $(HADOOP_<SC>_HOME)/tmp
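
To make the defaulting scheme above concrete, here is a minimal sketch of 
what a generated hdfs-env.sh might look like (the file name and the 
${VAR:-default} fallback idiom are assumptions on my part; the variable names 
and defaults come from the list above):

    # hdfs-env.sh (sketch): every variable falls back to a path derived
    # from HADOOP_HDFS_HOME, so a plain tarball install sets nothing.
    home_guess=$(cd -P -- "$(dirname -- "${BASH_SOURCE[0]:-$0}")/.." && pwd -P)
    export HADOOP_HDFS_HOME="${HADOOP_HDFS_HOME:-$home_guess}"
    export HADOOP_HDFS_JARS="${HADOOP_HDFS_JARS:-$HADOOP_HDFS_HOME/share/hadoop/hdfs}"
    export HADOOP_HDFS_EXT_JARS="${HADOOP_HDFS_EXT_JARS:-$HADOOP_HDFS_JARS/ext}"
    export HADOOP_HDFS_NATIVE_LIBS="${HADOOP_HDFS_NATIVE_LIBS:-$HADOOP_HDFS_JARS/native}"
    export HADOOP_HDFS_BIN="${HADOOP_HDFS_BIN:-$HADOOP_HDFS_HOME/bin}"
    export HADOOP_HDFS_CONF="${HADOOP_HDFS_CONF:-$HADOOP_HDFS_HOME/conf}"
    # ...and likewise for SBIN, LIBEXEC, DATA, LOG, RUN, and TMP.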
