[jira] [Commented] (HDFS-2045) HADOOP_*_HOME environment variables no longer work for tar ball distributions

Eric Yang (JIRA) Tue, 07 Jun 2011 15:55:45 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13045694#comment-13045694
 ]


Eric Yang commented on HDFS-2045:
---------------------------------

HADOOP_COMMON_HOME, HADOOP_HDFS_HOME and HADOOP_MAPRED_HOME are a result of
splitting the source code into three separate submodules.  While this works
fine for developers isolating each project, it makes configuration difficult
for production use.  HDFS and MAPRED run under their own UIDs, so the amount of
configuration simply multiplies.

To solve this problem, there are a couple of options:

Option 1.  Package all common shell scripts in the common jar file.  When the
binary tarball is built, the common shell scripts are rearranged and merged into
the binary tarball distribution, and the HADOOP_*_HOME environment variables are
removed completely.  $HADOOP_PREFIX is the only hint (derived from the shell
script path, so there is no need to define it in the environment) telling all
Hadoop programs exactly how the bits are laid out.  When HDFS or MAPREDUCE is
deployed, there is no need to deploy the COMMON tarball.  To make this work for
developers, *-config.sh should be moved to $HADOOP_PREFIX/libexec.  During the
build process, hadoop-common-*.jar is extracted to obtain the common shell
scripts.  This brings the developer and binary layouts closer to each other.
(When the project is converted to Maven, this keeps hdfs/mapreduce loosely
coupled and reduces duplicated shell scripts.)

Option 2.  Preserve HADOOP_*_HOME for source code execution.  Environment-driven
layout does not work for the binary tarball.  Change the tarball prefix from
hadoop-[common|mapred|hdfs]-0.23.0-SNAPSHOT to hadoop-[version] for easy
extraction.

Option 3.  Enable HADOOP_*_HOME for the binary tarball.  (This risks crashing
the system due to a bad environment variable setup.)

Option 4.  Merge hdfs/mapreduce back into the same project, but as
subdirectories, to reduce duplicated shell scripts.
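The prefix derivation described in Option 1 can be sketched roughly as follows.
This is written in Python purely for illustration, not as the actual shell
logic, and it assumes a hypothetical layout of $HADOOP_PREFIX/bin/<script> with
the *-config.sh scripts under $HADOOP_PREFIX/libexec.  The helper names
derive_prefix and config_script are hypothetical.

```python
import os


def derive_prefix(script_path: str) -> str:
    """Return the install prefix as the parent of the script's directory.

    Assumes the hypothetical layout $PREFIX/bin/<script>; no environment
    variable is consulted.  Symlinks are not followed (abspath only
    normalizes the path).
    """
    bin_dir = os.path.dirname(os.path.abspath(script_path))
    return os.path.dirname(bin_dir)


def config_script(prefix: str, name: str) -> str:
    """Path where Option 1 would place a *-config.sh script."""
    return os.path.join(prefix, "libexec", name)


if __name__ == "__main__":
    prefix = derive_prefix("/opt/hadoop-0.23.0/bin/hdfs")
    print(prefix)
    print(config_script(prefix, "hdfs-config.sh"))
```

The point of the design is that only the script's own location is needed, so
HDFS and MAPRED users do not have to agree on any environment settings.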

I am inclined to vote for option 2.

> HADOOP_*_HOME environment variables no longer work for tar ball distributions
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-2045
>                 URL: https://issues.apache.org/jira/browse/HDFS-2045
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Aaron T. Myers
>
> It used to be that you could do the following:
> # Run `ant bin-package' in your hadoop-common checkout.
> # Set HADOOP_COMMON_HOME to the built directory of hadoop-common.
> # Run `ant bin-package' in your hadoop-hdfs checkout.
> # Set HADOOP_HDFS_HOME to the built directory of hadoop-hdfs.
> # Set PATH to have HADOOP_HDFS_HOME/bin and HADOOP_COMMON_HOME/bin on it.
> # Run `hdfs'.
>
> As of HDFS-1963, this no longer works since hdfs-config.sh is looking in 
> HADOOP_COMMON_HOME/bin/ for hadoop-config.sh, but it's being placed in 
> HADOOP_COMMON_HOME/libexec.
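The mismatch above can be illustrated with a small lookup that tolerates both
layouts.  This is a hypothetical Python sketch, not the actual hdfs-config.sh
logic: the function name find_hadoop_config and the libexec-then-bin search
order are assumptions made for illustration, and the exists parameter is only
there so the check can be swapped out.

```python
import os
from typing import Callable, Optional


def find_hadoop_config(
    common_home: str,
    exists: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Locate hadoop-config.sh under a HADOOP_COMMON_HOME-style directory.

    Checks libexec/ first (where the script is placed as of HDFS-1963),
    then bin/ (where hdfs-config.sh used to look).  Returns None if the
    script is found in neither place.
    """
    for sub in ("libexec", "bin"):
        candidate = os.path.join(common_home, sub, "hadoop-config.sh")
        if exists(candidate):
            return candidate
    return None
```

A lookup that only checks bin/ returns nothing for a tarball built after
HDFS-1963, which is the failure described in this issue.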

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

