[ 
https://issues.apache.org/jira/browse/HADOOP-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HADOOP-6605:
--------------------------------

    Attachment: hadoop-6605-3.patch

Thanks for the feedback, everyone. Popping the stack, Hadoop requires the user
to set JAVA_HOME for two reasons:

# We want to add tools.jar to the classpath, and JAVA_HOME lets the user
specify a base directory to look in (other than the default java, which may be
from a JRE and therefore not have tools.jar). This is no longer an issue since
HADOOP-7374 removed tools.jar from the classpath.
# We want to respect JAVA_HOME even if there is already a java in the path.
I.e., users and admins can easily configure a java for Hadoop that differs
from the default system java. This makes sense given that Hadoop is picky.
Therefore we should only auto-detect JAVA_HOME if it is not set (which all
versions of the patch do) and we can determine a reasonable value.

OSX provides an API (java_home(1)) that does this: it returns a path suitable
for setting JAVA_HOME based on the enabled/preferred JVMs as set by Java
Preferences. I think we agree it makes sense to use this.
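
To illustrate the OSX case, here is a minimal sketch of how a script such as
hadoop-config.sh might call java_home(1) only when JAVA_HOME is unset. It is
not the contents of the attached patch; the function name detect_java_home is
hypothetical, and the sketch assumes the tool lives at its usual location,
/usr/libexec/java_home.

```shell
# Hypothetical helper, for illustration only (not taken from the patch).
detect_java_home() {
  # Respect an explicit setting from hadoop-env.sh or the environment.
  if [ -n "${JAVA_HOME:-}" ]; then
    return 0
  fi
  # On OS X, ask java_home(1) for the preferred JVM's home directory.
  if [ "$(uname -s)" = "Darwin" ] && [ -x /usr/libexec/java_home ]; then
    detected="$(/usr/libexec/java_home 2>/dev/null)" || return 1
    [ -n "$detected" ] || return 1
    JAVA_HOME="$detected"
    export JAVA_HOME
    return 0
  fi
  # No detection available for this OS; the caller decides how to fail.
  return 1
}
```

A caller could run detect_java_home and keep the existing "JAVA_HOME is not
set" error whenever it returns non-zero, so behavior on other systems stays
unchanged.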

On Linux, there is no single API that works across distributions. Even though
alternatives is widely available, it works differently on different
distributions (also, it indicates where the java binary lives, not where
JAVA_HOME is, though you could determine that with readlink). There are
well-known locations where JAVA_HOME is installed that you can check to 
reasonably detect it. This is the approach taken by the previous patch. I've 
provided data that shows that checking a set of directories does not measurably 
impact the execution time (therefore "too much work" sounds like a 
philosophical objection rather than a technical objection to me). I've found 
that globbing is not an issue in practice because the glob does not match more 
than one installation on a given system. This is because the JDK was resolved 
via a packaging dependency and the package updates itself rather than having 
multiple versions installed. People who manually install multiple JDKs 
typically set JAVA_HOME explicitly and therefore the detection is not used. 
There are no alternative proposals for autodetecting JAVA_HOME on Linux, and 
I'm not going to spend any more time on this part for now so I'm dropping this 
case from the patch.
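
For reference, the readlink idea mentioned above could look roughly like the
sketch below. It is not part of any version of the patch; the function name
java_home_from_binary is hypothetical, and it assumes a readlink that supports
-f (as GNU coreutils does) and a layout where java sits in a bin directory
directly under the installation root.

```shell
# Hypothetical helper, for illustration only (not taken from the patch).
# Prints a JAVA_HOME candidate for the java binary given as $1.
java_home_from_binary() {
  # Follow every symlink in the chain (e.g. one managed by alternatives).
  _real="$(readlink -f "$1")" || return 1
  # Strip the trailing bin/java to get the installation directory.
  _home="$(dirname "$(dirname "$_real")")"
  printf '%s\n' "$_home"
}
```

It would be called as java_home_from_binary "$(command -v java)". Note the
result may be a JRE directory rather than a JDK, which is one of the reasons
this is not a drop-in answer for Linux.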

In any case (ha), there is consensus on the OSX approach so let's just go with 
this for now. We can easily implement cases for other OS types in the future if 
there's an approach that's acceptable. Patch attached.

> Add JAVA_HOME detection to hadoop-config
> ----------------------------------------
>
>                 Key: HADOOP-6605
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6605
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Chad Metcalf
>            Assignee: Eli Collins
>            Priority: Minor
>             Fix For: 0.22.0
>
>         Attachments: HADOOP-6605.patch, hadoop-6605-1.patch, 
> hadoop-6605-2.patch, hadoop-6605-3.patch
>
>
> The commands that source hadoop-config.sh currently bail with an error if 
> JAVA_HOME is not set. Let's detect JAVA_HOME (from a list of locations on 
> various OS types) if JAVA_HOME is not already set by hadoop-env.sh or the 
> environment. This way users don't have to manually configure it.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

