[
https://issues.apache.org/jira/browse/HADOOP-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eli Collins updated HADOOP-6605:
--------------------------------
Attachment: hadoop-6605-3.patch
Thanks for the feedback everyone. Popping the stack, Hadoop requires the user
to set JAVA_HOME for two reasons:
# We want to add tools.jar to the classpath, and JAVA_HOME lets the user
specify a base directory to look in (other than the default java, which may be
from a JRE and therefore not have tools.jar). This is no longer an issue since
HADOOP-7374 removed tools.jar from the classpath.
# We want to respect JAVA_HOME even if there is already a java in the path,
i.e. users and admins can easily configure Hadoop to use a java other than the
default system java. This makes sense given that Hadoop is picky about which
java it runs on. Therefore it makes sense to auto-detect JAVA_HOME only if it
is not set (which all versions of the patch do) and we can determine a
reasonable value.
OSX provides a command (java_home(1)) that does this: it returns a path
suitable for setting JAVA_HOME based on the enabled/preferred JVMs as set in
Java Preferences. I think we agree it makes sense to use this.
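
A minimal sketch of what the OSX case could look like, assuming the detection
lives in a helper function. The name detect_java_home_osx is hypothetical and
not taken from the patch; /usr/libexec/java_home is where java_home(1) is
installed on OSX. JAVA_HOME is only touched when it is not already set:

```shell
# Hypothetical helper: detect JAVA_HOME on OSX only when it is not already set.
detect_java_home_osx() {
  # Respect any value coming from hadoop-env.sh or the environment.
  if [ -n "${JAVA_HOME:-}" ]; then
    return 0
  fi
  # Only attempt detection on OSX, where java_home(1) is available.
  if [ "$(uname -s)" = "Darwin" ] && [ -x /usr/libexec/java_home ]; then
    if detected=$(/usr/libexec/java_home 2>/dev/null); then
      JAVA_HOME=$detected
      export JAVA_HOME
      return 0
    fi
  fi
  # Nothing detected; the caller can still bail with the usual error.
  return 1
}
```

A caller would run the helper first and keep the existing "JAVA_HOME is not
set" error as the fallback when it returns non-zero.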
On Linux, there is no single API that works across distributions. Even though
alternatives is widely available, it works differently on different
distributions (also, it indicates where the java binary lives, not where
JAVA_HOME is, though you could determine that with readlink). There are
well-known locations where JDKs are installed that you can check to
reasonably detect JAVA_HOME. This is the approach taken by the previous patch. I've
provided data that shows that checking a set of directories does not measurably
impact the execution time (therefore "too much work" sounds like a
philosophical objection rather than a technical objection to me). I've found
that globbing is not an issue in practice because the glob does not match more
than one installation on a given system. This is because the JDK is usually
pulled in via a packaging dependency and the package updates itself rather than
having multiple versions installed. People who manually install multiple JDKs
typically set JAVA_HOME explicitly and therefore the detection is not used.
There are no alternative proposals for autodetecting JAVA_HOME on Linux, and
I'm not going to spend any more time on this part for now so I'm dropping this
case from the patch.
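
For reference, the directory-checking approach described above (which is not
part of the attached patch) could be sketched roughly as follows. The helper
name find_java_home and the candidate paths are assumptions for illustration;
they are not the list used by the previous patch:

```shell
# Hypothetical helper: print the first candidate directory that contains an
# executable bin/java. The caller passes glob patterns as arguments.
find_java_home() {
  for pattern in "$@"; do
    # Left unquoted on purpose so the shell expands the glob.
    for dir in $pattern; do
      if [ -x "$dir/bin/java" ]; then
        printf '%s\n' "$dir"
        return 0
      fi
    done
  done
  return 1
}

# Only auto-detect when JAVA_HOME is not already set.
# The candidate locations below are illustrative assumptions.
if [ -z "${JAVA_HOME:-}" ]; then
  if detected=$(find_java_home '/usr/lib/jvm/java-*' '/usr/java/default'); then
    JAVA_HOME=$detected
    export JAVA_HOME
  fi
fi
```

If a glob matched more than one installation, this sketch would simply take
the first match in the shell's sort order.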
In any case (ha), there is consensus on the OSX approach so let's just go with
this for now. We can easily implement cases for other OS types in the future if
there's an approach that's acceptable. Patch attached.
> Add JAVA_HOME detection to hadoop-config
> ----------------------------------------
>
> Key: HADOOP-6605
> URL: https://issues.apache.org/jira/browse/HADOOP-6605
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Chad Metcalf
> Assignee: Eli Collins
> Priority: Minor
> Fix For: 0.22.0
>
> Attachments: HADOOP-6605.patch, hadoop-6605-1.patch,
> hadoop-6605-2.patch, hadoop-6605-3.patch
>
>
> The commands that source hadoop-config.sh currently bail with an error if
> JAVA_HOME is not set. Let's detect JAVA_HOME (from a list of locations on
> various OS types) if JAVA_HOME is not already set by hadoop-env.sh or the
> environment. This way users don't have to manually configure it.