[
https://issues.apache.org/jira/browse/MAHOUT-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235814#comment-13235814
]
Roman Shaposhnik commented on MAHOUT-994:
-----------------------------------------
There's no need to add $HADOOP_HOME/lib to the classpath as long as you're
using hadoop itself as a launcher script. In fact, it is dangerous to assume
that there's $HADOOP_HOME/lib to begin with. This is definitely not the case
with Hadoop 0.23 and might not be the case with Hadoop 1.X.
Explicit setting of HADOOP_CONF_DIR needs to be dropped from the Pig script,
and I would not recommend keeping it in Mahout. Once again -- using the hadoop
launcher script itself is the best way to add these things to the proper places
in the CP. If you try to do it explicitly you're essentially second-guessing
the hadoop implementation and your scripts become brittle.
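To illustrate the point about letting the launcher do the work, here is a
minimal sketch (not the actual Mahout script). It assumes the job can be started
with the standard "hadoop jar" subcommand; the function name launch_via_hadoop
and the DRY_RUN switch are hypothetical and exist only for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a job jar through the hadoop launcher rather
# than a hand-built `java -cp $HADOOP_HOME/lib/...` command line. The
# launcher decides where its own libraries and configuration live.
launch_via_hadoop() {
  local hadoop_bin="$1" job_jar="$2"
  shift 2
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # DRY_RUN=1 (an assumption of this sketch) prints the command only.
    echo "$hadoop_bin" jar "$job_jar" "$@"
  else
    exec "$hadoop_bin" jar "$job_jar" "$@"
  fi
}
```

Note that the sketch never touches HADOOP_HOME or HADOOP_CONF_DIR; whatever
the installed hadoop script does with them stays its own business.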
'which' is definitely quite reliable. In fact, if you look at Pig, Hive and
HBase, they have all transitioned to using this technique to find the Hadoop
the user wants. The only difference is in the defaults (some projects prefer
hadoop from the PATH to override anything else, some prefer it the other way).
Here's another example of how HBase does it (scroll to line #219):
http://svn.apache.org/viewvc/hbase/trunk/bin/hbase?revision=1296661&view=markup
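For readers who don't want to dig through the HBase script, the lookup could be
sketched roughly as follows. This is not copied from HBase, Pig or Hive: the
function name find_hadoop is hypothetical, the shell builtin "command -v" is
used in place of 'which', and the default chosen here (PATH first, then
$HADOOP_HOME/bin/hadoop as a fallback) is just one of the two orderings
mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path of the hadoop launcher to use.
# Order assumed for this sketch: hadoop on PATH wins; HADOOP_HOME is
# consulted only as a fallback. Returns non-zero if nothing is found.
find_hadoop() {
  local candidate
  if candidate="$(command -v hadoop 2>/dev/null)"; then
    echo "$candidate"
  elif [ -n "${HADOOP_HOME:-}" ] && [ -x "$HADOOP_HOME/bin/hadoop" ]; then
    echo "$HADOOP_HOME/bin/hadoop"
  else
    return 1
  fi
}
```

Swapping the two branches gives the opposite default, where an explicitly set
HADOOP_HOME overrides whatever is on the PATH.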
Once again -- if you try to deduce the libraries yourself, your scripts become
brittle. You can *only* rely on the hadoop implementation to set these things
up for you. This is the uncomfortable truth of having at least 3 different,
incompatible Hadoop implementations actively deployed in the wild.
Does it make sense?
> mahout script shouldn't rely on HADOOP_HOME since that was deprecated in all
> major Hadoop branches
> --------------------------------------------------------------------------------------------------
>
> Key: MAHOUT-994
> URL: https://issues.apache.org/jira/browse/MAHOUT-994
> Project: Mahout
> Issue Type: Bug
> Components: Integration
> Affects Versions: 0.6
> Reporter: Roman Shaposhnik
>
> Mahout should follow the Pig and Hive example and not rely explicitly on
> HADOOP_HOME and HADOOP_CONF_DIR