[ 
https://issues.apache.org/jira/browse/MAHOUT-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235814#comment-13235814
 ] 

Roman Shaposhnik commented on MAHOUT-994:
-----------------------------------------

There's no need to add $HADOOP_HOME/lib to the classpath as long as you're 
using the hadoop script itself as the launcher. In fact, it is dangerous to 
assume that there's a $HADOOP_HOME/lib to begin with. This is definitely not 
the case with Hadoop 0.23 and might not be the case with Hadoop 1.X.

Explicit setting of HADOOP_CONF_DIR needs to be dropped from the Pig script, 
and I would not recommend keeping it in Mahout. Once again -- using the hadoop 
launcher script itself is the best way to add these things to the proper places 
in the CP. If you try to do it explicitly you're essentially second-guessing 
the hadoop implementation and your scripts become brittle.
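
As a rough sketch of the point above: a wrapper script can hand the job to 
the hadoop launcher and let it assemble the classpath and configuration. The 
helper name launch_with_hadoop and its arguments are hypothetical, chosen only 
for illustration; this is not the actual Mahout script.

```shell
# Sketch only: delegate classpath and configuration setup to the hadoop
# launcher instead of assembling them by hand.
# launch_with_hadoop is a hypothetical helper, not part of Mahout.
launch_with_hadoop() {
  hadoop_bin="$1"   # path to the hadoop launcher script
  job_jar="$2"      # job jar to run
  shift 2
  # "hadoop jar" lets the launcher add Hadoop's own jars and conf dir,
  # so no $HADOOP_HOME/lib or HADOOP_CONF_DIR handling is needed here.
  "$hadoop_bin" jar "$job_jar" "$@"
}
```

A call might look like launch_with_hadoop /path/to/hadoop my-job.jar 
org.example.Driver, where the jar and class names are placeholders.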

'which' is definitely quite reliable. In fact, if you look at Pig, Hive and 
HBase, they have all transitioned to using this technique to find the Hadoop 
the user wants. The only difference is in the defaults (some projects prefer 
hadoop from the PATH to override anything else, some prefer it the other way). 
Here's another example of how HBase does it (scroll to line #219):
  
http://svn.apache.org/viewvc/hbase/trunk/bin/hbase?revision=1296661&view=markup
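
The lookup described above could be sketched roughly as follows. This is not 
the HBase code; find_hadoop is a hypothetical helper, it uses the POSIX 
'command -v' in place of 'which', and it assumes the "PATH first, then 
HADOOP_HOME" ordering (a project could just as well reverse it).

```shell
# Sketch only: locate the hadoop launcher the user wants.
# find_hadoop is a hypothetical helper; the PATH-first ordering is an
# assumption, and some projects prefer the opposite default.
find_hadoop() {
  if command -v hadoop >/dev/null 2>&1; then
    # hadoop found on PATH: print how the shell resolves it
    command -v hadoop
  elif [ -n "${HADOOP_HOME:-}" ] && [ -x "$HADOOP_HOME/bin/hadoop" ]; then
    # fall back to the launcher under HADOOP_HOME, if one exists
    echo "$HADOOP_HOME/bin/hadoop"
  else
    echo "hadoop launcher not found on PATH or in HADOOP_HOME" >&2
    return 1
  fi
}
```

The result is only the path to the launcher; everything else (libraries, 
configuration directory) is still left to that launcher to resolve.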

Once again -- if you try to deduce the libraries yourself, your scripts become 
brittle. You can *only* rely on the hadoop implementation to set these things 
up for you. This is the uncomfortable truth of having at least 3 different, 
incompatible Hadoop implementations actively deployed in the wild.

Does it make sense?

                
> mahout script shouldn't rely on HADOOP_HOME since that was deprecated in all 
> major Hadoop branches
> --------------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-994
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-994
>             Project: Mahout
>          Issue Type: Bug
>          Components: Integration
>    Affects Versions: 0.6
>            Reporter: Roman Shaposhnik
>
> Mahout should follow the Pig and Hive example and not rely explicitly on 
> HADOOP_HOME and HADOOP_CONF_DIR

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
