[jira] [Commented] (WHIRR-384) Add Mahout as a service

Frank Scholten (JIRA) Tue, 13 Sep 2011 13:45:35 -0700

    [ https://issues.apache.org/jira/browse/WHIRR-384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103965#comment-13103965 ]


Frank Scholten commented on WHIRR-384:
--------------------------------------

Added new patch with 'mahout-client' role and without the unneeded dependencies.

At the moment the 'mahout-client' role is oblivious to Hadoop: it unpacks the 
tarball and adds the mahout script to the PATH. The mahout script does perform 
some checks (it looks for configuration in $HADOOP_HOME/conf), but you still 
need to set up a Hadoop cluster.
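Because the role only installs Mahout, the Hadoop roles still have to be part of the same cluster definition. Below is a minimal sketch, assuming the 'mahout-client' role reads the same whirr.mahout.version and whirr.mahout.tarball.url properties described for 'mahout-home'. whirr.cluster-name and whirr.instance-templates are standard Whirr properties; the cluster name, instance counts, and the MahoutRecipeSketch/buildRecipe/render names are hypothetical and only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MahoutRecipeSketch {

    // Hypothetical helper: collects recipe entries in insertion order so the
    // rendered file is stable and easy to read.
    static Map<String, String> buildRecipe(String mahoutVersion, String tarballUrl) {
        Map<String, String> recipe = new LinkedHashMap<>();
        recipe.put("whirr.cluster-name", "mahout-sketch"); // hypothetical cluster name
        // The Hadoop roles are still required: 'mahout-client' only unpacks
        // Mahout and puts the mahout script on the PATH.
        recipe.put("whirr.instance-templates",
            "1 hadoop-namenode+hadoop-jobtracker+mahout-client,"
                + "1 hadoop-datanode+hadoop-tasktracker");
        // Assumption: same property names as the 'mahout-home' role.
        recipe.put("whirr.mahout.version", mahoutVersion);
        recipe.put("whirr.mahout.tarball.url", tarballUrl);
        return recipe;
    }

    // Renders the entries as key=value lines, one entry per line.
    static String render(Map<String, String> recipe) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : recipe.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> recipe = buildRecipe("0.6-SNAPSHOT",
            "http://dl.dropbox.com/u/13436484/mahout-distribution-0.6-SNAPSHOT.tar.gz");
        System.out.print(render(recipe));
    }
}
```

Saved to a .properties file, output like this could be passed to `whirr launch-cluster --config`, though the exact property names the final role accepts may differ.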

Before this patch I would point HADOOP_CONF_DIR to the Hadoop configuration 
generated by Whirr on my local machine and run jobs from there. I guess that if 
Whirr could generate this config under $HADOOP_HOME/conf on another node, and 
you gave that node the 'mahout-client' role, you could submit Mahout jobs from 
that node in the same way. The role does not have to be added to a namenode; 
the node just needs the Hadoop configuration.
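The configuration lookup described above can be modelled as a small precedence rule: use an explicit HADOOP_CONF_DIR if one is set (the local-machine workflow), otherwise fall back to $HADOOP_HOME/conf (where the mahout script looks). This is a sketch of that precedence as described in this comment, not a copy of the mahout script's actual logic; the HadoopConfResolver class and resolveConfDir method are hypothetical, and the environment is passed in as a map so the rule can be exercised without touching real environment variables.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Optional;

public class HadoopConfResolver {

    // Hypothetical helper: picks the Hadoop configuration directory a client
    // node would use, given its environment variables.
    static Optional<Path> resolveConfDir(Map<String, String> env) {
        String confDir = env.get("HADOOP_CONF_DIR");
        if (confDir != null && !confDir.isEmpty()) {
            // Explicit override, e.g. pointing at config Whirr generated locally.
            return Optional.of(Paths.get(confDir));
        }
        String home = env.get("HADOOP_HOME");
        if (home != null && !home.isEmpty()) {
            // Fallback: the conventional $HADOOP_HOME/conf location.
            return Optional.of(Paths.get(home, "conf"));
        }
        // No Hadoop configuration visible, so jobs could not reach a cluster.
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(resolveConfDir(System.getenv())
            .map(Path::toString)
            .orElse("no Hadoop configuration found"));
    }
}
```

Under this model, a node carrying the 'mahout-client' role only needs one of the two variables to point at valid Hadoop configuration; it does not need to be the namenode.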

About the 'mahout-jar' role: my idea was to create a cluster with the Mahout 
jar on the tasktracker nodes, so that you could run a Mahout job from a Java 
process that has compile dependencies on Mahout, without having to build a job 
jar that contains Mahout and its dependencies. I would like to be able to set 
up a Java project with dependencies on Whirr, Mahout and Hadoop and launch jobs 
from Java without building a job jar. However, if you think this is 
problematic or not a good idea, let me know. 

> Add Mahout as a service
> -----------------------
>
>                 Key: WHIRR-384
>                 URL: https://issues.apache.org/jira/browse/WHIRR-384
>             Project: Whirr
>          Issue Type: New Feature
>          Components: new service
>    Affects Versions: 0.7.0
>            Reporter: Frank Scholten
>             Fix For: 0.7.0
>
>         Attachments: WHIRR-384-mahout-client.patch, 
> WHIRR-384-mahout-home.patch
>
>
> Here is an initial patch to support Mahout as a Whirr service.
> I created the role 'mahout-home' which can be used to install the binary 
> Mahout distribution on a Hadoop namenode.
> By combining this role with configuration for a Hadoop cluster you can SSH 
> into the namenode, su to root and start running Mahout jobs via the mahout 
> script immediately.
> The 'mahout-home' role has two properties:
>   Mahout version                                  whirr.mahout.version
>   URL of the Mahout binary distribution tarball   whirr.mahout.tarball.url
> Note that I used a snapshot version of Mahout for testing, revision 1169784, 
> because there were some problems with the Mahout script in 0.5 that have been 
> fixed on trunk; see MAHOUT-680. To test, you can set the tarball property to 
> this link: 
> http://dl.dropbox.com/u/13436484/mahout-distribution-0.6-SNAPSHOT.tar.gz
> I used configure actions and the onBeforeConfigure() method. If there is a 
> better way to express this with the Whirr API, let me know.
> Currently I am investigating a 'mahout-jar' role, which installs the Mahout 
> examples job jar under $HADOOP_HOME/lib on a tasktracker node. I already have 
> some code for putting the jar in place, but when running a job from my local 
> machine I still get ClassNotFoundExceptions. I believe this is because Hadoop 
> has already started before the jar is put in the lib dir, so the jar is not 
> picked up, but I have to investigate some more. From WHIRR-221 I understood 
> that there is no support (yet?) for ordering services, but if you have an 
> idea on how to fix this, let me know.
> Comments and suggestions welcome!

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
