[ https://issues.apache.org/jira/browse/WHIRR-384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103965#comment-13103965 ]
Frank Scholten commented on WHIRR-384:
--------------------------------------

Added a new patch with the 'mahout-client' role and without the unneeded dependencies.

At the moment the 'mahout-client' role is oblivious to Hadoop. It unpacks the tarball and adds the mahout script to the path. The mahout script does have some checks: it looks for configuration in $HADOOP_HOME/conf, but you still need to set up a Hadoop cluster. Before this patch I would point HADOOP_CONF_DIR to the Hadoop configuration generated by Whirr on my local machine and run jobs from there. I guess that if Whirr could generate this config on another node under $HADOOP_HOME/conf, and you give this node the 'mahout-client' role, you could submit Mahout jobs from that node in the same way. The role does not have to be added to a namenode; the node just needs the Hadoop configuration.

About the 'mahout-jar' role: my idea was to create a cluster with the Mahout jar on the tasktracker nodes, so you could run a Mahout job from a Java process that has compile dependencies on Mahout without having to build a job jar that contains Mahout and its dependencies. I would like to be able to set up a Java project with dependencies on Whirr, Mahout and Hadoop and launch jobs from Java without building a job jar. However, if you think this is problematic or not a good idea, let me know.

> Add Mahout as a service
> -----------------------
>
>                 Key: WHIRR-384
>                 URL: https://issues.apache.org/jira/browse/WHIRR-384
>             Project: Whirr
>          Issue Type: New Feature
>          Components: new service
>    Affects Versions: 0.7.0
>            Reporter: Frank Scholten
>             Fix For: 0.7.0
>
>         Attachments: WHIRR-384-mahout-client.patch,
>                      WHIRR-384-mahout-home.patch
>
>
> Here is an initial patch to support Mahout as a Whirr service.
> I created the role 'mahout-home', which can be used to install the binary
> Mahout distribution on a Hadoop namenode.
> By combining this role with configuration for a Hadoop cluster you can SSH
> into the namenode, su to root and start running Mahout jobs via the mahout
> script immediately.
>
> The 'mahout-home' role has two properties:
>   Mahout version: whirr.mahout.version
>   URL of the Mahout binary distribution tarball: whirr.mahout.tarball.url
>
> Note that I used a snapshot version of Mahout for testing, revision 1169784,
> because there were some problems with the mahout script in 0.5 that have been
> fixed on trunk, see MAHOUT-680. To test, you can set the tarball property to
> this link:
> http://dl.dropbox.com/u/13436484/mahout-distribution-0.6-SNAPSHOT.tar.gz
>
> I used configure actions and onBeforeConfigure(). If there is a better
> way to express this with the Whirr API, let me know.
>
> Currently I am investigating a 'mahout-jar' role, which installs the Mahout
> examples job jar under $HADOOP_HOME/lib on a tasktracker node. I already have
> some code for putting the jar in place, but when running a job from my local
> machine I still get ClassNotFoundExceptions. I believe this is because Hadoop
> has already started before the jar is put in the lib dir, so the jar is not
> picked up, but I have to investigate some more. From WHIRR-221 I understood
> that there is no support (yet?) for ordering of services, but if you have an
> idea on how to fix this, let me know.
>
> Comments and suggestions welcome!
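
To make the two properties named in the issue description concrete, here is a minimal sketch of how a recipe using them might be assembled. It is not taken from either patch. The class MahoutRecipeSketch and its buildRecipe method are hypothetical, the cluster name and the instance-template layout are assumptions, and so is the idea that the 'mahout-client' role reads the same whirr.mahout.* properties as 'mahout-home'. The sketch only uses java.util.Properties and does not call the Whirr API.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

// Hypothetical helper for illustration; not part of Whirr or of the WHIRR-384 patches.
class MahoutRecipeSketch {

    // Collects the properties a Whirr recipe for this setup might contain.
    static Properties buildRecipe(String mahoutVersion, String tarballUrl) {
        Properties recipe = new Properties();

        // Assumed cluster name, chosen only for this example.
        recipe.setProperty("whirr.cluster-name", "mahout-demo");

        // Assumed layout: the 'mahout-client' role shares a node with the
        // namenode and jobtracker, so Hadoop configuration is already present there.
        recipe.setProperty("whirr.instance-templates",
                "1 hadoop-namenode+hadoop-jobtracker+mahout-client,"
                + "2 hadoop-datanode+hadoop-tasktracker");

        // The two properties named in the issue description.
        recipe.setProperty("whirr.mahout.version", mahoutVersion);
        recipe.setProperty("whirr.mahout.tarball.url", tarballUrl);
        return recipe;
    }

    public static void main(String[] args) throws IOException {
        Properties recipe = buildRecipe(
                "0.6-SNAPSHOT",
                "http://dl.dropbox.com/u/13436484/mahout-distribution-0.6-SNAPSHOT.tar.gz");

        // Print the recipe in .properties format; store() escapes ':' in values,
        // which Properties.load() reverses when the file is read back.
        StringWriter out = new StringWriter();
        recipe.store(out, "Sketch of a Whirr recipe for WHIRR-384");
        System.out.print(out);
    }
}
```

The printed text could be saved as a .properties file and adapted; provider credentials and any other settings a real recipe needs are left out on purpose.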