[ https://issues.apache.org/jira/browse/MAHOUT-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017625#comment-13017625 ]

Frank Scholten commented on MAHOUT-663:
---------------------------------------

This is interesting and could be combined with the work on MAHOUT-612. In that 
issue I suggested creating Java beans that run the MapReduce jobs (KMeans, 
Canopy, etc.) and having the Drivers only parse command-line arguments and map 
them into a configuration object. Your method for setting the job jar class 
could be added to the (KMeans|Canopy)MapReduceAlgorithm and overridden by the 
user. Any thoughts?
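A minimal sketch of how the two ideas might fit together. The class name KMeansMapReduceAlgorithm and the overridable getJarClass() hook are assumptions from this comment, not actual Mahout classes, and Hadoop's Job is stubbed here so the sketch compiles standalone:

```java
// Hypothetical bean combining MAHOUT-612 (algorithm beans) with this
// proposal (user-overridable jar class). Names are illustrative only.
public class KMeansMapReduceAlgorithm {

    // Stand-in for org.apache.hadoop.mapreduce.Job, stubbed for illustration.
    public static class Job {
        public Class<?> jarClass;
        public void setJarByClass(Class<?> c) { jarClass = c; }
        public boolean waitForCompletion(boolean verbose) { return true; }
    }

    // Users subclass and override this to name a class that lives in the
    // main section of their own job jar.
    protected Class<?> getJarClass() {
        return KMeansMapReduceAlgorithm.class;
    }

    // Builds the job from the configuration bean's state; does not run it.
    public Job createJob() {
        Job job = new Job();
        job.setJarByClass(getJarClass());
        // ... set mapper, reducer, input/output paths from the bean ...
        return job;
    }

    // Convenience wrapper preserving the current "build and run" behavior.
    public boolean run() throws Exception {
        return createJob().waitForCompletion(true);
    }
}
```

With this shape, the default jar resolution stays as it is today, while a user's job jar can override getJarClass() to point Hadoop at its own main section.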

> Rationalize hadoop job creation with respect to setJarByClass
> -------------------------------------------------------------
>
>                 Key: MAHOUT-663
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-663
>             Project: Mahout
>          Issue Type: Bug
>          Components: build
>    Affects Versions: 0.4
>            Reporter: Benson Margulies
>
> Mahout includes a series of driver classes that create Hadoop jobs via static 
> methods.
> Each one of these calls job.setJarByClass(itself.class).
> Unfortunately, this subverts Hadoop's support for putting additional jars 
> in the lib directory of a job jar, since the class passed in is not a class 
> that lives in the ordinary section of the job jar.
> The effect of this is to force users of Mahout (and Mahout's own example job 
> jar) to unpack the mahout-core jar into the main section, instead of just 
> treating it as a 'lib' dependency.
> It seems to me that each static job creator should be refactored into a 
> public function that returns a Job object (and does NOT call 
> waitForCompletion), with the existing wrapper kept on top of it. Users could 
> then call the new functions and make their own call to setJarByClass.
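The proposed split might look like the sketch below. KMeansDriver's real signatures are not reproduced here, the method name buildJob is an assumption, and Hadoop's Job is stubbed so the example compiles standalone:

```java
// Hypothetical sketch of the refactoring proposed in the issue: a public
// builder that returns the Job without running it, plus the existing
// wrapper kept on top. Illustrative names only.
public class KMeansDriver {

    // Stand-in for org.apache.hadoop.mapreduce.Job, stubbed for illustration.
    public static class Job {
        public Class<?> jarClass;
        public void setJarByClass(Class<?> c) { jarClass = c; }
        public boolean waitForCompletion(boolean verbose) { return true; }
    }

    // New: builds and returns the Job. Does NOT call waitForCompletion and
    // does NOT call setJarByClass, leaving the jar choice to the caller.
    public static Job buildJob(String input, String output) {
        Job job = new Job();
        // ... configure mapper, reducer, paths from the arguments ...
        return job;
    }

    // Existing wrapper, unchanged in behavior: build, set the jar by the
    // driver's own class, and run to completion.
    public static boolean runJob(String input, String output) throws Exception {
        Job job = buildJob(input, output);
        job.setJarByClass(KMeansDriver.class);
        return job.waitForCompletion(true);
    }
}
```

A user's job jar could then call buildJob(...), invoke setJarByClass with a class from its own main section, and ship mahout-core untouched in the jar's lib directory.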

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
