Rationalize hadoop job creation with respect to setJarByClass
-------------------------------------------------------------
Key: MAHOUT-663
URL: https://issues.apache.org/jira/browse/MAHOUT-663
Project: Mahout
Issue Type: Bug
Components: build
Affects Versions: 0.4
Reporter: Benson Margulies
Mahout includes a series of driver classes that create Hadoop jobs via static
methods.
Each of these methods calls job.setJarByClass() with the driver's own class.
Unfortunately, this subverts Hadoop's support for putting additional jars in
the lib directory of a job jar, since the class passed in lives in a jar under
lib rather than in the ordinary (top-level) section of the job jar.
The effect of this is to force users of Mahout (and Mahout's own example job
jar) to unpack the mahout-core jar into the main section, instead of just
treating it as a 'lib' dependency.
It seems to me that each static job creator should be split in two: a public
function that builds and returns a job object (and does NOT call
waitForCompletion), and the existing wrapper, which calls that function and
then runs the job as before. Users could call the new functions and make
their own call to setJarByClass.
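
A rough sketch of the proposed split, in case it helps the discussion. To keep
it self-contained, Job here is a tiny stand-in for
org.apache.hadoop.mapreduce.Job (its getJarClass accessor exists only in this
stub), and ExampleDriver, createJob, runJob and MyApp are hypothetical names,
not existing Mahout classes or methods.

```java
/** Stand-in for org.apache.hadoop.mapreduce.Job; records only what the sketch needs. */
class Job {
    private final String name;
    private Class<?> jarClass;

    Job(String name) { this.name = name; }

    void setJarByClass(Class<?> cls) { this.jarClass = cls; }

    /** Stub-only accessor so the chosen class can be inspected. */
    Class<?> getJarClass() { return jarClass; }

    String getJobName() { return name; }

    /** The stub "succeeds" only if a jar class was chosen before submission. */
    boolean waitForCompletion(boolean verbose) { return jarClass != null; }
}

/** Hypothetical driver illustrating the refactoring; not an actual Mahout class. */
class ExampleDriver {
    /** New public factory: configures the job but neither picks the jar nor runs it. */
    public static Job createJob(String input, String output) {
        Job job = new Job("example: " + input + " -> " + output);
        // Mapper, reducer, formats and paths would be configured here.
        return job;
    }

    /** Existing wrapper: same behavior as before, now built on createJob. */
    public static boolean runJob(String input, String output) {
        Job job = createJob(input, output);
        job.setJarByClass(ExampleDriver.class);
        return job.waitForCompletion(true);
    }
}

/** Hypothetical user class that lives in the main section of the user's job jar. */
class MyApp {
    static Job prepare() {
        Job job = ExampleDriver.createJob("in", "out");
        // The caller names its own class, so the enclosing job jar is selected.
        job.setJarByClass(MyApp.class);
        return job;
    }
}
```

With this shape, existing callers keep using the wrapper unchanged, while
users who ship mahout-core as a lib dependency call the factory and pass a
class from their own job jar.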