Rationalize hadoop job creation with respect to setJarByClass
-------------------------------------------------------------

                 Key: MAHOUT-663
                 URL: https://issues.apache.org/jira/browse/MAHOUT-663
             Project: Mahout
          Issue Type: Bug
          Components: build
    Affects Versions: 0.4
            Reporter: Benson Margulies


Mahout includes a series of driver classes that create Hadoop jobs via static
methods.

Each of these methods calls job.setJarByClass() with the driver's own class.

Unfortunately, this subverts Hadoop's support for putting additional jars in
the lib directory of a job jar: when mahout-core is packaged there, the class
passed in is not a class that lives in the ordinary section of the job jar.

The effect of this is to force users of Mahout (and Mahout's own example job 
jar) to unpack the mahout-core jar into the main section, instead of just 
treating it as a 'lib' dependency.

It seems to me that each static job creator should be split in two: a new
public function that configures and returns the job object (and does NOT call
waitForCompletion), and the existing method, kept as a thin wrapper around it.
Users could then call the new functions and make their own call to
setJarByClass.
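
A rough sketch of the proposed split, to make the shape concrete. This is an
illustration only, not Mahout's actual code: ExampleDriver, createJob, runJob
and MyApp are hypothetical names, and the nested Job class is a minimal
stand-in for org.apache.hadoop.mapreduce.Job so that the sketch compiles
without Hadoop on the classpath. Its getJarClass and isCompleted accessors
exist only in the stand-in.

```java
public class JobFactorySketch {

    /** Minimal stand-in for org.apache.hadoop.mapreduce.Job (assumption, not the real class). */
    static class Job {
        private Class<?> jarClass;
        private boolean completed;

        void setJarByClass(Class<?> cls) {
            this.jarClass = cls;
        }

        boolean waitForCompletion(boolean verbose) {
            completed = true; // the real method submits the job and blocks until it finishes
            return true;
        }

        // Accessors below exist only in this stand-in, to make the behavior observable.
        Class<?> getJarClass() {
            return jarClass;
        }

        boolean isCompleted() {
            return completed;
        }
    }

    /** Hypothetical Mahout-style driver after the proposed refactoring. */
    static class ExampleDriver {

        /** New function: configures and returns the job, without setJarByClass or waitForCompletion. */
        public static Job createJob(String input, String output) {
            Job job = new Job();
            // A real driver would set the mapper, reducer, and input/output paths here.
            return job;
        }

        /** Existing entry point, kept as a thin wrapper with the old behavior. */
        public static boolean runJob(String input, String output) {
            Job job = createJob(input, output);
            job.setJarByClass(ExampleDriver.class);
            return job.waitForCompletion(true);
        }
    }

    /** Hypothetical user class that lives in the ordinary section of the user's job jar. */
    static class MyApp {
    }

    public static void main(String[] args) {
        // The caller picks the class that identifies the job jar, then runs the job itself.
        Job job = ExampleDriver.createJob("in", "out");
        job.setJarByClass(MyApp.class);
        boolean ok = job.waitForCompletion(true);
        System.out.println("jar class: " + job.getJarClass().getSimpleName() + ", ok: " + ok);
    }
}
```

With this shape, code that calls the existing wrapper keeps working unchanged,
while a user who wants mahout-core as a plain 'lib' dependency calls createJob
and passes a class from their own jar.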


