The Job classes in examples are very, very old carryovers from a time
before our Drivers had a CLI. Just follow the current pattern, extending
AbstractJob.
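On the command-line question, a hedged sketch: as far as I know, the mahout script resolves program names through a props file under src/conf (driver.classes.default.props in the versions I have seen), where each line maps a driver class to a short name and a description. Assuming the driver stays in the package shown in [3], the entry might look like the one below. The short name 'streamingkmeans' and the description text are placeholders, not something decided in this thread.

```properties
# src/conf/driver.classes.default.props (file name assumed, check your checkout)
# <fully qualified driver class> = <short name> : <description>
# 'streamingkmeans' and the description are hypothetical placeholders
org.apache.mahout.clustering.streaming.experimental.StreamingKMeansDriver = streamingkmeans : Streaming k-means clustering
```

With an entry like this, the driver should be reachable as 'bin/mahout streamingkmeans', with its options handled by the AbstractJob argument parsing.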
On 2/11/13 3:00 PM, Dan Filimon wrote:
So I'm finally back to work! :)
The first order of business is (I felt) moving the code in the
separate repo [1] to my version of Mahout [2].
The code is organized in the exact same way (and should be simplified).
There are a bunch of code folders under
org.apache.mahout.clustering.streaming (this is where I feel the
streaming k-means classes should go). The folders include 'cluster'
(for the main algorithms), 'experimental' (for the MR classes) and
various 'tools'.
Most of these should be deleted. For now, I want to improve the tests
and benchmarks, and then integrate the code into the existing framework.
I'm confused about the Job though. I have a driver [3], but I've seen
Job classes in the examples folder as well.
Where should it go to be accessible through the mahout command line?
More questions coming soon... :)
Also, if anyone wants to run it, you need mrunit-1.0.0-SNAPSHOT, which
is *not* on Maven Central yet, so I hardcoded the dependency to 'core/lib'.
Feel free to review/criticize/open issues/... here, on JIRA, or on GitHub.
Thanks everyone (especially you, Ted!),
[1] https://github.com/dfilimon/knn
[2] https://github.com/dfilimon/mahout
[3] https://github.com/dfilimon/mahout/blob/skm/core/src/main/java/org/apache/mahout/clustering/streaming/experimental/StreamingKMeansDriver.java