We have a few JIRAs that relate here. MAHOUT-930 is to factor all the
(-cl) classification steps out of the driver classes and into a
separate job, to remove duplicated code; MAHOUT-931 is to add a
pluggable outlier removal capability to that job; and MAHOUT-933 is
aimed at factoring all the iteration mechanics out of each driver class
and into the ClusterIterator, which uses a ClusterClassifier that is
itself an OnlineLearner. This will hopefully allow semi-supervised
classifier applications to be constructed by feeding cluster-derived
models into the classification process. Still kind of fuzzy at this
point, but promising too.
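
To make the MAHOUT-933 idea a bit more concrete, here is a rough,
self-contained sketch of an iterator driving a classifier that is also
an online learner. Everything in it is an assumption for illustration:
the Sketch* names, the double[] vectors and the method signatures are
hypothetical and are not the actual Mahout classes or interfaces. The
update rule is a plain k-means-style hard assignment, chosen only
because it is the simplest case.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an online learner; NOT the Mahout interface.
interface SketchOnlineLearner {
    void train(int actual, double[] instance); // observe one labeled point
    void close();                              // finish the current pass
}

// Hypothetical classifier backed by cluster centers. Classifying picks the
// nearest center; training accumulates points so close() can recompute centers.
final class SketchClusterClassifier implements SketchOnlineLearner {
    private final double[][] centers;
    private final double[][] sums;
    private final int[] counts;

    SketchClusterClassifier(double[][] initialCenters) {
        centers = new double[initialCenters.length][];
        for (int k = 0; k < centers.length; k++) {
            centers[k] = initialCenters[k].clone();
        }
        sums = new double[centers.length][centers[0].length];
        counts = new int[centers.length];
    }

    // Index of the center with the smallest squared Euclidean distance.
    int classify(double[] instance) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centers.length; k++) {
            double dist = 0.0;
            for (int j = 0; j < instance.length; j++) {
                double diff = instance[j] - centers[k][j];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = k;
            }
        }
        return best;
    }

    @Override
    public void train(int actual, double[] instance) {
        for (int j = 0; j < instance.length; j++) {
            sums[actual][j] += instance[j];
        }
        counts[actual]++;
    }

    // Replace each center with the mean of the points it was trained on,
    // then reset the accumulators for the next pass.
    @Override
    public void close() {
        for (int k = 0; k < centers.length; k++) {
            if (counts[k] > 0) {
                for (int j = 0; j < centers[k].length; j++) {
                    centers[k][j] = sums[k][j] / counts[k];
                }
            }
            Arrays.fill(sums[k], 0.0);
            counts[k] = 0;
        }
    }

    double[] center(int k) {
        return centers[k].clone();
    }
}

// Hypothetical iterator: the only place that knows about passes over the data.
final class SketchClusterIterator {
    static void iterate(List<double[]> data, SketchClusterClassifier classifier,
                        int numIterations) {
        for (int i = 0; i < numIterations; i++) {
            for (double[] point : data) {
                classifier.train(classifier.classify(point), point);
            }
            classifier.close();
        }
    }
}

final class ClusterIterationSketch {
    public static void main(String[] args) {
        List<double[]> data = Arrays.asList(
            new double[] {0, 0}, new double[] {0, 2},
            new double[] {10, 10}, new double[] {10, 12});
        SketchClusterClassifier classifier =
            new SketchClusterClassifier(new double[][] {{1, 1}, {9, 9}});
        SketchClusterIterator.iterate(data, classifier, 1);
        System.out.println(Arrays.toString(classifier.center(0)));
        System.out.println(Arrays.toString(classifier.center(1)));
    }
}
```

The point of the split is that the driver-specific part shrinks to
supplying the initial model, while the loop lives in one place and the
trained classifier can be handed straight to a classification step.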
On 2/11/12 2:29 PM, Frank Scholten wrote:
...
What kind of clustering refactoring do you mean here? I did some work on
creating bean configurations in the past (MAHOUT-612). I
underestimated the amount of work required to do the entire
refactoring. If this can be contributed and committed on a per-job
basis I would like to help out.
...