Since we last talked about this, I've made a bunch of JIRA issues. The main one is [1] and it links to all the other ones.
The current plan is to have at least 4 patches:

1. minor changes to Centroid and WeightedVector
2. the searcher code
3. the ball k-means and streaming k-means code (non map-reduce)
4. the mapreduce job, mapper and reducer, and command line tools

Regarding (4), I depend on mrunit [2], the 1.0 snapshot (which is currently
unreleased). It is used for testing the MapReduce driver. mrunit itself
depends on mockito (which needs to be an explicit dependency). I've seen
that we use easymock. Is this a problem?

[1] https://issues.apache.org/jira/browse/MAHOUT-1154
[2] http://mrunit.apache.org/

On Mon, Mar 4, 2013 at 2:18 PM, Ted Dunning <[email protected]> wrote:

> I think it might be worth committing in steps.
>
> The standalone clustering and utility code has almost no impact on existing
> Mahout code (what small impacts there were on Vector and friends were
> committed some time ago). These can be committed sooner.
>
> Integration with the map-reduce and command line stuff might take a bit
> longer to review. This can be reviewed and committed separately. I would
> particularly like Shannon and Jeff's opinions about how the new clustering
> fits into the existing framework. There is talk of a second edition of
> Mahout in Action and this new clustering would be a major new capability to
> be covered in that so fitting in well is important.
>
> On Mon, Mar 4, 2013 at 7:02 AM, Grant Ingersoll <[email protected]> wrote:
>
>> > Where do we go from here? Do I open JIRA issues for the changes? Do I
>> > first merge changes to the existing Mahout classes?
>>
>> I believe there is a JIRA already open for it (if not, open one). A patch
>> that can be applied to trunk/master with all tests passing would be best.
>> Any patch that more or less shows what is done is also welcome, although
>> it is a bit harder to consume.
