[ 
https://issues.apache.org/jira/browse/MAHOUT-3?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Eastman updated MAHOUT-3:
------------------------------

    Attachment: MAHOUT-3d.diff

This patch adds "payloads" to the previous patch: the ClusterMapper input 
Writable is passed intact through to the Canopy emit method, so any additional 
information beyond the point definition propagates through to the output. This 
is also a bit more efficient, since the point does not need to be reformatted 
upon collection. I've also added two unit tests covering the payload handling. 
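
For readers without the patch at hand, the payload idea can be sketched in 
plain Java, without Hadoop. This is only an illustration under stated 
assumptions, not the code in MAHOUT-3d.diff: the class PayloadSketch, its 
methods, the "[x, y] payload" record layout, the Manhattan distance and the 
single threshold t1 are all hypothetical. The point is parsed only to measure 
distance; the record that gets emitted is the original input, untouched.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of payload pass-through; not the classes from the patch.
class PayloadSketch {

    // Assumed record layout: "[x, y, ...]" followed by optional payload text.
    static double[] parsePoint(String record) {
        int open = record.indexOf('[');
        int close = record.indexOf(']');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("no point in record: " + record);
        }
        String body = record.substring(open + 1, close).trim();
        if (body.isEmpty()) {
            return new double[0];
        }
        String[] parts = body.split(",");
        double[] point = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            point[i] = Double.parseDouble(parts[i].trim());
        }
        return point;
    }

    // Manhattan distance, chosen here only to keep the sketch short.
    static double manhattan(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Emits (canopyId, original record) when the point lies within t1 of the
    // center. The record is forwarded as received, so the payload survives
    // and the point is never reformatted.
    static void map(String record, String canopyId, double[] center,
                    double t1, List<String[]> out) {
        double[] point = parsePoint(record);
        if (manhattan(point, center) < t1) {
            out.add(new String[] {canopyId, record});
        }
    }

    public static void main(String[] args) {
        List<String[]> out = new ArrayList<>();
        map("[1.0, 2.0] user=42", "C0", new double[] {1.0, 1.0}, 3.0, out);
        for (String[] pair : out) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

Forwarding the input record rather than a re-serialized point is what lets 
extra fields ride along, and it also skips one formatting step per emit.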

I also added a space after the comma in the point formatting routines to make 
the output more human-readable.
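
As a rough illustration of the formatting change, a point formatter with a 
space after each comma might look like the following. The class 
PointFormatSketch, the method formatPoint and the bracketed layout are 
assumptions made for this example; the routines in the patch may differ.

```java
import java.util.StringJoiner;

// Hypothetical formatter; the bracketed layout is assumed for illustration.
class PointFormatSketch {

    // Joins the coordinates with ", " (comma plus space) inside brackets.
    static String formatPoint(double[] point) {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        for (double value : point) {
            joiner.add(Double.toString(value));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(formatPoint(new double[] {1.0, 2.5, 3.0}));
    }
}
```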

I've run this in a larger M/R job producing 50+ clusters from thousands of 
points having 25+ dimensions and it seems to be ready for broader use.

> Build initial canopy clustering prototype
> -----------------------------------------
>
>                 Key: MAHOUT-3
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-3
>             Project: Mahout
>          Issue Type: New Feature
>            Reporter: Jeff Eastman
>         Attachments: MAHOUT-3.diff, MAHOUT-3a.diff, MAHOUT-3b.diff, 
> MAHOUT-3c.diff, MAHOUT-3d.diff
>
>
> I'd like to reserve some namespace, specifically 
> org.apache.mahout.clustering.canopy to use for an initial prototype of canopy 
> clustering. I'm going to start with a little unit test to get the basic 
> algorithm sorted out, then M/R it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
