[
https://issues.apache.org/jira/browse/MAHOUT-3?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeff Eastman updated MAHOUT-3:
------------------------------
Attachment: MAHOUT-3d.diff
This patch adds "payloads" to the previous patch, by passing the ClusterMapper
input Writable intact through to the Canopy emit method so that any additional
information beyond the point definition propagates through to the output. It is
actually a bit more efficient to do it this way, since the point does not need
to be reformatted upon collection. I've also added two unit tests for the
payload handling.
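
The pass-through idea can be sketched roughly as follows. This is only an illustration, not the code in the attached patch: the class PayloadCanopySketch, its methods, the "[x, y] payload" record layout, the Manhattan distance check, and the use of plain Strings in place of Hadoop Writables are all assumptions made to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a canopy; not the class from the patch.
class PayloadCanopySketch {
    private final int id;
    private final double[] center;

    PayloadCanopySketch(int id, double[] center) {
        this.id = id;
        this.center = center;
    }

    // Extracts the leading bracketed point from a record such as
    // "[1.5, 1.0] doc-42" (assumed layout); anything after ']' is payload.
    static double[] parsePoint(String record) {
        int start = record.indexOf('[') + 1;
        int end = record.indexOf(']');
        String[] parts = record.substring(start, end).split(",");
        double[] point = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            point[i] = Double.parseDouble(parts[i].trim());
        }
        return point;
    }

    // Manhattan distance test against an assumed threshold t1.
    boolean covers(double[] point, double t1) {
        double distance = 0.0;
        for (int i = 0; i < center.length; i++) {
            distance += Math.abs(center[i] - point[i]);
        }
        return distance < t1;
    }

    // Emits the original record untouched, so the payload survives and the
    // point never has to be re-formatted on collection.
    void emit(String record, List<String[]> output) {
        output.add(new String[] {"C" + id, record});
    }

    public static void main(String[] args) {
        List<String[]> output = new ArrayList<>();
        PayloadCanopySketch canopy =
            new PayloadCanopySketch(0, new double[] {1.0, 1.0});
        String record = "[1.5, 1.0] doc-42";
        if (canopy.covers(parsePoint(record), 3.0)) {
            canopy.emit(record, output);
        }
        System.out.println(output.get(0)[0] + "\t" + output.get(0)[1]);
    }
}
```

The point is parsed only to decide membership; what gets collected is the input record itself, which is why extra fields ride along for free.
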
I also added a space after the comma in the point formatting routines to make
the output more human-readable.
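
For illustration, a formatter with that separator might look like the sketch below. The class and method names are hypothetical and the bracketed layout is an assumption; only the space after each comma comes from the change described above.

```java
// Hypothetical helper; the actual formatting routines in the patch may differ.
class PointFormatSketch {
    // Joins coordinates with ", " (comma plus space) inside square brackets.
    static String formatPoint(double[] point) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < point.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(point[i]);
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // prints "[1.0, 2.5, 3.0]"
        System.out.println(formatPoint(new double[] {1.0, 2.5, 3.0}));
    }
}
```
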
I've run this in a larger M/R job producing 50+ clusters from thousands of
points having 25+ dimensions and it seems to be ready for broader use.
> Build initial canopy clustering prototype
> -----------------------------------------
>
> Key: MAHOUT-3
> URL: https://issues.apache.org/jira/browse/MAHOUT-3
> Project: Mahout
> Issue Type: New Feature
> Reporter: Jeff Eastman
> Attachments: MAHOUT-3.diff, MAHOUT-3a.diff, MAHOUT-3b.diff,
> MAHOUT-3c.diff, MAHOUT-3d.diff
>
>
> I'd like to reserve some namespace, specifically
> org.apache.mahout.clustering.canopy to use for an initial prototype of canopy
> clustering. I'm going to start with a little unit test to get the basic
> algorithm sorted out, then M/R it.