[https://issues.apache.org/jira/browse/MAHOUT-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918593#action_12918593]
Jeff Eastman commented on MAHOUT-518:
-------------------------------------
Sort of. Both density estimators use a set of representative points taken
from the clustered points output after clustering. But using a threshold to
force large distances to produce zero affinities, instead of just small
affinities, would make the A matrix sparse again and allow subsequent
processing to scale better. Even with such a threshold, however, the need to
compare each point with every other would make it tricky to do at scale. I can
imagine that some sort of mapper-side join in a custom InputFormat would be
required, as in DistributedRowMatrix (DRM).
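A minimal sketch of the thresholding idea, written in Python rather than
Mahout's Java for brevity. It assumes a Gaussian kernel exp(-d^2 / (2 sigma^2))
on Euclidean distance, a hypothetical cutoff parameter max_distance, and an
(i, j, value) affinity-tuple layout; none of these names come from Mahout.
Pairs beyond the cutoff produce no tuple at all, so A stays sparse, but the
double loop still performs n(n-1)/2 distance computations, which is the
scaling problem described in the comment.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]
AffinityTuple = Tuple[int, int, float]  # assumed (row, column, affinity) layout


def sparse_affinities(
    points: List[Point], sigma: float, max_distance: float
) -> Tuple[List[AffinityTuple], int]:
    """Return thresholded Gaussian affinities as (i, j, a_ij) tuples, plus
    the number of pairwise distance computations performed.

    Hypothetical helper for illustration only; not part of Mahout.
    """
    tuples: List[AffinityTuple] = []
    comparisons = 0
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):  # diagonal A_ii is left out (treated as 0)
            comparisons += 1
            d = math.dist(points[i], points[j])
            if d > max_distance:
                continue  # exact zero: nothing is stored, keeping A sparse
            a = math.exp(-(d * d) / (2.0 * sigma * sigma))
            # A is symmetric, so emit both (i, j) and (j, i)
            tuples.append((i, j, a))
            tuples.append((j, i, a))
    return tuples, comparisons


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
    affinity, count = sparse_affinities(pts, sigma=1.0, max_distance=1.0)
    for i, j, a in affinity:
        print(f"{i},{j},{a:.4f}")  # one affinity tuple per line
    print(count, "comparisons")
```

With these four points the cutoff keeps only 4 of the 12 possible off-diagonal
entries, yet all 6 pairs still have to be compared. Distributing exactly that
all-pairs step is what would need something like the mapper-side join.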
> Implement Affinity Preprocessing for Eigencuts and Spectral KMeans
> ------------------------------------------------------------------
>
> Key: MAHOUT-518
> URL: https://issues.apache.org/jira/browse/MAHOUT-518
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Affects Versions: 0.4
> Reporter: Jeff Eastman
> Fix For: 0.5
>
>
> The input format for these clustering algorithms is currently affinity
> tuples. It would be very nice to have this process automated. Marking for 0.5
> as this will require some investigation.
--
This message is automatically generated by JIRA.