[
https://issues.apache.org/jira/browse/MAHOUT-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918273#action_12918273
]
Jeff Eastman commented on MAHOUT-518:
-------------------------------------
Mean Shift is also used in image processing, but it has proven to work pretty
well in other vector clustering applications. I wonder whether spectral
clustering can too. I see that affinity preprocessing of n arbitrary vectors
might produce a dense n x n matrix if we used a DistanceMeasure as the affinity
measure, and this would clearly not scale. There are also scalability problems
with needing to compare each point with every other, as in Mean Shift. But the
addition of a distance threshold, similar to T1 & T2 for Canopy, could allow a
distance-measure-based preprocessor to produce affinity matrices that are both
square and sparse. It might just be GIGO (garbage in, garbage out), but it
would allow the spectral clustering jobs to consume arbitrary vectors, like
most of the other clustering jobs.
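
To make the thresholding idea concrete, here is a rough sketch in plain Python
(not Mahout code). Everything in it is an assumption for illustration: the
function names, the single `threshold` parameter (Canopy uses two, T1 and T2),
the Euclidean distance, and the Gaussian kernel used to turn a distance into an
affinity. Note that it still compares every pair of points, so it only
addresses the density of the output matrix, not the cost of the comparisons.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sparse_affinities(vectors, threshold, distance=euclidean, sigma=1.0):
    """Return {(i, j): affinity} for pairs whose distance is <= threshold.

    Pairs farther apart than the threshold are simply omitted, which is
    what keeps the (conceptually n x n) affinity matrix sparse.
    The Gaussian kernel and sigma are illustrative assumptions.
    """
    affinities = {}
    n = len(vectors)
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(vectors[i], vectors[j])
            if d <= threshold:
                a = math.exp(-(d * d) / (2.0 * sigma * sigma))
                # Store both directions so the matrix stays symmetric.
                affinities[(i, j)] = a
                affinities[(j, i)] = a
    return affinities


# Two well-separated pairs of points; only within-pair entries survive.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
aff = sparse_affinities(points, threshold=2.0)
for (i, j), a in sorted(aff.items()):
    print(i, j, round(a, 4))
```

With four points this keeps 4 of the 12 possible off-diagonal entries. Whether
the resulting affinities are meaningful for spectral clustering depends on the
distance measure and the threshold chosen, which is the GIGO concern above.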
> Implement Affinity Preprocessing for Eigencuts and Spectral KMeans
> ------------------------------------------------------------------
>
> Key: MAHOUT-518
> URL: https://issues.apache.org/jira/browse/MAHOUT-518
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Affects Versions: 0.4
> Reporter: Jeff Eastman
> Fix For: 0.5
>
>
> The input format for these clustering algorithms is currently affinity
> tuples. It would be very nice to automate the generation of these tuples.
> Marking for 0.5, as this will require some investigation.