[
https://issues.apache.org/jira/browse/MAHOUT-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918578#action_12918578
]
Shannon Quinn commented on MAHOUT-518:
--------------------------------------
Some of the examples I've seen of generalized spectral clustering use points in
two-dimensional space and generate affinities between them. In theory there's
no issue with this; the only problem is that you can easily imagine situations
where KNN-based affinities are non-symmetric (i.e. point B is among the KNN of
point A, but A is not among the KNN of B). So yes, the only way to guarantee
symmetry is to compute the affinity of each point with every other point, and
that clearly isn't scalable, since the work grows quadratically with the number
of points. A distance threshold would work much better, something more along
the lines of density estimation?
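
To make the symmetry point concrete, here is a minimal sketch in plain Python.
It is not Mahout code; the function names (knn_graph, threshold_graph,
is_symmetric) and the three sample points are assumptions made up for
illustration. It builds a directed KNN graph and a distance-threshold graph
over the same points and checks each for symmetry.

```python
import math

def knn_graph(points, k):
    """Directed KNN graph: graph[i] holds the indices of the k points nearest to i."""
    graph = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        graph[i] = set(others[:k])
    return graph

def threshold_graph(points, eps):
    """Connect i and j whenever their distance is at most eps (hypothetical cutoff)."""
    graph = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= eps:
                # Distance is symmetric, so the edge is added in both directions.
                graph[i].add(j)
                graph[j].add(i)
    return graph

def is_symmetric(graph):
    """True if every edge i -> j has a matching edge j -> i."""
    return all(i in graph[j] for i in graph for j in graph[i])

# Illustrative points only: the third point is far from the other two.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]

knn = knn_graph(points, k=1)
thr = threshold_graph(points, eps=2.0)

print("KNN symmetric:      ", is_symmetric(knn))
print("Threshold symmetric:", is_symmetric(thr))
```

With k=1, the point at (3, 0) picks (1, 0) as its nearest neighbour, but (1, 0)
picks (0, 0), so the KNN graph is not symmetric. The threshold graph is
symmetric by construction. Note that this sketch still loops over all pairs
for simplicity; a scalable version would need some spatial partitioning or
indexing so that only nearby points are compared.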
> Implement Affinity Preprocessing for Eigencuts and Spectral KMeans
> ------------------------------------------------------------------
>
> Key: MAHOUT-518
> URL: https://issues.apache.org/jira/browse/MAHOUT-518
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Affects Versions: 0.4
> Reporter: Jeff Eastman
> Fix For: 0.5
>
>
> The input format for these clustering algorithms is currently affinity
> tuples. It would be very nice to have this process automated. Marking for 0.5
> as this will require some investigation.