[ 
https://issues.apache.org/jira/browse/MAHOUT-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918273#action_12918273
 ] 

Jeff Eastman commented on MAHOUT-518:
-------------------------------------

Mean Shift is also used in image processing, but it has proven to work pretty
well in other vector clustering applications. I wonder whether spectral
clustering could too? I see that affinity preprocessing of n arbitrary vectors
might produce a dense n x n matrix if we used a DistanceMeasure as the affinity
measure, and this would clearly not scale. There is also the scalability
problem of needing to compare each point with every other, as in Mean Shift.
But adding a distance threshold, similar to T1 and T2 in Canopy, could allow a
distance-measure-based preprocessor to produce affinity matrices that are both
square and sparse. It might just be GIGO, but it would allow the spectral
clustering jobs to consume arbitrary vectors like most of the other clustering
jobs.
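
A minimal sketch of the thresholded preprocessing idea, in plain Python rather
than against the Mahout API. The function names, the Euclidean distance, the
single threshold, and the Gaussian kernel exp(-d^2) used to turn a distance
into an affinity are all assumptions made for illustration; none of them come
from the issue. Note that the sketch still compares every pair of points, so
it only addresses the density of the output, not the all-pairs cost.

```python
import math


def euclidean(a, b):
    """Euclidean distance; stands in for a pluggable distance measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sparse_affinities(vectors, threshold, distance=euclidean):
    """Return {(i, j): affinity} for pairs no farther apart than threshold.

    Pairs beyond the threshold are omitted, so the implied n x n matrix
    stays square but sparse. The result is symmetric, with no diagonal.
    """
    affinities = {}
    n = len(vectors)
    for i in range(n):
        for j in range(i + 1, n):
            d = distance(vectors[i], vectors[j])
            if d <= threshold:
                # Assumed kernel: closer points get affinity nearer to 1.
                a = math.exp(-d * d)
                affinities[(i, j)] = a
                affinities[(j, i)] = a
    return affinities


# Two well-separated pairs of points; only within-pair tuples survive.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
tuples = sparse_affinities(points, threshold=2.0)
for (i, j), a in sorted(tuples.items()):
    print(i, j, a)
```

Each surviving (i, j, affinity) entry corresponds to one affinity tuple of the
kind the spectral jobs currently take as input. Choosing the threshold is the
hard part: too small and the graph falls apart into isolated points, too large
and the matrix is dense again.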

> Implement Affinity Preprocessing for Eigencuts and Spectral KMeans
> ------------------------------------------------------------------
>
>                 Key: MAHOUT-518
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-518
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.4
>            Reporter: Jeff Eastman
>             Fix For: 0.5
>
>
> The input format for these clustering algorithms is currently affinity 
> tuples. It would be very nice to have this process automated. Marking for 0.5 
> as this will require some investigation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
