[ 
https://issues.apache.org/jira/browse/MAHOUT-518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918593#action_12918593
 ] 

Jeff Eastman commented on MAHOUT-518:
-------------------------------------

Sort of. The density estimators both use a set of representative points taken 
from the clustered points output after clustering. But using a threshold to 
force large distance measures to produce zero affinities - instead of just 
small affinities - would make the A matrix sparse again and allow subsequent 
processing to scale better. Even with such a threshold, however, the need to 
compare each point with every other would make it tricky to do at scale. I can 
imagine some sort of mapper-side join in a custom InputFormat would be 
required, as in DRM.
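
To make the thresholding idea concrete, here is a minimal in-memory sketch in 
Python. It is not Mahout code. The Gaussian kernel, the Euclidean distance, and 
the names sparse_affinities, sigma and max_distance are all assumptions chosen 
for illustration. Pairs farther apart than max_distance are simply not stored, 
which is what keeps the affinity matrix sparse.

```python
import math


def sparse_affinities(points, sigma=1.0, max_distance=2.0):
    """Return {(i, j): affinity}, storing only pairs within max_distance.

    Hypothetical helper: assumes Euclidean distance and a Gaussian kernel.
    """
    affinities = {}
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if d > max_distance:
                # Treated as exactly zero affinity, so nothing is stored.
                continue
            a = math.exp(-(d * d) / (2.0 * sigma * sigma))
            # The affinity matrix is symmetric, so store both entries.
            affinities[(i, j)] = a
            affinities[(j, i)] = a
    return affinities


# Two well-separated pairs of points: only within-pair affinities survive.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
A = sparse_affinities(points)
```

Note that the nested loop still visits every pair of points, so the threshold 
reduces what is stored and processed afterwards but not the quadratic cost of 
the comparison itself. That is the part that would need something like the 
join mentioned above.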

> Implement Affinity Preprocessing for Eigencuts and Spectral KMeans
> ------------------------------------------------------------------
>
>                 Key: MAHOUT-518
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-518
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.4
>            Reporter: Jeff Eastman
>             Fix For: 0.5
>
>
> The input format for these clustering algorithms is currently affinity 
> tuples. It would be very nice to have this process automated. Marking for 0.5 
> as this will require some investigation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
