[ https://issues.apache.org/jira/browse/MAHOUT-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Schelter resolved MAHOUT-1506.
----------------------------------------
Resolution: Won't Fix
Resolving as Won't Fix, since we agreed not to add new MapReduce (MR) code. Shout if you disagree.
> Creation of affinity matrix for spectral clustering
> ---------------------------------------------------
>
> Key: MAHOUT-1506
> URL: https://issues.apache.org/jira/browse/MAHOUT-1506
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Affects Versions: 1.0
> Reporter: Shannon Quinn
> Assignee: Shannon Quinn
> Fix For: 1.0
>
>
> I wanted to get this discussion going, since I think this is a critical
> blocker for any kind of documentation update on spectral clustering (I can't
> update the documentation until the algorithm is useful, and it won't be
> useful until there's a built-in method for converting raw data to an affinity
> matrix).
> Namely, I'm wondering what kind of "raw" data this algorithm should expect
> (anything that k-means expects, basically?), and what data structures are
> associated with it. I've created a proof-of-concept for how pairwise
> affinity generation could work:
> https://github.com/magsol/Hadoop-Affinity
> It's a two-step job, but if the input data format provides 1) the total
> number of data points and 2) each data point's index in the overall set,
> then the first job can be scrapped entirely and affinity generation will
> consist of a single MR job.
> (discussions on Spark / H2O pending, of course)
> Mainly, this is an engineering problem at this point. Let me know your
> thoughts and I'll get this done (I'm out of town for the next 10 days for my
> wedding/honeymoon and will get to this on my return).
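The issue leaves the similarity function open. Below is a minimal, single-machine sketch of turning raw vectors (the kind k-means consumes) into a dense affinity matrix. It assumes a Gaussian (RBF) kernel with a user-chosen bandwidth `sigma` and a zero diagonal, both common choices in spectral clustering but not fixed by this issue. The function name `affinity_matrix` is hypothetical and is not part of Mahout's API.

```python
import math
from typing import List, Sequence


def affinity_matrix(points: Sequence[Sequence[float]], sigma: float = 1.0) -> List[List[float]]:
    """Dense affinity matrix, ASSUMING a Gaussian (RBF) kernel:

    A[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)) for i != j, and A[i][i] = 0.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    n = len(points)
    a = [[0.0] * n for _ in range(n)]  # diagonal stays 0
    for i in range(n):
        for j in range(i + 1, n):
            # Squared Euclidean distance between points i and j.
            sq = sum((p - q) ** 2 for p, q in zip(points[i], points[j]))
            w = math.exp(-sq / (2.0 * sigma * sigma))
            a[i][j] = a[j][i] = w  # the matrix is symmetric, so fill both halves
    return a


# Usage: two nearby points and one far-away point.
A = affinity_matrix([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]], sigma=1.0)
```

Nearby points get affinities close to 1, and distant points decay toward 0. The bandwidth `sigma` controls how quickly that happens and usually has to be tuned per data set.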
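The trade-off between a two-step job and a single job can be illustrated with a local simulation of MapReduce in plain Python (no Hadoop API is used). This is a hypothetical sketch, not the code in the Hadoop-Affinity repository:

- `index_job` stands in for the first job, which assigns each point an index and counts the points.
- `affinity_map` and `affinity_reduce` form the second job, which needs only the total count `n` and each record's index.

The Gaussian kernel and all function names are assumptions.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Vector = Sequence[float]
Record = Tuple[int, Vector]  # (global index, raw vector)


def gaussian(x: Vector, y: Vector, sigma: float) -> float:
    # ASSUMED similarity function; the issue does not prescribe one.
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma * sigma))


def index_job(raw: Iterable[Vector]) -> Tuple[int, List[Record]]:
    # Hypothetical job 1: only needed when input records carry neither
    # their index nor the total number of points.
    indexed = list(enumerate(raw))
    return len(indexed), indexed


def affinity_map(n: int, record: Record) -> Iterator[Tuple[int, Record]]:
    # Hypothetical job 2 mapper: send each indexed point to every row key,
    # so the reducer for row r sees all points. Knowing n is what makes this possible.
    for row in range(n):
        yield row, record


def affinity_reduce(row: int, values: Iterable[Record], sigma: float) -> Tuple[int, Dict[int, float]]:
    # Hypothetical job 2 reducer: compute row `row` of the affinity matrix,
    # skipping the diagonal entry.
    points = dict(values)
    x = points[row]
    return row, {j: gaussian(x, y, sigma) for j, y in points.items() if j != row}


def run_affinity_job(n: int, indexed: Iterable[Record], sigma: float = 1.0) -> Dict[int, Dict[int, float]]:
    # Local stand-in for the shuffle: group mapper output by row key.
    groups: Dict[int, List[Record]] = defaultdict(list)
    for record in indexed:
        for key, value in affinity_map(n, record):
            groups[key].append(value)
    return dict(affinity_reduce(row, vals, sigma) for row, vals in groups.items())


# Usage: raw input without indices, so job 1 runs first.
n, indexed = index_job([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
rows = run_affinity_job(n, indexed, sigma=1.0)
```

If the input format already supplies `n` and each record's index, `index_job` can be skipped and `run_affinity_job` called directly, which is the single-job case. Replicating every point to every row keeps the sketch simple, but it makes the shuffle volume grow quadratically with the number of points. A real implementation would likely partition rows into blocks instead.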
--
This message was sent by Atlassian JIRA
(v6.2#6252)