[ 
https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shannon Quinn updated MAHOUT-524:
---------------------------------

    Attachment: raw.txt
                aff.txt

No, there's definitely something wrong here. I've attached some synthetic data 
I generated - concentric circles of 2D points, which spectral clustering is 
particularly good at correctly grouping. The "raw" file contains 450 raw data 
points in 3 separate circles (feel free to plot them to take a look). The 
affinities are generated by applying a cutoff on Euclidean distance, say 2.0: 
any pair of points less than 2 apart is given an affinity (here using the 
Gaussian kernel, which gives a nice [0, 1] affinity), and every other pair is 
set to 0 (which enforces sparsity in the affinity matrix). Plus, I constructed 
the data specifically so that points on different circles are at least 2 
apart.
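
A minimal sketch of the affinity construction described above, in Python. The 
kernel width sigma, the function name, and the dict-based output are 
assumptions made for illustration; the script that produced aff.txt may differ 
in all three.

```python
import math

def gaussian_affinities(points, cutoff=2.0, sigma=1.0):
    """Sparse affinities {(i, j): value} for point pairs closer than cutoff.

    sigma is a hypothetical kernel width; the original message does not
    state which value was used.
    """
    aff = {}
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])  # Euclidean distance
            if d < cutoff:
                # Gaussian kernel: 1 at d = 0, decaying toward 0 as d grows
                w = math.exp(-d * d / (2.0 * sigma * sigma))
                aff[(i, j)] = w
                aff[(j, i)] = w  # keep the matrix symmetric
    # Pairs at or beyond the cutoff are simply absent, i.e. affinity 0
    return aff
```

Pairs missing from the returned dict correspond to the zero entries that keep 
the affinity matrix sparse.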

Unfortunately, if you run SpectralKMeans on the aff.txt file, it identifies the 
tightly-packed cluster in the middle but doesn't do a particularly good job on 
the other two (points 0-149 should share one ID, points 150-299 another, and 
points 300-449 a third). Obviously there is still something amiss; a good place 
to start is to take a look at the eigenvectors generated by the LanczosSolver. 
If everything is behaving as it should, these should show piecewise constancy: 
that is, each block of 150 consecutive elements in the 450-element vectors 
should have about the same value. If this is not the case, we're either 
generating affinities incorrectly, or there's a problem with the algorithm 
itself.
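
The piecewise-constancy check could be sketched roughly as follows, in Python. 
This assumes an eigenvector has already been loaded as a plain list of 450 
floats (reading the LanczosSolver output is not shown), and the tolerance tol 
and both function names are hypothetical choices, not part of Mahout.

```python
from statistics import mean, pstdev

def block_summary(vec, block_size=150):
    """Return (mean, standard deviation) for each consecutive block."""
    blocks = [vec[i:i + block_size] for i in range(0, len(vec), block_size)]
    return [(mean(b), pstdev(b)) for b in blocks]

def looks_piecewise_constant(vec, block_size=150, tol=0.1):
    """True if every block's spread is small relative to the whole vector.

    tol is an arbitrary illustrative threshold, not a value from the issue.
    """
    total_spread = pstdev(vec)
    if total_spread == 0:
        return True  # a constant vector is trivially piecewise constant
    return all(sd <= tol * total_spread
               for _, sd in block_summary(vec, block_size))
```

A vector that passes has three nearly flat segments; one that fails has blocks 
whose internal variation is comparable to the variation across the whole 
vector.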

Also, I noticed when attempting to run the program that it often doesn't show 
the entire list of available and required arguments. I couldn't reliably 
determine a cause; often it would show just 1 of the required arguments, but if 
I supplied some of the required arguments and left out others, it would display 
all of them. I'm assuming this is a bug; any idea where I could find it?

> DisplaySpectralKMeans example fails
> -----------------------------------
>
>                 Key: MAHOUT-524
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-524
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.4
>            Reporter: Jeff Eastman
>             Fix For: 0.5
>
>         Attachments: aff.txt, raw.txt
>
>
> I've committed a new display example that attempts to push the standard 
> mixture of models data set through spectral k-means. After some tweaking of 
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral 
> k-means to completion. The display example is expecting 2-d clustered points 
> and the example is producing 5-d points. Additional I/O work is needed before 
> this will play with the rest of the clustering algorithms. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
