[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shannon Quinn updated MAHOUT-524:
---------------------------------
Attachment: raw.txt, aff.txt
No, there's definitely something wrong here. I've attached some synthetic data
I generated: concentric circles of 2D points, which spectral clustering is
particularly good at grouping correctly. The "raw" file (raw.txt) contains 450
raw data points in 3 separate circles (feel free to plot them to take a look).
The affinities in aff.txt are generated by applying a cutoff on Euclidean
distance, say 2.0: any pair of points less than 2 apart is given an affinity
(here from the Gaussian kernel, which gives a nice [0, 1] value), and every
other pair is set to 0, which enforces sparsity in the affinity matrix. I also
constructed the data so that points on different circles are at least 2 apart,
so every cross-circle affinity is 0.
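For anyone who wants to reproduce similar input, here is a minimal sketch of
the affinity construction described above. It is not the script that produced
the attachments: the ring radii (1, 4, 7), the kernel width sigma, and the
function names are all assumptions chosen for illustration, and the on-disk
layout of aff.txt is not reproduced here.

```python
import math

def ring(radius, n):
    """Return n points evenly spaced on a circle of the given radius (origin-centred)."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def sparse_affinities(points, cutoff=2.0, sigma=1.0):
    """Gaussian affinity for pairs closer than `cutoff`.

    Pairs at or beyond the cutoff are simply left out of the dict, i.e. they
    are implicitly 0, which keeps the affinity matrix sparse.
    sigma is an assumed kernel width, not a value taken from the attachments.
    """
    entries = {}
    for i, (xi, yi) in enumerate(points):
        for j in range(i + 1, len(points)):
            xj, yj = points[j]
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            if d2 < cutoff ** 2:
                a = math.exp(-d2 / (2 * sigma ** 2))
                entries[(i, j)] = a  # store both orientations so the
                entries[(j, i)] = a  # matrix stays symmetric
    return entries

# Three concentric rings of 150 points each; the radii are hypothetical and
# differ by 3, so no two points on different rings fall inside the cutoff.
points = ring(1.0, 150) + ring(4.0, 150) + ring(7.0, 150)
aff = sparse_affinities(points)

# Points 0-149, 150-299 and 300-449 belong to rings 0, 1 and 2 respectively.
cross_ring = [(i, j) for (i, j) in aff if i // 150 != j // 150]
```

With these radii, cross_ring is empty, so the affinity graph splits into three
disconnected components, one per circle.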
Unfortunately, if you run SpectralKMeans on the aff.txt file, it identifies the
tightly-packed cluster in the middle but doesn't do a particularly good job on
the other two clusters (points 0-149 should share one ID, as should points
150-299 and points 300-449). Obviously there is still something amiss; a good
place to start is to take a look at the eigenvectors generated by the
LanczosSolver. If everything is behaving as it should, these should be
piecewise constant: that is, each block of 150 consecutive elements in the
450-element vectors should have about the same value. If this is not the case,
either we're generating affinities incorrectly or there's a problem with the
algorithm itself.
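The piecewise-constancy check can be sketched roughly as follows. This assumes
an eigenvector has already been read into a plain Python list (getting it out
of the LanczosSolver output is not shown), and the helper names and the 5%
tolerance are my own assumptions for what "about the same value" means.

```python
import statistics

def block_stats(vector, block=150):
    """Return (mean, population std dev) for each consecutive block of `vector`."""
    stats = []
    for start in range(0, len(vector), block):
        chunk = vector[start:start + block]
        stats.append((statistics.mean(chunk), statistics.pstdev(chunk)))
    return stats

def is_piecewise_constant(vector, block=150, tol=0.05):
    """True if every block's spread is small relative to the vector's largest entry.

    tol is an assumed relative tolerance, not a value from the algorithm.
    """
    scale = max(abs(v) for v in vector) or 1.0  # guard against an all-zero vector
    return all(sd <= tol * scale for _, sd in block_stats(vector, block))

# Idealised eigenvector: one constant value per group of 150 points.
ideal = [0.1] * 150 + [0.0] * 150 + [-0.1] * 150
# A vector that oscillates inside every block, as a failing case.
oscillating = [0.1 if k % 2 == 0 else -0.1 for k in range(450)]

ideal_ok = is_piecewise_constant(ideal)
oscillating_ok = is_piecewise_constant(oscillating)
```

Printing block_stats for each eigenvector should make it obvious which blocks,
if any, have a large within-block spread.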
Also, I noticed when attempting to run the program that it often doesn't show
the entire list of available and required arguments. I couldn't reliably
determine a cause; often it would show just 1 of the required arguments, but if
I supplied some of the required arguments and left out others, it would display
all of them. I'm assuming this is a bug; any idea where I could find it?
> DisplaySpectralKMeans example fails
> -----------------------------------
>
> Key: MAHOUT-524
> URL: https://issues.apache.org/jira/browse/MAHOUT-524
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.4
> Reporter: Jeff Eastman
> Fix For: 0.5
>
> Attachments: aff.txt, raw.txt
>
>
> I've committed a new display example that attempts to push the standard
> mixture of models data set through spectral k-means. After some tweaking of
> configuration arguments and a bug fix in EigenCleanupJob it runs spectral
> k-means to completion. The display example is expecting 2-d clustered points
> and the example is producing 5-d points. Additional I/O work is needed before
> this will play with the rest of the clustering algorithms.