K-means is more commonly used; you might consider running it several times. There is also Dirichlet process clustering, which is a bit more fiddly to tune than k-means, but it can infer the number of clusters for you.
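To make the "run k-means several times" suggestion concrete, here is a small, self-contained sketch. It is not Mahout's k-means implementation; it is a toy Lloyd-style loop (all class and variable names are made up for illustration) that sweeps over a few values of k and prints the within-cluster cost for each, so you can eyeball a reasonable number of clusters.

    import java.util.Random;

    /**
     * Toy illustration of trying k-means for several values of k and
     * comparing the within-cluster cost. Not Mahout code.
     */
    public class KMeansSweep {

      public static void main(String[] args) {
        Random rnd = new Random(42);
        // Hypothetical data: 2-dimensional points drawn around three centers.
        double[][] points = new double[300][2];
        double[][] trueCenters = {{0, 0}, {5, 5}, {0, 5}};
        for (int i = 0; i < points.length; i++) {
          double[] c = trueCenters[i % trueCenters.length];
          points[i][0] = c[0] + rnd.nextGaussian();
          points[i][1] = c[1] + rnd.nextGaussian();
        }
        for (int k = 1; k <= 6; k++) {
          System.out.printf("k=%d  cost=%.2f%n", k, kMeansCost(points, k, rnd));
        }
      }

      /** Runs a plain Lloyd-style k-means and returns the within-cluster cost. */
      static double kMeansCost(double[][] points, int k, Random rnd) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
          centroids[c] = points[rnd.nextInt(points.length)].clone();
        }
        int[] assignment = new int[points.length];
        for (int iter = 0; iter < 20; iter++) {
          // Assignment step: nearest centroid by squared Euclidean distance.
          for (int p = 0; p < points.length; p++) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int c = 0; c < k; c++) {
              double dist = sqDist(points[p], centroids[c]);
              if (dist < bestDist) { bestDist = dist; best = c; }
            }
            assignment[p] = best;
          }
          // Update step: move each centroid to the mean of its points.
          double[][] sums = new double[k][dim];
          int[] counts = new int[k];
          for (int p = 0; p < points.length; p++) {
            counts[assignment[p]]++;
            for (int d = 0; d < dim; d++) sums[assignment[p]][d] += points[p][d];
          }
          for (int c = 0; c < k; c++) {
            if (counts[c] > 0) {
              for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
          }
        }
        // Within-cluster cost under the final assignment.
        double cost = 0;
        for (int p = 0; p < points.length; p++) {
          cost += sqDist(points[p], centroids[assignment[p]]);
        }
        return cost;
      }

      static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
          double diff = a[i] - b[i];
          s += diff * diff;
        }
        return s;
      }
    }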
On Mon, Jan 11, 2010 at 8:30 AM, Christoph Hermann <[email protected]> wrote:

> Hello,
>
> I have never used Mahout before, and before I invest too much time
> reading the API and source code I thought I might get some pointers
> from you.
>
> I have several objects containing 1..n attributes (actually long/double
> values). I want to cluster these objects to get clusters of objects
> that are similar with regard to those n attributes. Then I want to be
> able to look up which cluster my object is in and which other objects
> also belong to that cluster.
>
> I thought such a clustering would be possible using the mean shift
> implementation from Mahout (since I don't know in advance how many
> clusters I will have; otherwise I would probably use k-means).
>
> So what I have to do is transform these objects into Vectors and then
> cluster them using MeanShiftCanopy and some distance measure (probably
> EuclideanDistanceMeasure at the beginning):
>
>   foo = new DenseVector(new double[]{ val1, ..., valn });
>
> and then basically follow what is done in testReferenceImplementation()
> of the DisplayMeanShift class (my entry point so far is the
> DisplayMeanShift class).
>
> Is that correct? Is there any other example doing something similar I
> could look at?
>
> Any additional pointers are welcome - I have already read the IBM
> article by Grant Ingersoll.
>
> regards
> Christoph Hermann

--
Ted Dunning, CTO DeepDyve
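For reference, here is a minimal sketch of the vector-construction step Christoph describes, assuming the org.apache.mahout.math and org.apache.mahout.common.distance packages (package names moved between early Mahout releases, so adjust the imports to your version). The MyObject class is a hypothetical stand-in for his domain objects; the resulting Vectors are what would be handed to the mean shift reference implementation, as in DisplayMeanShift's testReferenceImplementation().

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.mahout.common.distance.EuclideanDistanceMeasure;
    import org.apache.mahout.math.DenseVector;
    import org.apache.mahout.math.Vector;

    public class ObjectVectorization {

      /** Hypothetical domain object with n numeric attributes. */
      static class MyObject {
        final double[] attributes;
        MyObject(double... attributes) { this.attributes = attributes; }
      }

      public static void main(String[] args) {
        List<MyObject> objects = new ArrayList<MyObject>();
        objects.add(new MyObject(1.0, 2.0, 3.0));
        objects.add(new MyObject(1.1, 2.1, 2.9));
        objects.add(new MyObject(9.0, 8.0, 7.0));

        // Each object becomes one DenseVector over its n attributes; these
        // vectors are what the mean shift clustering would consume.
        List<Vector> vectors = new ArrayList<Vector>();
        for (MyObject o : objects) {
          vectors.add(new DenseVector(o.attributes));
        }

        // The distance measure the clustering will use; Euclidean is a
        // reasonable starting point, as suggested in the question.
        EuclideanDistanceMeasure measure = new EuclideanDistanceMeasure();
        System.out.println("d(v0, v1) = " + measure.distance(vectors.get(0), vectors.get(1)));
        System.out.println("d(v0, v2) = " + measure.distance(vectors.get(0), vectors.get(2)));
      }
    }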
