K-means is more commonly used; you might consider running it several times. There is also Dirichlet process clustering, which is a bit more fiddly to tune than k-means, but it can infer the number of clusters for you.
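To make the "run k-means several times" suggestion concrete, here is a small, self-contained sketch. It is not Mahout's k-means implementation; it is a toy Lloyd-style loop (all class and variable names are made up for illustration) that sweeps over a few values of k and prints the within-cluster cost for each, so you can eyeball a reasonable number of clusters.

    import java.util.Random;

    /**
     * Toy illustration of trying k-means for several values of k and
     * comparing the within-cluster cost. Not Mahout code.
     */
    public class KMeansSweep {

      public static void main(String[] args) {
        Random rnd = new Random(42);
        // Hypothetical data: 2-dimensional points drawn around three centers.
        double[][] points = new double[300][2];
        double[][] trueCenters = {{0, 0}, {5, 5}, {0, 5}};
        for (int i = 0; i < points.length; i++) {
          double[] c = trueCenters[i % trueCenters.length];
          points[i][0] = c[0] + rnd.nextGaussian();
          points[i][1] = c[1] + rnd.nextGaussian();
        }
        for (int k = 1; k <= 6; k++) {
          System.out.printf("k=%d  cost=%.2f%n", k, kMeansCost(points, k, rnd));
        }
      }

      /** Runs a plain Lloyd-style k-means and returns the within-cluster cost. */
      static double kMeansCost(double[][] points, int k, Random rnd) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
          centroids[c] = points[rnd.nextInt(points.length)].clone();
        }
        int[] assignment = new int[points.length];
        for (int iter = 0; iter < 20; iter++) {
          // Assignment step: nearest centroid by squared Euclidean distance.
          for (int p = 0; p < points.length; p++) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int c = 0; c < k; c++) {
              double dist = sqDist(points[p], centroids[c]);
              if (dist < bestDist) { bestDist = dist; best = c; }
            }
            assignment[p] = best;
          }
          // Update step: move each centroid to the mean of its points.
          double[][] sums = new double[k][dim];
          int[] counts = new int[k];
          for (int p = 0; p < points.length; p++) {
            counts[assignment[p]]++;
            for (int d = 0; d < dim; d++) sums[assignment[p]][d] += points[p][d];
          }
          for (int c = 0; c < k; c++) {
            if (counts[c] > 0) {
              for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
          }
        }
        // Within-cluster cost under the final assignment.
        double cost = 0;
        for (int p = 0; p < points.length; p++) {
          cost += sqDist(points[p], centroids[assignment[p]]);
        }
        return cost;
      }

      static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
          double diff = a[i] - b[i];
          s += diff * diff;
        }
        return s;
      }
    }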
On Mon, Jan 11, 2010 at 8:30 AM, Christoph Hermann <[email protected]> wrote:

> Hello,
>
> I have never used Mahout before, and before I invest too much time
> reading the API and source code I thought I might get some pointers
> from you.
>
> I have several objects containing 1..n attributes (actually long/double
> values). I want to cluster these objects to get clusters of objects
> that are similar with regard to those n attributes. Then I want to be
> able to look up which cluster my object is in and which other objects
> also belong to that cluster.
>
> I thought such a clustering would be possible using the mean shift
> implementation from Mahout (since I don't know in advance how many
> clusters I will have; otherwise I would probably use k-means).
>
> So what I have to do is transform these objects into Vectors and then
> cluster them using MeanShiftCanopy and some distance measure (probably
> EuclideanDistanceMeasure at the beginning):
>
>   foo = new DenseVector(new double[]{ val1, ..., valn });
>
> and then basically follow what is done in testReferenceImplementation()
> of the DisplayMeanShift class (my entry point so far is the
> DisplayMeanShift class).
>
> Is that correct? Is there any other example doing something similar I
> could look at?
>
> Any additional pointers are welcome - I have already read the IBM
> article by Grant Ingersoll.
>
> regards
> Christoph Hermann

--
Ted Dunning, CTO DeepDyve
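For reference, here is a minimal sketch of the vector-construction step Christoph describes, assuming the org.apache.mahout.math and org.apache.mahout.common.distance packages (package names moved between early Mahout releases, so adjust the imports to your version). The MyObject class is a hypothetical stand-in for his domain objects; the resulting Vectors are what would be handed to the mean shift reference implementation, as in DisplayMeanShift's testReferenceImplementation().

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.mahout.common.distance.EuclideanDistanceMeasure;
    import org.apache.mahout.math.DenseVector;
    import org.apache.mahout.math.Vector;

    public class ObjectVectorization {

      /** Hypothetical domain object with n numeric attributes. */
      static class MyObject {
        final double[] attributes;
        MyObject(double... attributes) { this.attributes = attributes; }
      }

      public static void main(String[] args) {
        List<MyObject> objects = new ArrayList<MyObject>();
        objects.add(new MyObject(1.0, 2.0, 3.0));
        objects.add(new MyObject(1.1, 2.1, 2.9));
        objects.add(new MyObject(9.0, 8.0, 7.0));

        // Each object becomes one DenseVector over its n attributes; these
        // vectors are what the mean shift clustering would consume.
        List<Vector> vectors = new ArrayList<Vector>();
        for (MyObject o : objects) {
          vectors.add(new DenseVector(o.attributes));
        }

        // The distance measure the clustering will use; Euclidean is a
        // reasonable starting point, as suggested in the question.
        EuclideanDistanceMeasure measure = new EuclideanDistanceMeasure();
        System.out.println("d(v0, v1) = " + measure.distance(vectors.get(0), vectors.get(1)));
        System.out.println("d(v0, v2) = " + measure.distance(vectors.get(0), vectors.get(2)));
      }
    }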
