Re: [jira] Commented: (MAHOUT-363) Proposal for GSoC 2010 (EigenCuts clustering algorithm for Mahout)

Robin Anil Mon, 16 Aug 2010 07:11:49 -0700

>
>
> Anything but sparse connectivity is a complete non-starter for a scalable
> system.


Right, that's why I don't like the pairwise computation approach.

In this case, each document only needs to be paired with its nearest cluster,
right? Wouldn't something like Canopy clustering give a partial connection
graph?
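
A rough sketch of the canopy idea, in plain Python rather than Mahout code.
Everything here is an assumption for illustration: the function names
(canopies, sparse_affinity), the Euclidean distance, the thresholds t1/t2, and
the Gaussian kernel with width sigma are all made up for this example and are
not Mahout's Canopy or DistanceMeasure API. The point is only that affinities
get computed for pairs that share a canopy, so the graph stays sparse.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopies(points, t1, t2, distance=euclidean):
    """Simplified canopy pass with loose threshold t1 and tight threshold t2.

    Returns a list of canopies, each a sorted list of point indices.
    Canopies may overlap: a point within t1 of several centers joins each.
    """
    if not t1 > t2:
        raise ValueError("expected t1 > t2")
    remaining = list(range(len(points)))
    result = []
    while remaining:
        center = remaining[0]
        # Every point within the loose threshold joins this canopy.
        members = [i for i in range(len(points))
                   if distance(points[center], points[i]) <= t1]
        result.append(members)
        # The center and points within the tight threshold stop being
        # candidate centers.
        remaining = [i for i in remaining
                     if i != center
                     and distance(points[center], points[i]) > t2]
    return result


def sparse_affinity(points, canopy_list, distance=euclidean, sigma=1.0):
    """Affinities only for pairs sharing a canopy (hypothetical Gaussian kernel).

    Returns a dict mapping (i, j) with i < j to a weight in (0, 1].
    Pairs that never share a canopy are simply absent, i.e. treated as 0.
    """
    edges = {}
    for members in canopy_list:
        for i, j in combinations(sorted(members), 2):
            if (i, j) not in edges:
                d = distance(points[i], points[j])
                edges[(i, j)] = math.exp(-(d * d) / (2.0 * sigma * sigma))
    return edges
```

For five points forming two well-separated groups of three and two, with t1
and t2 smaller than the gap between the groups, this produces 4 edges instead
of the 10 a full pairwise matrix would need. How well this holds up at scale
depends entirely on how the thresholds are chosen.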

Robin




> On Mon, Aug 16, 2010 at 7:00 AM, Robin Anil <[email protected]> wrote:
>
> > From a GSOC angle, it needn't be done; it's up to your mentor to decide.
> > I am more interested in getting this completed and pushed out so that
> > people can really use it. If you can spare time after GSOC and still hang
> > around the community and help in getting this polished, it will be great.
> >
> > To create your pairwise similarity matrix (values in 0-1, where 1 means
> > dissimilar; it can be the other way around as well), see the
> > DistanceMeasure implementations. Creating the pairwise matrix is
> > nontrivial from a scalability standpoint.
> >
> > A complete spectral clustering package should take an input set of
> > documents, create the matrix, run clustering, and output the clusters.
> > To get an idea of your work so far, which blocks are missing from this
> > ideal package scenario?
> >
> >
> > Robin
> >
>
