That runs the risk of forcing a replication of the canopy structure.

These techniques might be more useful: http://asterix.ics.uci.edu/fuzzyjoin-mapreduce/
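A minimal sketch of prefix filtering, a pruning idea used in set-similarity ("fuzzy") joins. The link above appears to cover that kind of join on MapReduce, but this is an assumption, and the code is single-machine Python rather than anything from that page. Each document is treated as a token set, and tokens are put in a global rarest-first order. Only pairs that share a token in their short prefixes are ever compared, instead of all n(n-1)/2 pairs. The function names and the Jaccard threshold `t` are illustrative.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    return len(a & b) / len(a | b)


def prefix_filter_join(records, t):
    """Return {(i, j): similarity} for every pair with Jaccard >= t (0 < t <= 1)."""
    # Global token order: rarest first, ties broken by the token itself.
    freq = Counter(tok for r in records for tok in r)
    index = defaultdict(list)  # prefix token -> ids of records whose prefix holds it
    for rid, r in enumerate(records):
        toks = sorted(r, key=lambda tok: (freq[tok], tok))
        # Any pair with Jaccard >= t overlaps in at least ceil(t * |r|) tokens,
        # so under a shared global order their prefixes of this length must
        # share a token. The epsilon keeps float error from bumping ceil() up.
        prefix_len = len(toks) - math.ceil(t * len(toks) - 1e-9) + 1
        for tok in toks[:prefix_len]:
            index[tok].append(rid)
    # Candidates: pairs sharing at least one prefix token (ids are ascending).
    candidates = set()
    for rids in index.values():
        candidates.update(combinations(rids, 2))
    # Verify the surviving candidates with the exact similarity.
    result = {}
    for i, j in sorted(candidates):
        sim = jaccard(records[i], records[j])
        if sim >= t:
            result[(i, j)] = sim
    return result
```

Because a qualifying pair is guaranteed to collide on some prefix token, the pruning skips comparisons without losing any result pair. The output is directly a sparse similarity graph.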

On Mon, Aug 16, 2010 at 7:12 AM, Robin Anil <[email protected]> wrote:

> >
> >
> > In this case, the document needs to be paired with the nearest cluster,
> > right? Something like Canopy clustering should give a partial connection
> > graph.
> >
>
> Just populate similarity values for documents that share a canopy. The
> result is a very sparse but still connected graph, due to the overlapping
> nature of canopy clustering.
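A minimal sketch of that suggestion, assuming points in a small Euclidean space and using Euclidean distance as the cheap canopy distance. Canopies are built with two thresholds `t1 > t2`. A point within `t1` of a center joins its canopy, and a point within `t2` is removed from further consideration. Similarities are then computed only for pairs that share a canopy. Points that fall between `t2` and `t1` can land in several canopies, which is what links canopies together. The `1 / (1 + d)` similarity and all function names are hypothetical choices.

```python
import math
from itertools import combinations


def canopies(points, t1, t2):
    """Group point indices into possibly overlapping canopies (needs t1 > t2 > 0)."""
    assert t1 > t2 > 0
    remaining = list(range(len(points)))
    groups = []
    while remaining:
        center = points[remaining[0]]  # next unclaimed point becomes a center
        members, still_remaining = [], []
        for i in remaining:
            d = math.dist(center, points[i])  # the cheap distance
            if d < t1:
                members.append(i)  # loosely close: joins this canopy
            if d >= t2:
                still_remaining.append(i)  # not tightly close: may join later canopies
        groups.append(members)
        remaining = still_remaining  # the center itself (d = 0) is always dropped
    return groups


def sparse_similarities(points, groups):
    """Compute similarities only for pairs that share at least one canopy."""
    sims = {}
    for members in groups:
        for i, j in combinations(sorted(members), 2):
            if (i, j) not in sims:
                # Hypothetical similarity; swap in the real (expensive) measure here.
                sims[(i, j)] = 1.0 / (1.0 + math.dist(points[i], points[j]))
    return sims
```

With `points = [(0, 0), (0.5, 0), (2, 0), (4, 0), (4.5, 0), (20, 0)]`, `t1 = 3.0` and `t2 = 1.0`, the canopies are `[[0, 1, 2], [2, 3, 4], [3, 4], [5]]`. Point 2 sits in two canopies and bridges them, while the distant point 5 gets no edges. How well connected the graph ends up depends on `t1` and `t2`.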
>
> >
> > Robin
> >
> >
> >
> >
> >> On Mon, Aug 16, 2010 at 7:00 AM, Robin Anil <[email protected]> wrote:
> >>
> >> > From a GSOC angle, it needn't be done; it's up to your mentor to decide.
> >> > I am more interested in getting this completed and pushed out so that
> >> > people can really use it. If you can spare time after GSOC and still
> >> > hang around the community to help polish this, that would be great.
> >> >
> >> > To create your pairwise similarity matrix (values in 0-1, where 1 means
> >> > dissimilar; it can be the other way around as well), see the
> >> > DistanceMeasure implementations. Creating the pairwise matrix is
> >> > non-trivial from a scalability standpoint.
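A small sketch of the dense approach, assuming documents are sparse `{term: weight}` dicts. It uses cosine distance (1 minus cosine similarity), which stays in [0, 1] for non-negative weights, with 1 meaning dissimilar. This is plain Python standing in for a DistanceMeasure implementation, not Mahout's API, and the function names are hypothetical. The nested loop makes the scalability problem concrete: n documents need n(n-1)/2 distance computations and n^2 matrix cells.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse {term: weight} vectors.

    With non-negative weights this lies in [0, 1]; 1 means no shared terms.
    """
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat empty documents as maximally dissimilar
    return max(0.0, 1.0 - dot / (norm_a * norm_b))  # clamp tiny negative rounding


def pairwise_distances(docs, distance=cosine_distance):
    """Dense symmetric n x n matrix: n * (n - 1) / 2 distance calls, n^2 cells."""
    n = len(docs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = distance(docs[i], docs[j])
    return matrix


def to_affinity(matrix):
    """Flip the convention so that 1 means similar (the diagonal becomes 1)."""
    return [[1.0 - d for d in row] for row in matrix]
```

`to_affinity` covers the "other way around" case when the clustering step expects 1 to mean similar.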
> >> >
> >> > A complete spectral clustering package should take an input set of
> >> > documents, create the matrix, run the clustering, and output the
> >> > clusters. To get an idea of your work so far: which blocks are missing
> >> > from this ideal package scenario?
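To make the "run clustering" block concrete, here is a minimal sketch of two-way spectral clustering on an already-built affinity matrix W. It forms the normalized Laplacian L_sym = I - D^{-1/2} W D^{-1/2}, takes the eigenvector of its second-smallest eigenvalue, and splits nodes by sign. It uses a pure-Python Jacobi eigensolver so it runs without dependencies, which only suits toy sizes. A real package would need a scalable (e.g. Lanczos-style) eigensolver and k-means on several eigenvectors for k > 2. The function names are hypothetical.

```python
import math


def jacobi_eigh(a, tol=1e-12, max_sweeps=100):
    """Eigen-decompose a small symmetric matrix with cyclic Jacobi rotations.

    Returns (values, vectors) where vectors[k] is the eigenvector for values[k].
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # Rotation angle chosen to zero a[p][q] (smaller root for stability).
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    values = [a[i][i] for i in range(n)]
    vectors = [[v[k][i] for k in range(n)] for i in range(n)]  # columns of V
    return values, vectors


def spectral_bipartition(w):
    """Label nodes 0/1 by the sign of the normalized-Laplacian Fiedler vector.

    w: symmetric, non-negative affinity matrix, zero diagonal, no isolated nodes.
    """
    n = len(w)
    inv_sqrt = [1.0 / math.sqrt(sum(row)) for row in w]  # D^{-1/2}
    lap = [[float(i == j) - inv_sqrt[i] * w[i][j] * inv_sqrt[j] for j in range(n)]
           for i in range(n)]
    values, vectors = jacobi_eigh(lap)
    order = sorted(range(n), key=lambda k: values[k])
    fiedler = vectors[order[1]]  # eigenvector of the second-smallest eigenvalue
    # Rescaling by D^{-1/2} would not change any signs, so split directly.
    return [0 if x < 0 else 1 for x in fiedler]
```

On two triangles joined by a single weak edge (weight 0.01), the sign split recovers the two triangles. Which side gets label 0 is arbitrary, since an eigenvector's sign is not fixed.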
> >> >
> >> >
> >> > Robin
> >> >
> >>
> >
> >
>
