Re: Canopy Clustering not scaling

Jeff Eastman Sun, 02 May 2010 08:24:57 -0700

You could try using more, smaller input splits, but large datasets and too-small distance thresholds will choke the mappers, with the number of canopies approaching the number of points seen by each mapper. The single reducer will also choke unless the thresholds allow it to condense the mapper canopies. I think the OOM error is just another (quicker) indication that your thresholds are wrong; getting several million clusters out of canopy is probably not very useful anyway.
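
To make the threshold effect concrete, here is a rough Python sketch of the two-stage idea. It is not Mahout's code: the names canopy_centers and two_stage are made up for illustration, canopies are represented only by their seed point rather than a centroid, and the looser T1 threshold is left out on the assumption that it only governs membership, not how many canopies get created.

```python
import math

def canopy_centers(points, t2, distance=math.dist):
    """One pass over the points: a point seeds a new canopy unless it
    lies within t2 of an existing canopy center (hypothetical helper)."""
    centers = []  # every canopy is held in memory for the whole pass
    for p in points:
        if not any(distance(p, c) < t2 for c in centers):
            centers.append(p)
    return centers

def two_stage(points, n_splits, t2):
    """Simulate the mappers (one per input split) and the single reducer."""
    size = math.ceil(len(points) / n_splits)
    splits = [points[i:i + size] for i in range(0, len(points), size)]
    # "Mapper" stage: each split builds its canopies independently.
    mapper_out = [c for s in splits for c in canopy_centers(s, t2)]
    # "Reducer" stage: one pass over all mapper centers to condense them.
    return mapper_out, canopy_centers(mapper_out, t2)

# 100 points on a line, spaced 1.0 apart, fed through 4 splits.
points = [(float(i),) for i in range(100)]
for t2 in (0.5, 9.5):
    mapper_out, reduced = two_stage(points, 4, t2)
    print(t2, len(mapper_out), len(reduced))
```

With t2 smaller than the spacing between points, every point seeds its own canopy, so smaller splits only spread the same total across more mappers and the reducer still receives one center per input point. With a larger t2 both stages condense. Each point is also compared against every existing center, so the work grows toward quadratic as the canopy count approaches the point count.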


On 5/2/10 4:14 AM, Robin Anil wrote:

Keeping all canopies in memory is not making things scale. I frequently run
into out of memory errors when the distance thresholds are not good on
reuters. Any ideas on optimizing this?

Robin
