CanopyDriver : run : clusterFilter : bug

Paritosh Ranjan Sun, 02 Oct 2011 11:36:50 -0700

The new parameter, clusterFilter, in CanopyDriver's run method is not working properly.

This is because, in ClusterMapper's findClosestCanopy method, the if condition


protected Canopy findClosestCanopy(Vector point, Iterable<Canopy> canopies) {
    ...
    // find closest canopy
    for (Canopy canopy : canopies) {
      double dist = measure.distance(canopy.getCenter().getLengthSquared(),
          canopy.getCenter(), point);
      if (*dist < minDist*) {
        ...
      }
    }
}



should be replaced with,

if (*dist < minDist && dist <= t1*)

Otherwise, every record is assigned to the same canopy.

This fix also needs some null pointer checks. I have fixed it and got it working. I will try to provide a patch with a test case that reproduces the issue.
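
To illustrate the proposed condition and why null checks follow from it, here is a minimal standalone sketch. It is not the Mahout code and not the patch: it assumes plain double[] points and Euclidean distance in place of Mahout's Vector and DistanceMeasure, and the names ClosestCanopySketch, Canopy (a stand-in class) and euclidean are hypothetical. With the extra "dist <= t1" test, a point that lies farther than t1 from every canopy center gets no canopy at all, so the method can return null and callers have to handle that.

```java
import java.util.Arrays;
import java.util.List;

public class ClosestCanopySketch {

  // Hypothetical stand-in for a canopy: only an id and a center.
  static final class Canopy {
    final int id;
    final double[] center;

    Canopy(int id, double[] center) {
      this.id = id;
      this.center = center;
    }
  }

  // Assumption: plain Euclidean distance instead of a pluggable measure.
  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  // Returns the closest canopy whose center is within t1 of the point,
  // or null when no canopy qualifies.
  static Canopy findClosestCanopy(double[] point, List<Canopy> canopies, double t1) {
    double minDist = Double.MAX_VALUE;
    Canopy closest = null;
    for (Canopy canopy : canopies) {
      double dist = euclidean(canopy.center, point);
      if (dist < minDist && dist <= t1) {
        minDist = dist;
        closest = canopy;
      }
    }
    return closest;
  }

  public static void main(String[] args) {
    List<Canopy> canopies = Arrays.asList(
        new Canopy(0, new double[] {0.0, 0.0}),
        new Canopy(1, new double[] {10.0, 10.0}));
    double t1 = 3.0;

    Canopy near = findClosestCanopy(new double[] {1.0, 1.0}, canopies, t1);
    Canopy far = findClosestCanopy(new double[] {5.0, 5.0}, canopies, t1);

    // The caller must check for null before using the result.
    System.out.println(near == null ? "none" : "canopy " + near.id);
    System.out.println(far == null ? "none" : "canopy " + far.id);
  }
}
```

In this sketch the point (1, 1) falls within t1 of the first center, while (5, 5) is outside t1 for both centers and therefore gets no canopy.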


Thanks and Regards,
Paritosh Ranjan

On 02-10-2011 14:06, Paritosh Ranjan wrote:

Even the run() method of CanopyDriver that takes only T1 and T2 gives different results for sequential and MapReduce execution. This is preventing me from scaling up, as I need to run MapReduce on Hadoop to scale.
Does anyone have any idea about this problem?

On 02-10-2011 00:27, Paritosh Ranjan wrote:
Hi,

I am able to cluster correctly sequentially, using CanopyDriver.
However, the same dataset, when processed as a MapReduce job (with t1 = t3, t2 = t4 and t1 > t2), does not work. I am getting errors like Canopies are empty.
I also tried reducing the values of t3 and t4, but reducing them either has no effect or gives meaningless results.
Am I doing something wrong, or is there a bug somewhere?
I feel that sequential and MapReduce execution should give similar results, but that is not happening.
Thanks and Regards,
Paritosh

