Hello,

I'm running some clustering with Mean Shift, and in my final canopy I get the same vector five times. The original input list contained it only once, so I'm wondering why duplicates are allowed within the same canopy.

Attached is a file with the method I'm using to run mean shift, as well as the output (I'm iterating over the canopy's getBoundPoints() list).

I'd be happy if someone could explain this.

Regards,
Christoph Hermann

--
Christoph Hermann
Institut für Informatik
Tel: +49 761-203-8171
Fax: +49 761-203-8162
e-mail: [email protected]

public static List<MeanShiftCanopy> runMeanShift(
    List<MeanShiftCanopy> canopies, Map<Long, Vector> vectors,
    DistanceMeasure aMeasure, double aT1, double aT2, double aDelta) {
  List<MeanShiftCanopy> canopiesResult = canopies;
  MeanShiftCanopy.config(aMeasure, aT1, aT2, aDelta);
  // add all points to the canopies
  for (Vector aRaw : vectors.values()) {
    MeanShiftCanopy.mergeCanopy(new MeanShiftCanopy(aRaw), canopiesResult);
  }
  boolean done = false;
  while (!done) { // shift canopies to their centroids
    done = true;
    List<MeanShiftCanopy> migratedCanopies = new ArrayList<MeanShiftCanopy>();
    for (MeanShiftCanopy canopy : canopiesResult) {
      done = canopy.shiftToMean() && done;
      MeanShiftCanopy.mergeCanopy(canopy, migratedCanopies);
    }
    canopiesResult = migratedCanopies;
  }
  return canopiesResult;
}

Vectors v: {"class":"org.apache.mahout.matrix.DenseVector","vector":"{\"values\":[5.0,10.0,2.0,4.0,2.0,5.0,7.0],\"lengthSquared\":-1.0,\"name\":\"6407\"}"}
List of other Vectors in same Canopy as Vector v: {"class":"org.apache.mahout.matrix.DenseVector","vector":"{\"values\":[5.0,10.0,2.0,4.0,2.0,5.0,7.0],\"lengthSquared\":-1.0,\"name\":\"6407\"}"}
Vector: {"class":"org.apache.mahout.matrix.DenseVector","vector":"{\"values\":[5.0,10.0,2.0,4.0,2.0,5.0,7.0],\"lengthSquared\":-1.0,\"name\":\"6407\"}"}
Vector: {"class":"org.apache.mahout.matrix.DenseVector","vector":"{\"values\":[5.0,10.0,2.0,4.0,2.0,5.0,7.0],\"lengthSquared\":-1.0,\"name\":\"6407\"}"}
Vector: {"class":"org.apache.mahout.matrix.DenseVector","vector":"{\"values\":[5.0,10.0,2.0,4.0,2.0,5.0,7.0],\"lengthSquared\":-1.0,\"name\":\"6407\"}"}
Vector: {"class":"org.apache.mahout.matrix.DenseVector","vector":"{\"values\":[5.0,10.0,2.0,4.0,2.0,5.0,7.0],\"lengthSquared\":-1.0,\"name\":\"6407\"}"}
Vector: {"class":"org.apache.mahout.matrix.DenseVector","vector":"{\"values\":[5.0,10.0,2.0,4.0,2.0,5.0,7.0],\"lengthSquared\":-1.0,\"name\":\"6407\"}"}
We have 5 no of points in the same canopy: 6407, 6407, 6407, 6407, 6407
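
For anyone who wants to reproduce the check, here is a minimal standalone sketch of how the repeated names in the listing above could be counted and collapsed. It uses plain Java only and no Mahout classes. The class name BoundPointCheck and both method names are hypothetical, and the list of names is assumed to be collected from the bound points of one canopy. It only detects and filters the duplicates; it does not explain why they appear.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout: works on point names only.
public class BoundPointCheck {

    // Counts how often each point name occurs, keeping first-seen order.
    static Map<String, Integer> countByName(List<String> names) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String name : names) {
            Integer seen = counts.get(name);
            counts.put(name, seen == null ? 1 : seen + 1);
        }
        return counts;
    }

    // Returns the names in first-seen order with repeats dropped.
    static List<String> distinctNames(List<String> names) {
        return new ArrayList<String>(countByName(names).keySet());
    }

    public static void main(String[] args) {
        // Names as they appear in the output above (assumed input).
        List<String> names =
            Arrays.asList("6407", "6407", "6407", "6407", "6407");
        System.out.println(countByName(names));   // prints {6407=5}
        System.out.println(distinctNames(names)); // prints [6407]
    }
}
```

Any name with a count above one is a point that was bound to the canopy more than once.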
