[
https://issues.apache.org/jira/browse/MAHOUT-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13466500#comment-13466500
]
Ted Dunning commented on MAHOUT-1086:
-------------------------------------
OK. I think I have the problem isolated even if I don't understand it. In
getDistanceSquared, I separate out the computation of one operand's squared
length and push it back into that operand for caching. The code is
{code}
Iterator<Element> it = sparseAccessed.iterateNonZero();
double d = randomlyAccessed.getLengthSquared();
double d2 = 0;
double dot = 0;
while (it.hasNext()) {
  Element e = it.next();
  double value = e.get();
  d2 += value * value;
  dot += value * randomlyAccessed.getQuick(e.index());
}
//assert d > -1.0e-9; // round-off errors should never be too far off!
final double r1 = Math.abs(d + d2 - 2 * dot);
final double r2 = oldDistanceSquared(v);
final double error = Math.abs(r1 - r2) / r1;
if (error > 1e-14) {
  System.err.printf("Discrepancy %.3f\n", error);
}
// if (sparseAccessed instanceof LengthCachingVector) {
//   ((LengthCachingVector) sparseAccessed).setLengthSquared(d2);
// }
return r2;
{code}
The commented-out code is where the cache is updated. If these lines stay
commented out, the problem does not happen. If they are uncommented, it
does happen.
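For reference, the expression d + d2 - 2 * dot in the snippet relies on the
identity |a - b|^2 = |a|^2 + |b|^2 - 2 (a . b). The following is only a
minimal standalone sketch of that identity, assuming plain double[] arrays in
place of Mahout vectors; the class and method names are hypothetical and are
not Mahout API. It shows why one operand's squared length can be supplied
from a cache while the other operand's squared length and the dot product are
accumulated in a single pass.

```java
// Hypothetical sketch: plain arrays stand in for Mahout vectors.
final class SquaredDistanceSketch {

    /** Direct form: sum of squared coordinate differences. */
    static double direct(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    /** Squared Euclidean length of a vector. */
    static double lengthSquared(double[] v) {
        double sum = 0;
        for (double x : v) {
            sum += x * x;
        }
        return sum;
    }

    /**
     * Expanded form: |a|^2 + |b|^2 - 2 a.b, where |a|^2 is passed in by the
     * caller (for example from a cache) and only b is iterated.
     */
    static double expanded(double[] a, double[] b, double lengthSquaredA) {
        double lengthSquaredB = 0;
        double dot = 0;
        for (int i = 0; i < b.length; i++) {
            if (b[i] != 0) { // mimic visiting only the non-zero elements
                lengthSquaredB += b[i] * b[i];
                dot += b[i] * a[i];
            }
        }
        // abs() guards against tiny negative results from round-off
        return Math.abs(lengthSquaredA + lengthSquaredB - 2 * dot);
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4};
        double[] b = {0, 2, 0, 1};
        System.out.println(direct(a, b));
        System.out.println(expanded(a, b, lengthSquared(a)));
    }
}
```

Both calls in main should agree as long as the value passed for the first
operand's squared length is correct; if that value is wrong, the expanded
form is wrong by exactly the same amount.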
My problem here is that I can't yet understand what the problem is. I also
don't understand how this is different from what we had before. I have also
put a test into the place where the cache is updated and don't see that
saving this value causes a problem.
I think that we have a problem where some other code somewhere is misusing this
cache. I am going to start a wide-ranging inspection to see what is going on.
That is going to take quite a while, especially since I am unlikely to have
another full day to beat on this for a while.
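As a rough illustration of the kind of cache misuse suspected above, and not
a diagnosis of the actual bug, the sketch below shows how a cached squared
length can go stale when a vector is modified without invalidating the cache.
Everything in it is an assumption made for illustration: CachedLengthSketch
and its methods are hypothetical and do not reproduce Mahout's
LengthCachingVector implementations.

```java
// Hypothetical sketch of a length-caching vector; not Mahout code.
final class CachedLengthSketch {
    private final double[] values;
    private double cachedLengthSquared = -1; // negative means "not cached"

    CachedLengthSketch(double... values) {
        this.values = values.clone();
    }

    /** Returns the cached squared length, computing it on first use. */
    double getLengthSquared() {
        if (cachedLengthSquared < 0) {
            double sum = 0;
            for (double v : values) {
                sum += v * v;
            }
            cachedLengthSquared = sum;
        }
        return cachedLengthSquared;
    }

    /** Lets a caller push a precomputed squared length into the cache. */
    void setLengthSquared(double lengthSquared) {
        cachedLengthSquared = lengthSquared;
    }

    /** Correct mutation: the cache is invalidated. */
    void set(int index, double value) {
        values[index] = value;
        cachedLengthSquared = -1;
    }

    /** Buggy mutation, included on purpose: the cache is left untouched. */
    void setWithoutInvalidating(int index, double value) {
        values[index] = value;
    }

    public static void main(String[] args) {
        CachedLengthSketch v = new CachedLengthSketch(3.0, 4.0);
        System.out.println(v.getLengthSquared()); // fills the cache

        v.setWithoutInvalidating(0, 0.0);
        System.out.println(v.getLengthSquared()); // stale cached value

        v.set(0, 0.0);
        System.out.println(v.getLengthSquared()); // recomputed after invalidation
    }
}
```

If some code path behaves like setWithoutInvalidating, then populating the
cache more eagerly would expose the stale value more often, which would be
consistent with the symptom appearing only when the cache update is enabled.
That is one thing an inspection could look for.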
> Mean Shift Test Now Produces 4 Clusters
> ---------------------------------------
>
> Key: MAHOUT-1086
> URL: https://issues.apache.org/jira/browse/MAHOUT-1086
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.7
> Reporter: Jeff Eastman
>
> Something changed in Mahout around 9/6/12 that caused
> TestMeanShift.testCanopyEuclideanMRJobNoClustering to return 4 clusters
> rather than 3. All of the other tests using the same data still return 3
> clusters. No changes were made to any of the MeanShiftCanopy classes other
> than 1 formatting change to the driver so I'm at a loss as to the cause.