Hi!

Ted, the old fast projection search was failing in the StreamingKMeans
test so I rewrote and simplified it.
It now works with StreamingKMeans and passes its own
FastProjectionSearch tests - most of the time.

The problem is testEpsilon, which compares the distances obtained by
doing a BruteSearch and a FastProjectionSearch on a set of random
vectors (sampled from LumpyData).

This test sometimes fails and sometimes succeeds. The relevant
assertion is the one checking whether bigRatio < 2 [1] (line 115).
bigRatio counts the vectors for which the difference between the
FPS and BS distances is over 1.4, so the test passes as long as there
is at most 1 such vector.
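
To make the check concrete, here is a rough sketch of what I understand the
assertion to be doing. It is not the actual test code: the class name, the
method countBigRatios and the sample distances are made up for illustration,
and I am assuming (from the variable name) that the comparison is the ratio
of the FPS distance to the BS distance.

```java
// Hypothetical sketch, not the code from FastProjectionSearchTest.
class BigRatioSketch {
    // Counts the entries where approx[i] / exact[i] is above the threshold.
    // Assumption: "bigRatio" is based on a ratio of the two distances.
    static int countBigRatios(double[] exact, double[] approx, double threshold) {
        if (exact.length != approx.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        int count = 0;
        for (int i = 0; i < exact.length; i++) {
            if (approx[i] / exact[i] > threshold) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up distances: BS (exact) versus FPS (approximate).
        double[] exact = {1.0, 2.0, 4.0, 5.0};
        double[] approx = {1.1, 3.0, 4.0, 8.0};

        int bigRatio = countBigRatios(exact, approx, 1.4);
        System.out.println("bigRatio = " + bigRatio);
        // The test asserts bigRatio < 2, so two outliers are enough to fail it.
        System.out.println("passes: " + (bigRatio < 2));
    }
}
```

With these made-up numbers two of the four ratios (1.5 and 1.6) are above
1.4, so a check of bigRatio < 2 would fail.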

However, there are sometimes 2 or 3 of these and it fails (sometimes
there are 0 or 1 too).
Also, if I comment this assert out, the averageOverlap assertion passes.

This entire test looks fishy to me. Where are all the numbers coming from? :)

[1] https://github.com/dfilimon/knn/blob/master/src/test/java/org/apache/mahout/knn/search/FastProjectionSearchTest.java
