Hi Ted! The old fast projection search was failing in the StreamingKMeans test, so I rewrote and simplified it. It now works with StreamingKMeans and passes its own FastProjectionSearch tests - most of the time.
The problem is testEpsilon, which compares the distances obtained by doing a BruteSearch and a FastProjectionSearch on a set of random vectors (sampled from LumpyData). This test sometimes fails and sometimes succeeds. The relevant assertion is the one checking whether bigRatio < 2 [1] (line 115). bigRatio is the number of vectors for which the difference between the FPS and BS results is over 1.4, so the test passes as long as there is at most 1 such vector. However, there are sometimes 2 or 3 of these and it fails (sometimes there are 0 or 1 too). Also, if I comment this assert out, the averageOverlap assertion passes.

This entire test looks fishy to me. Where are all the numbers coming from? :)

[1] https://github.com/dfilimon/knn/blob/master/src/test/java/org/apache/mahout/knn/search/FastProjectionSearchTest.java
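
To make the check concrete, here is a minimal sketch of the counting logic as described above. It is not the actual test code: the class name, the helper method, and the sample distances are all made up for illustration, and it assumes the compared quantity is the absolute difference between the two distances (the real test may compute it differently).

```java
public class BigRatioSketch {
    // Hypothetical helper: counts the positions where the fast-projection
    // distance and the brute-force distance differ by more than the threshold.
    static int countLargeDiscrepancies(double[] brute, double[] fps, double threshold) {
        int count = 0;
        for (int i = 0; i < brute.length; i++) {
            if (Math.abs(fps[i] - brute[i]) > threshold) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up distances; in the real test these come from random vectors.
        double[] brute = {1.0, 2.0, 3.0, 4.0};
        double[] fps = {1.1, 2.0, 5.0, 4.2};

        int bigRatio = countLargeDiscrepancies(brute, fps, 1.4);
        // Same shape as the assertion in question: at most one outlier is tolerated.
        System.out.println("bigRatio = " + bigRatio + ", passes = " + (bigRatio < 2));
    }
}
```

With a fixed threshold and a tolerance of exactly one outlier, whether the assertion holds depends entirely on the random sample drawn, which would be consistent with the intermittent failures.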
