Another option is to make the iterator take a reference to the array as it exists and then during merging always create a new array.
A second option is to just let the iterator get a bit confused (don't like the smell there). On Tue, Mar 26, 2013 at 10:59 PM, Dan Filimon <[email protected]>wrote: > Ted, everyone, > > I added the multiple pass BallKMeans at the end of the reducer and > have uncovered a series of issues with the searchers. > > FastProjectionSearch stores "pending" vectors in a separate list from > the main projections, the idea being that use arrays throughout rather > than a tree multiset to reduce the overhead. > > Here's what's happening: > - after having added a bunch of centroids to the searcher, > - we iterate through each centroid and search for its closest neighbor, > - when calling search(), reindex() is called internally that deals > with pending entries that haven't been merged in some time. When > reindex() mutates the data structures, the iterator we're using to go > through the centroids becomes invalid and a > ConcurrentModificationException is thrown. > > I'm not really sure what the best approach is to fix this. Here are > some options: > - expose reindex() and demand users to call it explicitly (which > completely kills abstraction); > - when instantiating an Iterator, create a helper ImmutableList that > keeps track of the pending values for the duration of the iteration > and when iterating go through these first and release the reference to > the list as soon as possible; > - come up with an alternative design and really think about how this > searcher is useful. > > In particular, when running minimal tests (SearchQualityTest), this > FPS is faster than ordinary ProjectionSearch but worse than LSH. > > Any thoughts? >
