Another option is to make the iterator take a reference to the array as it
exists and then during merging always create a new array.

A second option is to just let the iterator get a bit confused (don't like
the smell there).

On Tue, Mar 26, 2013 at 10:59 PM, Dan Filimon
<[email protected]>wrote:

> Ted, everyone,
>
> I added the multiple pass BallKMeans at the end of the reducer and
> have uncovered a series of issues with the searchers.
>
> FastProjectionSearch stores "pending" vectors in a separate list from
> the main projections, the idea being that use arrays throughout rather
> than a tree multiset to reduce the overhead.
>
> Here's what's happening:
> - after having added a bunch of centroids to the searcher,
> - we iterate through each centroid and search for its closest neighbor,
> - when calling search(), reindex() is called internally that deals
> with pending entries that haven't been merged in some time. When
> reindex() mutates the data structures, the iterator we're using to go
> through the centroids becomes invalid and a
> ConcurrentModificationException is thrown.
>
> I'm not really sure what the best approach is to fix this. Here are
> some options:
> - expose reindex() and demand users to call it explicitly (which
> completely kills abstraction);
> - when instantiating an Iterator, create a helper ImmutableList that
> keeps track of the pending values for the duration of the iteration
> and when iterating go through these first and release the reference to
> the list as soon as possible;
> - come up with an alternative design and really think about how this
> searcher is useful.
>
> In particular, when running minimal tests (SearchQualityTest), this
> FPS is faster than ordinary ProjectionSearch but worse than LSH.
>
> Any thoughts?
>

Reply via email to