I realized that in fact just the pending array is not enough. There is
also the distances array that needs fixing.
In fact, an entire new copy is needed if we're to be able to safely
iterate and reindex.

I'm shelving this for now. That ok?

On Wed, Mar 27, 2013 at 4:35 AM, Ted Dunning <[email protected]> wrote:
> Another option is to make the iterator take a reference to the array as it
> exists and then during merging always create a new array.
>
> A second option is to just let the iterator get a bit confused (don't like
> the smell there).
>
> On Tue, Mar 26, 2013 at 10:59 PM, Dan Filimon
> <[email protected]>wrote:
>
>> Ted, everyone,
>>
>> I added the multiple pass BallKMeans at the end of the reducer and
>> have uncovered a series of issues with the searchers.
>>
>> FastProjectionSearch stores "pending" vectors in a separate list from
>> the main projections, the idea being that use arrays throughout rather
>> than a tree multiset to reduce the overhead.
>>
>> Here's what's happening:
>> - after having added a bunch of centroids to the searcher,
>> - we iterate through each centroid and search for its closest neighbor,
>> - when calling search(), reindex() is called internally that deals
>> with pending entries that haven't been merged in some time. When
>> reindex() mutates the data structures, the iterator we're using to go
>> through the centroids becomes invalid and a
>> ConcurrentModificationException is thrown.
>>
>> I'm not really sure what the best approach is to fix this. Here are
>> some options:
>> - expose reindex() and demand users to call it explicitly (which
>> completely kills abstraction);
>> - when instantiating an Iterator, create a helper ImmutableList that
>> keeps track of the pending values for the duration of the iteration
>> and when iterating go through these first and release the reference to
>> the list as soon as possible;
>> - come up with an alternative design and really think about how this
>> searcher is useful.
>>
>> In particular, when running minimal tests (SearchQualityTest), this
>> FPS is faster than ordinary ProjectionSearch but worse than LSH.
>>
>> Any thoughts?
>>

Reply via email to