One other benefit: UIMA may automatically remove and add back some FSs
"under the covers" when you update features that are used as keys in indexes.
Without copy-on-write, this could cause a ConcurrentModificationException in
loops that make such updates, even though no index operations are coded
explicitly in the loop.
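
To illustrate the hidden remove/add-back, here is a minimal sketch using only
plain JDK collections.  It is not UIMA code: the Span class, the TreeSet
standing in for a sorted index, and the setBegin helper are all hypothetical
names made up for the example.  It shows why changing a sort key inside a loop
over the index trips the fail-fast check, and that looping over a snapshot
avoids it.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.ConcurrentModificationException;
import java.util.TreeSet;

final class KeyUpdateDemo {
    /** Hypothetical stand-in for an annotation; 'begin' is the index sort key. */
    static final class Span {
        int begin;
        final String label;
        Span(int begin, String label) { this.begin = begin; this.label = label; }
    }

    /** Updating a key feature means: remove from the index, change it, add back. */
    static void setBegin(TreeSet<Span> index, Span s, int newBegin) {
        index.remove(s);
        s.begin = newBegin;
        index.add(s);
    }

    /** A TreeSet plays the role of a sorted index (an assumption of this sketch). */
    static TreeSet<Span> newIndex() {
        TreeSet<Span> index = new TreeSet<>(
            Comparator.comparingInt((Span s) -> s.begin)
                      .thenComparing((Span s) -> s.label));
        index.add(new Span(0, "a"));
        index.add(new Span(5, "b"));
        index.add(new Span(10, "c"));
        return index;
    }

    /** The loop has no explicit index operation, yet the key update modifies the index. */
    static boolean updateInsideLoopThrows() {
        TreeSet<Span> index = newIndex();
        try {
            for (Span s : index) {
                setBegin(index, s, s.begin + 100);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** Iterating over a copy (a snapshot) leaves the live index free to change. */
    static int updateOverSnapshot() {
        TreeSet<Span> index = newIndex();
        for (Span s : new ArrayList<>(index)) {
            setBegin(index, s, s.begin + 100);
        }
        return index.first().begin;
    }

    public static void main(String[] args) {
        System.out.println("update inside loop throws: " + updateInsideLoopThrows());
        System.out.println("smallest begin after snapshot loop: " + updateOverSnapshot());
    }
}
```

The snapshot loop copies the whole index up front whether or not anything is
modified; the copy-on-write approach quoted below copies only when an update
actually happens while an iterator may still be in use.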

-Marshall Schor


On 9/16/2016 3:59 PM, Marshall Schor wrote:
> As an experiment, I implemented a copy-on-write style of concurrent
> modification exception prevention in UV3.
>
> It does minimal copying, only copying part of the index related to the
> particular type being updated; if no iterators are in use, there's no copying
> (but see below).
>
> The copy is done just once, even for multiple iterators, unless a subsequent
> iterator is created after another update has happened to that part of the
> index.
>
> With this, you get a trade-off: no more concurrent modification exceptions,
> so you can modify indexes within loops, but copies of index parts are made
> (incrementally) when needed.  So it takes more space and time, because copies
> are sometimes made.
>
> In the following case, no copies will be made:
>
>   a) modify the indexes
>
>   b) create an iterator, iterate, then drop all references to the iterator
> and let the garbage collector reclaim it.
>
>   c) repeat a and b as much as you like.
>
> If you're through with an iterator, but it hasn't been GC'd yet, then the
> modification code can't tell you're through with the iterator, and has to
> make a copy.
>
> Is this a good trade-off to make?  Should we have 2 modes of running
> pipelines - with/without this feature?
>
> -Marshall
>
> P.S. There's an edge case caught by the test cases.  In today's world, if
> you do:
>    a) modify the indexes
>    b) start iterating
>    c) modify the indexes
>    d) call one of moveToFirst, moveToLast, or moveTo(fs)
> then step d "resets" the concurrent modification check and allows continued
> use of the iterator, this time over the updated indexes.  I had to add some
> more details in the impl to make this work the same way...
>
> On 9/14/2016 10:11 AM, Marshall Schor wrote:
>> Version 2 had snapshot iterators, used for two purposes:
>>
>> a) allowing underlying index modifications while iterating (over the
>> snapshot).  Note that this includes even simple things like changing
>> begin/end values in an annotation (which could cause a remove/add-back to
>> indexes action while those features are changed).
>>
>> b) performance (in some edge cases; creating the snapshot also has an
>> initial performance cost)
>>
>> It might be reasonable to support case (a) more automatically.  One approach
>> might be to do a "copy on write" style for the index parts.  Java has, for
>> instance, CopyOnWriteArrayList and CopyOnWriteArraySet.  This could add 1 more
>> level of indirection in using UIMA indexes; details need to be worked out and
>> could be complex (indexes need to be performant and thread-safe for reading).
>>
>> Does this seem like a good thing to try?
>>
>> -Marshall
>>
>>
>

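For readers following the thread, the copy-on-write behaviour described in the
9/16 message can be sketched roughly as below.  This is an illustration under
stated assumptions, not the UV3 implementation: CowIndexPart and its methods
are hypothetical names, the sketch is single-threaded, and it cannot detect
that an iterator has been garbage collected, so it copies on the first update
after any iterator was created.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical copy-on-write index part; a sketch, not the UIMA code. */
final class CowIndexPart<T> implements Iterable<T> {
    private List<T> items = new ArrayList<>();
    // True when some iterator may still be reading the current 'items' list.
    private boolean iteratorSeesItems = false;
    private int copies = 0;

    /** Iterators read the current list; no copy is made here. */
    @Override
    public Iterator<T> iterator() {
        iteratorSeesItems = true;
        return Collections.unmodifiableList(items).iterator();
    }

    /** Copy once before the first write that follows iterator creation. */
    public void add(T item) {
        if (iteratorSeesItems) {
            items = new ArrayList<>(items); // existing iterators keep the old list
            iteratorSeesItems = false;      // later writes reuse this copy
            copies++;
        }
        items.add(item);
    }

    public int size() { return items.size(); }

    public int copies() { return copies; }

    /** Returns {elements seen by the loop, final size, copies made}. */
    static int[] demo() {
        CowIndexPart<String> part = new CowIndexPart<>();
        part.add("a");                 // no iterator yet: no copy
        part.add("b");
        int seen = 0;
        for (String s : part) {        // the loop reads the pre-update list
            part.add(s + "'");         // only the first add triggers a copy
            seen++;
        }
        return new int[] { seen, part.size(), part.copies() };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("seen=" + r[0] + " size=" + r[1] + " copies=" + r[2]);
        // prints: seen=2 size=4 copies=1
    }
}
```

The loop modifies the index part twice without an exception, sees only the two
original elements, and triggers a single copy.  Updates made before any
iterator exists cost nothing extra, which matches the "modify, then iterate,
then repeat" case in the quoted message only to the extent this simplified
flag allows.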