As an experiment, I implemented a copy-on-write style of concurrent modification
exception prevention in UV3.

It does minimal copying, copying only the part of the index related to the
particular type being updated; if no iterators are in use, there's no copying
(but see below).

The copy is done just once, even for multiple iterators, unless a subsequent
iterator is created after another update has happened to that part of the index.
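
To make the "copy just once" behavior concrete, here is a minimal sketch of the
idea. It is not the actual UV3 code: the class CowIndexPart, its fields, and the
copiesMade counter are hypothetical names, and a plain ArrayList stands in for
the real index part for one type.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the part of an index that holds one type.
class CowIndexPart<T> {
    private List<T> items = new ArrayList<>();
    // True when some iterator may still be reading the current 'items' list.
    private boolean sharedWithIterators = false;
    // Exposed only so the copying behavior can be observed.
    int copiesMade = 0;

    Iterator<T> iterator() {
        sharedWithIterators = true;
        return items.iterator(); // reads the list as it exists right now
    }

    void add(T item) {
        if (sharedWithIterators) {
            // Leave the old list to the existing iterators; update a copy.
            items = new ArrayList<>(items);
            sharedWithIterators = false;
            copiesMade++;
        }
        items.add(item);
    }

    int size() {
        return items.size();
    }
}
```

In this sketch, two iterators created before an update share one copy, and a
second update with no new iterator in between does not copy again. An iterator
created after the update causes one more copy on the next update.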

With this, you get a trade-off: there are no more concurrent modification
exceptions, so you can modify indexes within loops, but copies of index parts
are made (incrementally) when needed.  So it takes more space and time, because
copies are sometimes made.

In the following case, no copies will be made:

  a) modify the indexes

  b) create an iterator, iterate, then drop references to the iterator, and let
the garbage collector collect it.

  c) repeat a and b as much as you like.

If you're through with an iterator, but it hasn't been GC'd yet, then the
modification code can't tell you're through with the iterator, and has to make a
copy.
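
One way the modification code could tell whether any iterator is still around
is to hold the iterators through weak references. The sketch below assumes that
approach; it is an illustration, not necessarily how the UV3 code does it, and
GcAwareIndexPart and its members are hypothetical names.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical index part that tracks its iterators with weak references.
class GcAwareIndexPart<T> {
    private List<T> items = new ArrayList<>();
    // Weak references to iterators created over the current 'items' list.
    private final List<WeakReference<Iterator<T>>> liveIterators = new ArrayList<>();
    // Exposed only so the copying behavior can be observed.
    int copiesMade = 0;

    Iterator<T> iterator() {
        Iterator<T> it = items.iterator();
        liveIterators.add(new WeakReference<>(it));
        return it;
    }

    void add(T item) {
        if (anyIteratorNotYetCollected()) {
            // An iterator may still be in use (or just not GC'd yet): copy.
            items = new ArrayList<>(items);
            copiesMade++;
        }
        // Any surviving iterators now read a list that is no longer modified.
        liveIterators.clear();
        items.add(item);
    }

    private boolean anyIteratorNotYetCollected() {
        for (WeakReference<Iterator<T>> ref : liveIterators) {
            if (ref.get() != null) {
                return true;
            }
        }
        return false;
    }
}
```

A weak reference is cleared only once the garbage collector has actually
processed the iterator, which is why a dropped but not yet collected iterator
still forces a copy in this sketch.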

Is this a good trade-off to make?  Should we have two modes of running
pipelines - with/without this feature?

-Marshall

P.S. there's an edge case caught by the test cases.  In today's world, if you
do:
   a) modify the indexes
   b) start iterating
   c) modify the indexes
   d) do one of moveToFirst, moveToLast, or just moveTo(fs)
then the call in step d "resets" the concurrent modification detection and
allows continued use of the iterator, this time over the updated indexes.  I had
to add some more details in the impl to make this work the same way...
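
A rough sketch of how that reset might look in a copy-on-write setting: the
iterator keeps reading the version of the index it started on until one of the
repositioning calls re-points it at the current version. VersionedIndex and
ResettableIterator are hypothetical names, not UIMA classes; only the method
names moveToFirst, moveToNext, isValid and get mirror FSIterator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical copy-on-write index: updates never touch a list handed to an iterator.
class VersionedIndex<T> {
    private List<T> current = new ArrayList<>();
    private boolean shared = false;

    // Hands out the current version and remembers that it is now shared.
    List<T> currentVersion() {
        shared = true;
        return current;
    }

    void add(T item) {
        if (shared) {
            current = new ArrayList<>(current);
            shared = false;
        }
        current.add(item);
    }
}

// Hypothetical iterator: moveToFirst() picks up the updated index.
class ResettableIterator<T> {
    private final VersionedIndex<T> index;
    private List<T> view;
    private int pos;

    ResettableIterator(VersionedIndex<T> index) {
        this.index = index;
        moveToFirst();
    }

    // Re-point at the current version of the index and start over.
    void moveToFirst() {
        view = index.currentVersion();
        pos = 0;
    }

    boolean isValid() {
        return pos >= 0 && pos < view.size();
    }

    T get() {
        if (!isValid()) {
            throw new NoSuchElementException();
        }
        return view.get(pos);
    }

    void moveToNext() {
        pos++;
    }
}
```

Without the moveToFirst() call, the iterator in this sketch simply continues
over the older version and never sees the later updates.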

On 9/14/2016 10:11 AM, Marshall Schor wrote:
> Version 2 had snapshot iterators, used for two purposes:
>
> a) allowing underlying index modifications while iterating (over the
> snapshot).  Note that this includes even simple things like changing
> begin/end values in an annotation (which could cause a remove/add-back to
> indexes action while those features are changed).
>
> b) performance (in some edge cases, but also has a performance cost initially
> (to create the snapshot))
>
> It might be reasonable to support case (a) more automatically.  One approach
> might be to do a "copy on write" style for the index parts.  Java has, for
> instance CopyOnWriteArrayList and CopyOnWriteArraySet.  This could add 1 more
> level of indirection in using UIMA indexes; details need to be worked out and
> could be complex (indexes need to be performant and thread-safe for reading).
>
> Does this seem like a good thing to try?
>
> -Marshall
>
>
