Hi,

Thanks for the answer.

> Hi,
>
> The CAS is not designed for concurrent access, to my knowledge, but
> perhaps others can comment more on this.
>   
I'd like to know more about that, because IMHO this is quite a strong
limitation. Perhaps naively, I assumed that concurrent access was safe
when the threads only read the shared object, since most concurrency
problems occur when threads can also write to it?
> Most scale-out use-cases are designs which also scale out the CASes.  We
> would be interested in hearing about a use case which motivates
> multi-threaded access to a single CAS.
>   
Indeed, my use case probably does not correspond to what UIMA is
intended for. I should explain the context a bit: we are building
wrapper annotators for external programs called through a ProcessBuilder
object (yes, the dirty "exec" call). We are aware of the problems that
this implies, and ideally we would have re-coded our tools from scratch
as UIMA annotators or used the C++ framework. Nevertheless we decided that
wrapping was the best choice, because our team owns a few complex NLP tools
which are the core of our work and would be very costly to migrate. We
therefore want to provide, fairly quickly, a way to use them in a UIMA
environment, so that people start using UIMA when creating higher-level
components (and maybe these core components will be migrated later).

In this context, we try to provide an "as safe and efficient as
possible" framework in which these programs are called inside an
annotator. That is why we use threads to provide the input stream and
read the output stream. In order to avoid wasting time and space, our
threads use Reader and Writer objects so that data is transmitted on the
fly to/from the process (inside the process method). Thus concurrent
access to the CAS is required when the Writer object that provides the
stdin stream is still reading annotations, while the Reader object has
already started to re-align the program output with the CAS content. Of
course no concurrency problem occurs if the input/output are transmitted
as simple String objects or as files, but that is clearly less efficient
(and not safer, as far as I know).
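
To make the setup concrete, here is a minimal sketch of the streaming
pattern described above. It is an illustration under stated assumptions,
not the actual annotator code: the class name PipeSketch and the method
pipeThrough are hypothetical, a plain list of strings stands in for the
text extracted from CAS annotations, and the example command "cat" is
assumed to be available on the PATH. One thread feeds the process's stdin
while the calling thread consumes its stdout, so neither side has to be
buffered entirely in memory or on disk.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper: streams lines through an external program.
final class PipeSketch {

    /** Sends inputLines to the command's stdin and returns its stdout lines. */
    static List<String> pipeThrough(List<String> command, List<String> inputLines)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.INHERIT) // keep stderr from filling up
                .start();
        AtomicReference<IOException> writerFailure = new AtomicReference<>();

        // Writer thread: feeds stdin on the fly; closing the stream signals end of input.
        Thread writer = new Thread(() -> {
            try (BufferedWriter stdin = new BufferedWriter(new OutputStreamWriter(
                    process.getOutputStream(), StandardCharsets.UTF_8))) {
                for (String line : inputLines) {
                    stdin.write(line);
                    stdin.newLine();
                }
            } catch (IOException e) {
                writerFailure.set(e);
            }
        });
        writer.start();

        // Calling thread: reads stdout while the writer may still be producing input.
        List<String> output = new ArrayList<>();
        try (BufferedReader stdout = new BufferedReader(new InputStreamReader(
                process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = stdout.readLine()) != null) {
                output.add(line);
            }
        }

        writer.join();
        int exitCode = process.waitFor();
        if (writerFailure.get() != null) {
            throw writerFailure.get();
        }
        if (exitCode != 0) {
            throw new IOException("command exited with status " + exitCode);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        // "cat" is only a stand-in for the wrapped NLP tool.
        System.out.println(pipeThrough(List.of("cat"),
                List.of("first token", "second token")));
    }
}
```

In this sketch both sides touch only their own data (the input list is
read by one thread, the output list is written by the other), so nothing
is shared. The problem reported in this thread appears once both sides
iterate over the same CAS index instead.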

I don't know whether there are more standard use cases involving threads.
Nevertheless the problem would be the same if the black box were not an
external program but any piece of code that cannot be modified and
behaves like a pipe.

Erwan


> -Marshall
>
> On 6/15/2010 1:35 PM, Erwan Moreau wrote:
>   
>> Hello,
>>
>> I am experiencing problems when several threads read annotations from
>> the same (default) CAS index, inside the same call to the process
>> method. Since I'm new to UIMA I'm not sure how to interpret that: is it
>> normal behaviour due to wrong usage, or a bug? The exception stack is:
>>
>> java.lang.IndexOutOfBoundsException: Index: 0, Size: 3
>>        at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>>        at java.util.ArrayList.get(ArrayList.java:322)
>>        at org.apache.uima.cas.impl.FSIndexRepositoryImpl$LeafPointerIterator.initPointerIterator(FSIndexRepositoryImpl.java:628)
>>        at org.apache.uima.cas.impl.FSIndexRepositoryImpl$LeafPointerIterator.<init>(FSIndexRepositoryImpl.java:636)
>>        at org.apache.uima.cas.impl.FSIndexRepositoryImpl$LeafPointerIterator.<init>(FSIndexRepositoryImpl.java:612)
>>        at org.apache.uima.cas.impl.FSIndexRepositoryImpl.createPointerIterator(FSIndexRepositoryImpl.java:158)
>>        at org.apache.uima.cas.impl.FSIndexRepositoryImpl$IndexImpl.iterator(FSIndexRepositoryImpl.java:792)
>>        at org.apache.uima.cas.impl.AnnotationIndexImpl.iterator(AnnotationIndexImpl.java:97)
>>        at fr.lipn.uima.testing.TestConcurrentCASAccesAE.getFSIterator(TestConcurrentCASAccesAE.java:59)
>>
>>
>> I managed to isolate the problem and wrote a simple AE to explain/show
>> it (attached).
>>
>> Thanks for your help (and sorry if I missed something in the doc !)
>>
>> Erwan
>>
>>
>>   
>>     
