Hi,

Thanks for the answer.
> Hi,
>
> The CAS is not designed for concurrent access, to my knowledge, but
> perhaps others can comment more on this.

I'd like to know more about that, because IMHO this is quite a strong limitation. Maybe naively, I used to think that concurrent access for reading only was safe, since most concurrency problems occur when threads can also write to the shared object.

> Most scale-out use-cases are designs which also scale out the CASes. We
> would be interested in hearing about a use case which motivates
> multi-threaded access to a single CAS.

Indeed, my use case probably does not correspond to what UIMA is intended for. I must explain the context a bit: we are building wrapper annotators for external programs called through a ProcessBuilder object (yes, the dirty "exec" call). We are aware of the problems that this implies, and ideally we would have re-coded our tools from scratch as UIMA annotators or used the C++ framework. Nevertheless, we decided this was the best choice, because our team owns a few complex NLP tools which are the core of our work and would be very costly to migrate. So we want to provide, quite quickly, a way to use them in a UIMA environment, so that people start using UIMA when creating higher-level components (and maybe these core components will be migrated later).

In this context, we try to provide an "as safe and efficient as possible" framework in which these programs are called inside an annotator. That is why we use threads to provide the input stream and read the output stream. To avoid wasting time and space, our threads use Reader and Writer objects, so that data is transmitted on the fly to and from the process (inside the process method). Thus concurrent access to the CAS is required when the Writer object that provides the stdin stream is still reading annotations while the Reader object has already started to re-align the program output with the CAS content.
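To make the arrangement concrete, here is a minimal sketch of it, under several assumptions. The names PipeSketch, blackBox and run are hypothetical. An in-JVM thread connected through java.io piped streams stands in for the external program started through ProcessBuilder, and a plain read-only List<String> stands in for the annotations read from the CAS. It only illustrates the writer/reader threads working on the fly around a pipe-like black box; it says nothing about what is safe to do with a real CAS.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class PipeSketch {

    // Hypothetical stand-in for the external program: echoes each input line upper-cased.
    static Thread blackBox(PipedInputStream stdin, PipedOutputStream stdout) {
        return new Thread(() -> {
            try (BufferedReader in = new BufferedReader(
                     new InputStreamReader(stdin, StandardCharsets.UTF_8));
                 BufferedWriter out = new BufferedWriter(
                     new OutputStreamWriter(stdout, StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    out.write(line.toUpperCase(Locale.ROOT));
                    out.newLine();
                    out.flush(); // emit output on the fly, like a pipe
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    // Streams tokens to the black box from one thread and collects its output in another.
    static List<String> run(List<String> tokens) {
        try {
            PipedOutputStream toBox = new PipedOutputStream();
            PipedInputStream boxIn = new PipedInputStream(toBox);
            PipedOutputStream boxOut = new PipedOutputStream();
            PipedInputStream fromBox = new PipedInputStream(boxOut);

            Thread box = blackBox(boxIn, boxOut);

            // Writer thread: only reads the token list (stand-in for reading annotations).
            Thread writer = new Thread(() -> {
                try (BufferedWriter w = new BufferedWriter(
                         new OutputStreamWriter(toBox, StandardCharsets.UTF_8))) {
                    for (String token : tokens) {
                        w.write(token);
                        w.newLine();
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });

            // Reader thread: only touches its own result list (stand-in for re-alignment).
            List<String> results = new ArrayList<>();
            Thread reader = new Thread(() -> {
                try (BufferedReader r = new BufferedReader(
                         new InputStreamReader(fromBox, StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = r.readLine()) != null) {
                        results.add(line);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });

            box.start();
            writer.start();
            reader.start();
            writer.join();
            box.join();
            reader.join(); // join makes the reader's writes to results visible here
            return results;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("the", "cat", "sleeps")));
    }
}
```

In this sketch the two threads never share a mutable object: the writer only reads the list and the reader only fills its own result list. In our annotators both threads touch the same CAS instead, which is where the question arises.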
Of course, no concurrency problem occurs if the input/output are transmitted as simple String objects or as files, but that is clearly less efficient (and not safer, as far as I know). I don't know whether there can be more standard use cases using threads. Nevertheless, the problem would be the same if the black box were not an external program but any piece of code that cannot be modified and behaves like a pipe.

Erwan

> -Marshall
>
> On 6/15/2010 1:35 PM, Erwan Moreau wrote:
>
>> Hello,
>>
>> I experience problems using several threads which read annotations in
>> the same (default) CAS index, inside the same call to the process
>> method. Since I'm new to UIMA I'm not sure how to interpret that: normal
>> behaviour due to wrong usage or bug? The exception stack is:
>>
>> java.lang.IndexOutOfBoundsException: Index: 0, Size: 3
>>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>>     at java.util.ArrayList.get(ArrayList.java:322)
>>     at org.apache.uima.cas.impl.FSIndexRepositoryImpl$LeafPointerIterator.initPointerIterator(FSIndexRepositoryImpl.java:628)
>>     at org.apache.uima.cas.impl.FSIndexRepositoryImpl$LeafPointerIterator.<init>(FSIndexRepositoryImpl.java:636)
>>     at org.apache.uima.cas.impl.FSIndexRepositoryImpl$LeafPointerIterator.<init>(FSIndexRepositoryImpl.java:612)
>>     at org.apache.uima.cas.impl.FSIndexRepositoryImpl.createPointerIterator(FSIndexRepositoryImpl.java:158)
>>     at org.apache.uima.cas.impl.FSIndexRepositoryImpl$IndexImpl.iterator(FSIndexRepositoryImpl.java:792)
>>     at org.apache.uima.cas.impl.AnnotationIndexImpl.iterator(AnnotationIndexImpl.java:97)
>>     at fr.lipn.uima.testing.TestConcurrentCASAccesAE.getFSIterator(TestConcurrentCASAccesAE.java:59)
>>
>> I managed to isolate the problem and wrote a simple AE to explain/show
>> it (attached).
>>
>> Thanks for your help (and sorry if I missed something in the doc!)
>>
>> Erwan
