On 6/16/2010 19:22, Erwan Moreau wrote:
It seems that an easy workaround would be to have your reader and writer
threads synchronize their access to the CAS. If we implemented
concurrent access, this is what we would have to do, inside the CAS
itself.
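A minimal sketch of that workaround, under stated assumptions: the hypothetical GuardedCas class below stands in for the CAS (a plain list instead of real UIMA indexes), and it swaps in a java.util.concurrent ReadWriteLock instead of plain synchronized blocks, so several readers can iterate at once while a writer gets exclusive access. In real code the same lock would have to wrap every call that touches the CAS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Hypothetical stand-in for a CAS: every read and write goes through one lock,
 * so readers never observe the "index" in the middle of an update.
 */
public class GuardedCas {
    private final List<String> annotations = new ArrayList<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    /** Writer side: exclusive lock while the "index" is updated. */
    public void addAnnotation(String text) {
        lock.writeLock().lock();
        try {
            annotations.add(text);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Reader side: shared lock, so several readers may iterate at once. */
    public int countMatching(String prefix) {
        lock.readLock().lock();
        try {
            int n = 0;
            for (String a : annotations) {
                if (a.startsWith(prefix)) n++;
            }
            return n;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        GuardedCas cas = new GuardedCas();
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) cas.addAnnotation("token" + i);
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < 100; i++) cas.countMatching("token");
        });
        writer.start();
        reader.start();
        writer.join();
        reader.join();
        System.out.println(cas.countMatching("token")); // prints 1000
    }
}
```

The read lock keeps the readers from blocking one another; only the writer serializes everything.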
When new data are added to the CAS, indexes are often updated. If these
are concurrently being accessed, *bad things* can happen, which is
probably what's happening in your case.
Well, not exactly, because I do not *write* any data to the CAS: the threads
only read the annotations it contains, and in my real annotators, data is
written to the CAS only after all threads have terminated.
I'm not an expert in thread safety, so I might be missing something, but at
first sight I don't understand how concurrent read access can fail (though I
must admit I did not study the source code of the
FSIndexRepositoryImpl class).
I agree, this should be possible. I'll take a look sometime
when our build has stabilized.
It may have to do with the way our internal iterator cache
works. What you could try to do is this: create one iterator
of every type you're interested in, in a sequential manner.
You don't need to use them. Then try your concurrent access
again. No guarantees, though; I didn't even look at the code.
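To illustrate how "read-only" access can still race, here is a hypothetical model of a lookup that fills a cache lazily. It is not UIMA's actual FSIndexRepositoryImpl code, which nobody in this thread has checked; the class and method names are made up. The first lookup of a type is a hidden write to a HashMap, so warming the cache sequentially before starting the readers means the reader threads only ever hit existing entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of a repository whose "read" path fills a cache lazily.
 * The first lookup of a type mutates the HashMap, so two threads doing their
 * first lookup at the same time would race, even though both only "read".
 */
public class WarmedCache {
    private final Map<String, List<String>> cache = new HashMap<>();
    private final Map<String, List<String>> data;

    public WarmedCache(Map<String, List<String>> data) {
        this.data = data;
    }

    /** Looks like a read, but writes to the cache on a miss. */
    public List<String> lookup(String type) {
        List<String> cached = cache.get(type);
        if (cached == null) {
            cached = data.getOrDefault(type, List.of());
            cache.put(type, cached); // hidden write on the read path
        }
        return cached;
    }

    public static void main(String[] args) throws InterruptedException {
        Map<String, List<String>> data = new HashMap<>();
        data.put("Token", List.of("a", "b", "c"));
        data.put("Sentence", List.of("a b c"));
        WarmedCache repo = new WarmedCache(data);

        // Warm-up: touch every type sequentially, before any reader starts.
        for (String type : data.keySet()) repo.lookup(type);

        // Readers now only hit existing cache entries; Thread.start() makes
        // the warm-up writes visible to the new threads.
        List<Thread> readers = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            Thread t = new Thread(() -> repo.lookup("Token").size());
            readers.add(t);
            t.start();
        }
        for (Thread t : readers) t.join();
        System.out.println(repo.lookup("Token").size()); // prints 3
    }
}
```

Any type that is not warmed up would still trigger a write on its first lookup, so in this model the warm-up has to cover every type the threads will touch.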
--Thilo
The CAS is used as a "unit-of-work" in many places in UIMA, as well. If
you used it for this purpose, then a workflow might be:
Have the Writer write to the process, so that the process gets all its
inputs, then have the Reader read the results back from the process.
For scale-out, have multiple CASes.
Would this work in your use case? -Marshall
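A rough sketch of that workflow in plain Java, assuming a hypothetical WorkItem class in place of a real CAS: the writer fills one private item per batch, a thread pool runs the process on each item, and the reader collects the results through Futures. Because no item is shared between threads, no locking is needed; the price is the copy made when each item is created.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Sketch of the "unit of work" pattern: each worker gets its own private
 * container (standing in for its own CAS), so no state is shared.
 * Scale-out = more containers and more workers.
 */
public class UnitOfWork {
    /** Hypothetical per-worker container; a real pipeline would use a CAS. */
    static final class WorkItem {
        final List<String> input;                      // filled by the Writer
        final List<String> output = new ArrayList<>(); // filled by the process

        WorkItem(List<String> input) {
            this.input = new ArrayList<>(input); // private copy of the inputs
        }
    }

    /** The "process": reads only its own item and writes results into it. */
    static WorkItem process(WorkItem item) {
        for (String s : item.input) item.output.add(s.toUpperCase());
        return item;
    }

    public static List<String> run(List<List<String>> batches, int workers)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<WorkItem>> futures = new ArrayList<>();
            for (List<String> batch : batches) {
                WorkItem item = new WorkItem(batch);   // Writer fills the item
                futures.add(pool.submit(() -> process(item)));
            }
            List<String> results = new ArrayList<>();
            for (Future<WorkItem> f : futures) {       // Reader collects results
                results.addAll(f.get().output);
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // prints [A, B, C]
        System.out.println(run(List.of(List.of("a", "b"), List.of("c")), 2));
    }
}
```

Results come back in submission order because the Futures are read in the order they were created, regardless of which worker finishes first.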
Yes, indeed. The only real drawback of this solution is that it
requires duplicating all the data at each input or output step,
which costs a bit more time and memory. I guess this solution is more
"UIMA standard" than synchronizing every CAS access in my threads?
Thanks again!
Erwan