> Seems that an easy work-around would be to have your reader and writer
> threads synchronize on their access to the CAS. If we implemented
> concurrent access, this is what we would have to do, inside the CAS
> itself.
>
> When new data are added to the CAS, indexes are often updated. If these
> are concurrently being accessed, *bad things* can happen, which is
> probably what's happening in your case.

Well, not exactly, because I do not *write* any data to the CAS: the threads only read the annotations contained in the CAS, and in my real annotators data is written to the CAS only after all threads have terminated. I'm not an expert in thread safety, so I might be missing something, but at first sight I don't understand how concurrent read access can fail. (Though I must admit I did not try to study the source code of the FSIndexRepositoryImpl class.)

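
For reference, the "synchronize every access" work-around could look roughly like the sketch below. It is only an illustration: `GuardedStore` is a hypothetical stand-in for the shared object, not a UIMA class, and it assumes that even read access has to be serialized because the shared object makes no thread-safety promises.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for a shared, non-thread-safe container such as a CAS.
// It is NOT a UIMA class; it only illustrates the locking pattern.
final class GuardedStore {
    private final List<String> items = new ArrayList<>();
    private final Object lock = new Object();

    void add(String item) {
        synchronized (lock) {          // writers take the same lock as readers
            items.add(item);
        }
    }

    int countMatching(String prefix) {
        synchronized (lock) {          // reads are serialized too (assumption: reads may not be safe)
            int n = 0;
            for (String s : items) {
                if (s.startsWith(prefix)) {
                    n++;
                }
            }
            return n;
        }
    }
}

final class SynchronizedAccessSketch {
    // Starts `readers` threads that all query the same store and sums their results.
    static int run(int readers) throws Exception {
        GuardedStore store = new GuardedStore();
        store.add("token:a");
        store.add("token:b");
        store.add("sentence:1");

        ExecutorService pool = Executors.newFixedThreadPool(readers);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < readers; i++) {
                Callable<Integer> task = () -> store.countMatching("token:");
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get();      // waits for each reader to finish
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(4));
    }
}
```

The price of this pattern is that the readers no longer run in parallel while they are inside the lock, so it only helps if most of each thread's work happens outside the guarded calls.
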
> The CAS is used as a "unit-of-work" in many places in UIMA, as well. If
> you used it for this purpose, then a workflow might be:
>
> Have the Writer write to the process, so the process gets all its
> inputs, then have the reader read from the process the results.
>
> For scale-out, have multiple CASes.
>
> Would this work in your use case? -Marshall

Yes, indeed. The only real drawback of this solution is that it requires fully duplicating the data at each input or output step, and thus needs a bit more time and memory. I guess this solution is more "UIMA standard" than synchronizing every CAS access in my threads?

Thanks again!
Erwan
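
A minimal sketch of the "one unit of work per thread" idea discussed above, for comparison with the locking approach. `WorkUnit` is a hypothetical stand-in for a CAS, not a UIMA class, and the copy made in its constructor represents the duplication cost mentioned in the reply; how a real CAS would be copied is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical unit of work standing in for one CAS; not a UIMA class.
final class WorkUnit {
    final List<String> annotations;

    WorkUnit(List<String> input) {
        // Private copy of the input: this is the extra time and memory being traded for safety.
        this.annotations = new ArrayList<>(input);
    }
}

final class PerWorkerCopySketch {
    // Each task owns its WorkUnit, so no locking is needed anywhere.
    static List<Integer> run(List<String> input, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < workers; i++) {
                WorkUnit unit = new WorkUnit(input);   // one copy per worker
                Callable<Integer> task = () -> {
                    unit.annotations.add("result");    // safe: no other thread sees this list
                    return unit.annotations.size();
                };
                futures.add(pool.submit(task));
            }
            List<Integer> sizes = new ArrayList<>();
            for (Future<Integer> f : futures) {
                sizes.add(f.get());                    // collect results after each task ends
            }
            return sizes;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> input = Arrays.asList("token:a", "token:b");
        System.out.println(run(input, 3));
    }
}
```

Compared with synchronizing every access, the workers here never block each other; the trade-off is one full copy of the data per worker.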
