I think I may have found a way to have the individual pipelines created by the MultiprocessingAnalysisEngine stop sharing the UIMA contexts. This would alleviate the CasMultiplier issue, but at the cost of changing the behavior for existing users of this facility: "external resources" managed by UIMA would no longer be shared across the multiple pipelines.
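To illustrate the pool-size problem and the proposed fix, here is a simplified model in plain Java. It does not use the UIMA API. `ModelContext` and `run` are hypothetical names, and the check only mimics the shape of the getEmptyCas test described in the quoted message below: a count of outstanding CASes, kept per context, compared against the pool size. Each pipeline keeps its CAS while "processing", which stands in for pipelines running at the same time.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedContextPoolDemo {

    /** Hypothetical stand-in for a UIMA context: it only tracks outstanding CASes. */
    static final class ModelContext {
        private final int casPoolSize;
        private int outstanding = 0;

        ModelContext(int casPoolSize) {
            this.casPoolSize = casPoolSize;
        }

        /** Mimics the pool-size check: refuse once the pool limit is reached. */
        Object getEmptyCas() {
            if (outstanding >= casPoolSize) {
                throw new IllegalStateException(
                        "requested too many CASes: pool size is " + casPoolSize);
            }
            outstanding++;
            return new Object(); // placeholder for a real CAS
        }
    }

    /** Each pipeline's CAS multiplier asks its context for one CAS and keeps it. */
    static int run(List<ModelContext> contextPerPipeline) {
        int succeeded = 0;
        for (ModelContext ctx : contextPerPipeline) {
            try {
                ctx.getEmptyCas();
                succeeded++;
            } catch (IllegalStateException e) {
                System.out.println("pipeline failed: " + e.getMessage());
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        int pipelines = 5;
        int casPoolSize = 1;

        // Current behavior: every pipeline shares one context, hence one counter.
        ModelContext shared = new ModelContext(casPoolSize);
        List<ModelContext> sharedList = new ArrayList<>();
        for (int i = 0; i < pipelines; i++) {
            sharedList.add(shared);
        }

        // Proposed behavior: each pipeline gets its own context.
        List<ModelContext> separateList = new ArrayList<>();
        for (int i = 0; i < pipelines; i++) {
            separateList.add(new ModelContext(casPoolSize));
        }

        System.out.println("shared contexts:   " + run(sharedList) + " of " + pipelines + " succeeded");
        System.out.println("separate contexts: " + run(separateList) + " of " + pipelines + " succeeded");
    }
}
```

With five pipelines and a pool size of 1, only the first pipeline gets a CAS from the shared context, while separate contexts let all five succeed. The cost is the one noted above: anything held in the context is no longer shared either.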
An alternative would be to document this behavior and warn against using this facility with Cas Multipliers. Any preferences? I think I'm slightly in favor of no longer implicitly sharing the UIMA Context across multiple pipelines.

-Marshall

On 3/14/2011 11:17 AM, Marshall Schor wrote:
> On 3/14/2011 11:02 AM, Marshall Schor wrote:
>> Adam noted that the issue https://issues.apache.org/jira/browse/UIMA-2078
>> suggests there are other issues around Cas Multipliers in base UIMA when
>> using shared UIMA Contexts.
>>
>> This is because the getEmptyCas method in the (shared) UimaContext checks
>> whether the pool size is exceeded. If the pool size is 1 but 5 pipelines
>> share the UimaContext, this test throws an exception on the 2nd one.
>>
>> So, one approach to "fix" this problem would be to not share UimaContexts
>> in this case. The consequence is that the contexts would no longer be
>> shared among the pipelines. Whether that is good or bad depends on the use
>> case(s).
>>
>> Good thing about sharing: the contexts are very large, but read-only, so
>> sharing them saves memory.
>>
>> Bad thing about sharing: the contexts are used by the pipeline for things
>> like storing data via the external resource manager, in structures like
>> HashMaps, which are not thread safe. The user could have designed a
>> pipeline where some upstream annotators write data into a map, and some
>> downstream annotators later access that data, presuming that what they see
>> is exactly what the upstream annotator put there. With shared UIMA
>> Contexts, besides the thread-safety issue for HashMaps, this presumption
>> would not hold even if the user used a thread-safe map, because another
>> pipeline could change the data in between.
>>
>> There is a built-in framework method (produceAnalysisEngine with 2 int
>> arguments) that instantiates multiple analysis engines
>> (MultiprocessingAnalysisEngine_impl), used by (for example) the SOAP
>> service adapter.
>> In light of this, MultiprocessingAnalysisEngine_impl seems faulty in
>> several ways.
>
> Vinci services also make use of the MultiprocessingAnalysisEngine class.
>
> -Marshall
>
>> 1) MultiprocessingAnalysisEngine_impl shares the UimaContext among the
>> pool of resources, and so has the issues above, including the
>> CasMultiplier / pool size issue.
>>
>> 2) When UIMA-AS was being debugged, one issue that came up was that some
>> annotators had been written with the presumption that the thread used to
>> call the initialize method is the same thread used to call the process
>> method (these annotators made use of ThreadLocal variables, IIRC).
>> See https://issues.apache.org/jira/browse/UIMA-1223 . UIMA-AS was updated
>> to ensure that its multi-pipeline setup met this presumption.
>>
>> Shouldn't the base UIMA implementation meet this same presumption?
>>
>> -Marshall
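The "bad thing" about sharing can be sketched the same way. Again this is plain Java rather than the UIMA API: the `Pipeline` class, `interleave`, and the "language" key are hypothetical, and the interleaving is written out sequentially to stand in for one possible thread schedule. A ConcurrentHashMap is used on purpose, to show the point made in the thread: a thread-safe map removes the data race but not the interference between pipelines.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SharedResourceDemo {

    /** Hypothetical pipeline: an upstream step writes a value, a downstream step reads it back. */
    static final class Pipeline {
        private final Map<String, String> resource; // stands in for a UIMA-managed external resource

        Pipeline(Map<String, String> resource) {
            this.resource = resource;
        }

        void upstream(String language) {
            resource.put("language", language);
        }

        String downstream() {
            return resource.get("language");
        }
    }

    /** Runs two pipelines in one fixed interleaving and returns what pipeline A reads. */
    static String interleave(Map<String, String> resA, Map<String, String> resB) {
        Pipeline a = new Pipeline(resA);
        Pipeline b = new Pipeline(resB);
        a.upstream("en");      // pipeline A processes an English document
        b.upstream("de");      // meanwhile pipeline B processes a German one
        return a.downstream(); // A's downstream annotator expects "en"
    }

    public static void main(String[] args) {
        Map<String, String> shared = new ConcurrentHashMap<>();
        System.out.println("shared resource: A reads " + interleave(shared, shared));
        System.out.println("per pipeline:    A reads "
                + interleave(new ConcurrentHashMap<>(), new ConcurrentHashMap<>()));
    }
}
```

With the shared map, pipeline A's downstream step reads "de", the value written by pipeline B; with one map per pipeline it reads the "en" it expects.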
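The thread-affinity presumption from UIMA-1223 can also be shown with a hypothetical annotator that sets up ThreadLocal state in initialize(). This is a sketch, not UIMA code: `ThreadLocalAnnotator` is made up, and pinning each pipeline to its own single-thread executor is just one way to meet the presumption, not necessarily how UIMA-AS does it.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadAffinityDemo {

    /** Hypothetical annotator that keeps per-thread state created in initialize(). */
    static final class ThreadLocalAnnotator {
        private final ThreadLocal<StringBuilder> buffer = new ThreadLocal<>();

        void initialize() {
            buffer.set(new StringBuilder());
        }

        /** Returns false when called on a thread that never ran initialize(). */
        boolean process(String text) {
            StringBuilder sb = buffer.get();
            if (sb == null) {
                return false;
            }
            sb.append(text);
            return true;
        }
    }

    /** Waits for a task, turning checked exceptions into unchecked ones. */
    private static <T> T await(Future<T> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    /** initialize() runs on the caller's thread, process() on a different worker thread. */
    static boolean differentThreads() {
        ThreadLocalAnnotator ann = new ThreadLocalAnnotator();
        ann.initialize();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            return await(worker.submit(() -> ann.process("doc")));
        } finally {
            worker.shutdown();
        }
    }

    /** Both calls are submitted to the same dedicated single-thread executor. */
    static boolean pinnedThread() {
        ThreadLocalAnnotator ann = new ThreadLocalAnnotator();
        ExecutorService pipelineThread = Executors.newSingleThreadExecutor();
        try {
            await(pipelineThread.submit(ann::initialize));
            return await(pipelineThread.submit(() -> ann.process("doc")));
        } finally {
            pipelineThread.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("different threads: process ok = " + differentThreads());
        System.out.println("pinned thread:     process ok = " + pinnedThread());
    }
}
```

The first case reports false because the ThreadLocal value was set on the caller's thread only; the second reports true because initialize() and process() run on the same worker thread.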
