Our docs say that AEs run in a single-threaded model (see
http://uima.apache.org/d/uimaj-2.4.0/tutorials_and_users_guides.html#ugr.tug.aae.contract_for_annotator_methods).
If multiple threads are wanted, the framework supports this by creating multiple
instances of the AE's implementation class. This limits thread-safety issues
to static (class-level) fields.
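To make the one-instance-per-thread idea concrete, here is a minimal sketch in plain Java. It does not use any UIMA classes: CountingAnnotator and PerThreadAnnotatorDemo are hypothetical names made up for illustration, and the ThreadLocal is just one possible way for an application to get the same effect, not a description of how the framework does it internally.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class PerThreadAnnotatorDemo {

    /** Hypothetical stand-in for an annotator implementation class; not a UIMA type. */
    static class CountingAnnotator {
        // Instance state: needs no locking while only one thread uses this instance.
        private int callCount = 0;

        /** Counts whitespace-separated tokens; stands in for real analysis work. */
        int process(String text) {
            callCount++;
            String trimmed = text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }

        int getCallCount() {
            return callCount;
        }
    }

    // Each thread lazily gets its own annotator instance on first use.
    private static final ThreadLocal<CountingAnnotator> PER_THREAD =
            ThreadLocal.withInitial(CountingAnnotator::new);

    /** Starts nThreads threads and returns how many distinct instances they used. */
    static int distinctInstancesUsed(int nThreads) {
        Set<CountingAnnotator> seen = ConcurrentHashMap.newKeySet();
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            threads[i] = new Thread(() -> {
                CountingAnnotator annotator = PER_THREAD.get();
                annotator.process("some text to analyze");
                seen.add(annotator);
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct instances: " + distinctInstancesUsed(4));
    }
}
```

Because no instance is ever shared, callCount needs no synchronization. A static field in CountingAnnotator would still be shared across all threads and would need the usual care.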
The reason for this was an observation that the people writing annotators,
although skilled in their particular discipline and able to write code that
extracted information from unstructured data, did not typically have the skills
needed to write correct multi-threaded implementations in Java. So the
framework "helped" here, by ensuring that any parallelism the framework
supported created a separate instance of the annotator class for each thread.
I believe, however, that it is currently possible to use the framework in ways
in which the application writer creates multiple threads and calls the same
annotator instance on multiple threads at the same time. Perhaps a proper
approach here would be to have the framework detect this, and signal some kind
of error.
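As a rough illustration of what such detection could look like, here is a sketch in plain Java. GuardedAnnotator is a hypothetical wrapper invented for this example, not an existing UIMA class, and the choice of IllegalStateException is an assumption as well.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

class SingleThreadGuardDemo {

    /** Hypothetical wrapper that rejects overlapping calls; not UIMA API. */
    static class GuardedAnnotator {
        private final AtomicBoolean busy = new AtomicBoolean(false);

        void process(Runnable work) {
            // Only one caller can flip the flag from false to true.
            if (!busy.compareAndSet(false, true)) {
                throw new IllegalStateException(
                        "annotator instance is already in use by another call");
            }
            try {
                work.run();
            } finally {
                busy.set(false); // release the instance even if work throws
            }
        }
    }

    /** Holds one call open on a second thread, then checks that an overlapping call fails. */
    static boolean overlappingCallRejected() {
        GuardedAnnotator annotator = new GuardedAnnotator();
        CountDownLatch inside = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread first = new Thread(() -> annotator.process(() -> {
            inside.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        first.start();
        boolean rejected = false;
        try {
            inside.await(); // wait until the first call is in progress
            annotator.process(() -> { });
        } catch (IllegalStateException expected) {
            rejected = true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            release.countDown();
        }
        try {
            first.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("overlapping call rejected: " + overlappingCallRejected());
    }
}
```

Note that this simple flag also rejects a re-entrant call made from the same thread, and it turns a silent data race into a hard failure for existing applications, so a real check would need some thought about both.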
-Marshall
On 2/17/2012 4:09 AM, Tommaso Teofili (Commented) (JIRA) wrote:
[
https://issues.apache.org/jira/browse/UIMA-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13210151#comment-13210151
]
Tommaso Teofili commented on UIMA-2373:
---------------------------------------
bq. Possibly a concurrency issue?
Yes, I think so.
That came out when an AE is used from different clients that execute in
parallel, so I wonder whether the usage is wrong or whether we should allow it
and therefore make a fix for it.
Possible bug in FixedFlowController
-----------------------------------
Key: UIMA-2373
URL: https://issues.apache.org/jira/browse/UIMA-2373
Project: UIMA
Issue Type: Bug
Affects Versions: 2.4.0SDK
Reporter: Tommaso Teofili
I am developing a series of Lucene tokenizers which can use UIMA for creating
tokens via extracted annotations.
While doing a stress test with lots of different strings I experienced the
following:
{noformat}
[junit] Testsuite: org.apache.lucene.analysis.uima.UIMATypeAwareAnalyzerTest
[junit] Tests run: 2, Failures: 0, Errors: 1, Time elapsed: 92,061 sec
[junit]
[junit] ------------- Standard Error -----------------
[junit] The following exceptions were thrown by threads:
[junit] *** Thread: Thread-9 ***
[junit] java.lang.RuntimeException: java.io.IOException: org.apache.uima.analysis_engine.AnalysisEngineProcessException
[junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:289)
[junit] Caused by: java.io.IOException: org.apache.uima.analysis_engine.AnalysisEngineProcessException
[junit] at org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer.incrementToken(UIMATypeAwareAnnotationsTokenizer.java:87)
[junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:121)
[junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:371)
[junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:295)
[junit] at org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:287)
[junit] Caused by: org.apache.uima.analysis_engine.AnalysisEngineProcessException
[junit] at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:701)
[junit] at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
[junit] at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
[junit] at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
[junit] at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
[junit] at org.apache.lucene.analysis.uima.BaseUIMATokenizer.analyzeInput(BaseUIMATokenizer.java:57)
[junit] at org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer.analyzeText(UIMATypeAwareAnnotationsTokenizer.java:73)
[junit] at org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer.incrementToken(UIMATypeAwareAnnotationsTokenizer.java:85)
[junit] ... 4 more
[junit] Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 2
[junit] at java.util.ArrayList.RangeCheck(ArrayList.java:547)
[junit] at java.util.ArrayList.get(ArrayList.java:322)
[junit] at org.apache.uima.flow.impl.FixedFlowController$FixedFlowObject.next(FixedFlowController.java:216)
[junit] at org.apache.uima.analysis_engine.asb.impl.FlowContainer.next(FlowContainer.java:98)
[junit] at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:667)
[junit] ... 11 more
{noformat}
I'm debugging it to see if I can pin down the exact bug (and a fix) :)