[
https://issues.apache.org/jira/browse/LUCENE-3731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208129#comment-13208129
]
Steven Rowe commented on LUCENE-3731:
-------------------------------------
Hi Tommaso,
I just committed modifications to the IntelliJ IDEA and Maven configurations.
Something strange is happening, though: one test method consistently fails
under both IntelliJ and Maven: {{UIMABaseAnalyzerTest.testRandomStrings()}}.
However, under Ant, this always succeeds, including with the seeds that fail
under either IntelliJ or Maven. Also, under both IntelliJ and Maven, the
following sequence is printed out literally thousands of times to STDERR (with
increasing time stamps) - however, I don't see this at all under Ant:
{noformat}
Feb 14, 2012 6:34:18 PM WhitespaceTokenizer initialize
INFO: "Whitespace tokenizer successfully initialized"
Feb 14, 2012 6:34:18 PM WhitespaceTokenizer typeSystemInit
INFO: "Whitespace tokenizer typesystem initialized"
Feb 14, 2012 6:34:18 PM WhitespaceTokenizer process
INFO: "Whitespace tokenizer starts processing"
Feb 14, 2012 6:34:18 PM WhitespaceTokenizer process
INFO: "Whitespace tokenizer finished processing"
{noformat}
Here are two different example failures, from Maven - they seem to have
different causes, which is baffling:
{noformat}
The following exceptions were thrown by threads:
*** Thread: Thread-1 ***
java.lang.RuntimeException: java.io.IOException:
org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator
processing failed.
at
org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:289)
Caused by: java.io.IOException:
org.apache.uima.analysis_engine.AnalysisEngineProcessException: Annotator
processing failed.
at
org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer.incrementToken(UIMAAnnotationsTokenizer.java:73)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:333)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:295)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:287)
Caused by: org.apache.uima.analysis_engine.AnalysisEngineProcessException:
Annotator processing failed.
at
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:391)
at
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
at
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
at
org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
at
org.apache.lucene.analysis.uima.BaseUIMATokenizer.analyzeInput(BaseUIMATokenizer.java:57)
at
org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer.analyzeText(UIMAAnnotationsTokenizer.java:61)
at
org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer.incrementToken(UIMAAnnotationsTokenizer.java:71)
... 3 more
Caused by: java.lang.NullPointerException
at
org.apache.uima.impl.UimaContext_ImplBase$ComponentInfoImpl.mapToSofaID(UimaContext_ImplBase.java:655)
at org.apache.uima.cas.impl.CASImpl.getView(CASImpl.java:2646)
at org.apache.uima.jcas.impl.JCasImpl.getView(JCasImpl.java:1415)
at org.apache.uima.examples.tagger.HMMTagger.process(HMMTagger.java:250)
at
org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
at
org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
... 12 more
*** Thread: Thread-2 ***
java.lang.AssertionError: token 0 does not exist
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:121)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:371)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:295)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:287)
NOTE: reproduce with: ant test -Dtestcase=UIMABaseAnalyzerTest
-Dtestmethod=testRandomStrings(org.apache.lucene.analysis.uima.UIMABaseAnalyzerTest)
-Dtests.seed=-1dad4a7ede576939:-f9f5c77dffb3eb0:607bf59bf7da50eb
-Dargs="-Dfile.encoding=Cp1252"
{noformat}
{noformat}
The following exceptions were thrown by threads:
*** Thread: Thread-5 ***
java.lang.RuntimeException: java.io.IOException:
org.apache.uima.analysis_engine.AnalysisEngineProcessException
at
org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:289)
Caused by: java.io.IOException:
org.apache.uima.analysis_engine.AnalysisEngineProcessException
at
org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer.incrementToken(UIMAAnnotationsTokenizer.java:73)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:121)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:371)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:295)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:287)
Caused by: org.apache.uima.analysis_engine.AnalysisEngineProcessException
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:701)
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
at
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
at
org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
at
org.apache.lucene.analysis.uima.BaseUIMATokenizer.analyzeInput(BaseUIMATokenizer.java:57)
at
org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer.analyzeText(UIMAAnnotationsTokenizer.java:61)
at
org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer.incrementToken(UIMAAnnotationsTokenizer.java:71)
... 4 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 2
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at
org.apache.uima.flow.impl.FixedFlowController$FixedFlowObject.next(FixedFlowController.java:222)
at
org.apache.uima.analysis_engine.asb.impl.FlowContainer.next(FlowContainer.java:100)
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:546)
... 11 more
*** Thread: Thread-7 ***
java.lang.RuntimeException: java.io.IOException:
org.apache.uima.analysis_engine.AnalysisEngineProcessException
at
org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:289)
Caused by: java.io.IOException:
org.apache.uima.analysis_engine.AnalysisEngineProcessException
at
org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer.incrementToken(UIMAAnnotationsTokenizer.java:73)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:121)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:371)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:295)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:287)
Caused by: org.apache.uima.analysis_engine.AnalysisEngineProcessException
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:701)
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
at
org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
at
org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
at
org.apache.lucene.analysis.uima.BaseUIMATokenizer.analyzeInput(BaseUIMATokenizer.java:57)
at
org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer.analyzeText(UIMAAnnotationsTokenizer.java:61)
at
org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer.incrementToken(UIMAAnnotationsTokenizer.java:71)
... 4 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 0, Size: 2
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at
org.apache.uima.flow.impl.FixedFlowController$FixedFlowObject.next(FixedFlowController.java:222)
at
org.apache.uima.analysis_engine.asb.impl.FlowContainer.next(FlowContainer.java:100)
at
org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:546)
... 11 more
*** Thread: Thread-6 ***
java.lang.AssertionError: end of stream
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertFalse(Assert.java:68)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:148)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:371)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:295)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:287)
*** Thread: Thread-4 ***
org.junit.ComparisonFailure: term 8 expected:<-[]> but was:<-[- f(]>
at org.junit.Assert.assertEquals(Assert.java:125)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase.assertTokenStreamContents(BaseTokenStreamTestCase.java:124)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:371)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:295)
at
org.apache.lucene.analysis.BaseTokenStreamTestCase$AnalysisThread.run(BaseTokenStreamTestCase.java:287)
NOTE: reproduce with: ant test -Dtestcase=UIMABaseAnalyzerTest
-Dtestmethod=testRandomStrings(org.apache.lucene.analysis.uima.UIMABaseAnalyzerTest)
-Dtests.seed=2be0c24a1df9b25e:-42f203968285c6ed:5f8c85cdbae32724
-Dargs="-Dfile.encoding=Cp1252"
{noformat}
> Create a analysis/uima module for UIMA based tokenizers/analyzers
> -----------------------------------------------------------------
>
> Key: LUCENE-3731
> URL: https://issues.apache.org/jira/browse/LUCENE-3731
> Project: Lucene - Java
> Issue Type: Improvement
> Components: modules/analysis
> Reporter: Tommaso Teofili
> Assignee: Tommaso Teofili
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3731.patch, LUCENE-3731_2.patch,
> LUCENE-3731_3.patch, LUCENE-3731_4.patch
>
>
> As discussed in SOLR-3013 the UIMA Tokenizers/Analyzer should be refactored
> out in a separate module (modules/analysis/uima) as they can be used in plain
> Lucene. Then the solr/contrib/uima will contain only the related factories.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]