Actually, I just tried it with the Annotation Printer instead of Solrcas and got the same exception. I will back up and troubleshoot this by inspecting the output of the MetaMapApiAE.
Dave

On Thu, Feb 3, 2011 at 12:06 PM, David Thibault <[email protected]> wrote:
> Hello all,
>
> First off, I apologize for sending this to both the user and dev lists, but
> I'm not sure which list should get it. This is my first email to either
> list.
>
> I am working with UIMA and Solrcas and I'm getting this error:
>
> org.apache.uima.analysis_engine.AnalysisEngineProcessException
>     at org.apache.uima.solrcas.SolrCASConsumer.process(SolrCASConsumer.java:138)
>     at org.apache.uima.analysis_component.JCasAnnotator_ImplBase.process(JCasAnnotator_ImplBase.java:48)
>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
>     at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
>     at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
>     at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.processNext(ProcessingUnit.java:897)
>     at org.apache.uima.collection.impl.cpm.engine.ProcessingUnit.run(ProcessingUnit.java:577)
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>     at java.lang.String.substring(String.java:1931)
>     at org.apache.uima.jcas.tcas.Annotation.getCoveredText(Annotation.java:119)
>     at org.apache.uima.solrcas.SolrCASConsumer.process(SolrCASConsumer.java:126)
>     ... 6 more
>
> I edited SolrCASConsumer with the following lines right before line 126:
>
>     Annotation fsTemp = (Annotation) fs;
>     System.out.println("Processing Annotation: " + fsTemp.toString());
>
> Therefore, now right before it calls fs.getCoveredText() it prints this:
>
> Processing Annotation: Phrase
>    sofa: _InitialView
>    begin: -1
>    end: 60
>    candidates: FSArray
>    mappings: FSArray
>
> Therefore, it's obvious why it's saying the string index is out of bounds.
> However, I'm not sure why it's getting those values from my analysis
> engine. I'm using MetaMapApiAE from the NIH's MetaMap project.
>
> This is the first phrase it is processing on this document and the first
> time it prints that subsection of debug text.
> If I use the same AE in DocumentAnalyzer it correctly shows the first
> Document as starting on position 0 and ending on position 191, with the
> first phrase as being from positions 0 to 7.
>
> I'm trying to run this in the CPE GUI with the following CPEDescriptor.xml:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <cpeDescription xmlns="http://uima.apache.org/resourceSpecifier">
>     <collectionReader>
>         <collectionIterator>
>             <descriptor>
>                 <import location="../../../../../../../usr/local/apache-uima/examples/descriptors/collection_reader/FileSystemCollectionReader.xml"/>
>             </descriptor>
>             <configurationParameterSettings>
>                 <nameValuePair>
>                     <name>InputDirectory</name>
>                     <value>
>                         <string>/Users/davidt/Documents/workspace/BioSearch/resources/test_input</string>
>                     </value>
>                 </nameValuePair>
>             </configurationParameterSettings>
>         </collectionIterator>
>     </collectionReader>
>     <casProcessors casPoolSize="3" processingUnitThreadCount="1">
>         <casProcessor deployment="integrated" name="MetaMapApiAE">
>             <descriptor>
>                 <import location="../../../MetaMap UIMA Annotator/descriptors/MetaMapApiAE.xml"/>
>             </descriptor>
>             <deploymentParameters/>
>             <errorHandling>
>                 <errorRateThreshold action="terminate" value="0/1000"/>
>                 <maxConsecutiveRestarts action="terminate" value="30"/>
>                 <timeout max="100000" default="-1"/>
>             </errorHandling>
>             <checkpoint batch="10000" time="1000ms"/>
>             <configurationParameterSettings>
>                 <nameValuePair>
>                     <name>tempdir_path</name>
>                     <value>
>                         <string>/Users/davidt/tmp</string>
>                     </value>
>                 </nameValuePair>
>             </configurationParameterSettings>
>         </casProcessor>
>         <casProcessor deployment="integrated" name="SolrcasAE.xml">
>             <descriptor>
>                 <import location="../../../Apache_UIMA_Sandbox/Solrcas/desc/SolrcasAE.xml"/>
>             </descriptor>
>             <deploymentParameters/>
>             <errorHandling>
>                 <errorRateThreshold action="terminate" value="0/1000"/>
>                 <maxConsecutiveRestarts action="terminate" value="30"/>
>                 <timeout max="100000" default="-1"/>
>             </errorHandling>
>             <checkpoint batch="10000" time="1000ms"/>
>         </casProcessor>
>     </casProcessors>
>     <cpeConfig>
>         <numToProcess>-1</numToProcess>
>         <deployAs>immediate</deployAs>
>         <checkpoint batch="0" time="300000ms"/>
>         <timerImpl/>
>     </cpeConfig>
> </cpeDescription>
>
> I'm at a loss as to where that -1 is coming from or how to debug it
> further. Any ideas would be greatly appreciated.
>
> Best,
> Dave
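
For inspecting the annotator output, something like the following might help flag bad offsets before anything calls getCoveredText(). This is only a rough sketch in plain Java with no UIMA dependency: the class SpanCheck and its methods isValidSpan and coveredText are hypothetical names, not part of UIMA, Solrcas, or MetaMap. In a real consumer the three values would presumably come from Annotation.getBegin(), Annotation.getEnd(), and the document text of the view. The sample sentence is made up; only the offsets (0 to 7, and -1 to 60) are taken from the debug output above.

```java
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of UIMA or Solrcas. */
final class SpanCheck {

    /** True when begin/end describe a usable range into a text of the given length. */
    static boolean isValidSpan(int begin, int end, int textLength) {
        return begin >= 0 && begin <= end && end <= textLength;
    }

    /**
     * Returns the text between begin and end, or an empty Optional when the
     * offsets would make String.substring throw StringIndexOutOfBoundsException.
     */
    static Optional<String> coveredText(String documentText, int begin, int end) {
        if (documentText == null || !isValidSpan(begin, end, documentText.length())) {
            return Optional.empty();
        }
        return Optional.of(documentText.substring(begin, end));
    }

    public static void main(String[] args) {
        // Made-up sample text; the real document is not shown in this thread.
        String doc = "Patient has a fever and a persistent cough since last Tuesday evening.";

        // Offsets as DocumentAnalyzer reported them for the first phrase.
        System.out.println(coveredText(doc, 0, 7));   // Optional[Patient]

        // Offsets as printed in the CPE run; these are rejected instead of throwing.
        System.out.println(coveredText(doc, -1, 60)); // Optional.empty
    }
}
```

Logging every annotation that fails a check like this, along with its type, would at least show whether the -1 is already present when the CAS leaves the MetaMapApiAE or only appears later in the pipeline.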
