I agree with Matt. I tried both of the following but still couldn't get consistent, expected results from UIMA's subiterator:

1) Having uimaFIT (tested with uimaFIT v1.3 / UIMA v2.4.0) create an Aggregate with a specific type order (I even output the XML metadata to confirm that the Type Priorities were picked up).
2) Adding the Type Priorities to the individual annotator descriptor XML and having uimaFIT create the analysis engine (createAnalysisEngine) from the XML path.

I think the cause is actually how annotations are stored in the FSindexes internally within UIMA (which seem to have some known, documented limitations). After fruitless and frustrating debugging of UIMA, I just decided to use uimaFIT, which seems to consistently produce the desired results :).

Has anyone thought about contributing uimaFIT back to UIMA?

-----Original Message-----
From: Coarr, Matt [mailto:[email protected]]
Sent: Thursday, August 09, 2012 11:34 AM
To: [email protected]
Subject: Re: UIMA's subiterator

I think that regardless of the APIs used (uimaFIT, ClearTK, out-of-the-box UIMA), in order to work with the annotations properly, it's important to have the type hierarchy defined ("the ordering of types"). In the clinical documents pipeline (cdp), this is defined in the cdp XML analysis engine descriptor.

In uimaFIT, I think this is already supported. You just need to supply the TypePriorities object in the following factory method (you're probably already using this method and just need to add the right value for the type priorities):

org.uimafit.factory.AnalysisEngineFactory.createAggregate()
http://uimafit.googlecode.com/svn/tags/uimafit-parent-1.2.0/apidocs/org/uimafit/factory/AnalysisEngineFactory.html

Matt

On 2012-08-09 11:18 , "Chen, Pei" <[email protected]> wrote:

>To get all the BaseTokens for a particular sentence, if we use the
>.subiterator, the types have to be stored in the FSindexes in a certain
>order; otherwise it could just return an empty list. This would require
>the users of annotators to understand the ordering of types and have it
>preconfigured.
>
>FSIterator<Annotation> tokensInSentenceIterator =
>jcas.getAnnotationIndex(BaseToken.type).subiterator(sentence);
>
>uimaFIT already created a convenience method that seems to do something
>similar which will always return the expected tokens. Does anyone know
>if this was part of the motivation? Is the performance hit (if any)
>worth the ease of use?
>Ex:
>List<BaseToken> tokens = org.uimafit.util.JCasUtil.selectCovered(jCas,
>BaseToken.class, sentence);
>Another alternative is UIMA's FilteredIterator.
>
>There are a few places that use subiterator in cTAKES and it's tempting
>to use uimaFIT's JCasUtil.selectCovered() instead... What do others
>think?
>
>Background: This issue surfaced when we used the cTAKES GUI (which uses
>uimaFIT to wire the components together instead of the Aggregate XML
>descriptor).
>
>--Pei
>

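For anyone following the thread, here is a minimal plain-Java sketch of why an order-based subiterator can come back empty while an offset-based selection does not. It does not call UIMA or uimaFIT: Span, SubiteratorSketch, indexOrder, subiteratorLike, and selectCoveredLike are hypothetical names made up for illustration, and the ordering rule (begin ascending, end descending, then type priority) is a simplified model of the annotation index, not the real implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for an annotation: a type name plus character offsets. */
final class Span {
    final String type;
    final int begin;
    final int end;

    Span(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

final class SubiteratorSketch {

    /** Assumed index order: begin ascending, end descending, then type priority (earlier in the list sorts first). */
    static Comparator<Span> indexOrder(List<String> typePriority) {
        return (a, b) -> {
            if (a.begin != b.begin) return Integer.compare(a.begin, b.begin);
            if (a.end != b.end) return Integer.compare(b.end, a.end);
            return Integer.compare(typePriority.indexOf(a.type), typePriority.indexOf(b.type));
        };
    }

    /** Order-based selection: only tokens that sort after the container are considered. */
    static List<Span> subiteratorLike(List<Span> tokens, Span container, Comparator<Span> order) {
        List<Span> sorted = new ArrayList<>(tokens);
        sorted.sort(order);
        List<Span> result = new ArrayList<>();
        for (Span t : sorted) {
            if (order.compare(t, container) <= 0) continue; // sorts before the container: skipped
            if (t.begin > container.end) break;              // starts past the container: stop
            if (t.end <= container.end) result.add(t);
        }
        return result;
    }

    /** Offset-based selection: depends only on begin/end, never on index order. */
    static List<Span> selectCoveredLike(List<Span> tokens, Span container) {
        List<Span> result = new ArrayList<>();
        for (Span t : tokens) {
            if (t.begin >= container.begin && t.end <= container.end) result.add(t);
        }
        return result;
    }

    public static void main(String[] args) {
        // A one-token sentence: the token has exactly the same offsets as the sentence.
        Span sentence = new Span("Sentence", 0, 5);
        List<Span> tokens = Arrays.asList(new Span("BaseToken", 0, 5));

        Comparator<Span> sentenceFirst = indexOrder(Arrays.asList("Sentence", "BaseToken"));
        Comparator<Span> tokenFirst = indexOrder(Arrays.asList("BaseToken", "Sentence"));

        System.out.println("subiterator, Sentence first:  " + subiteratorLike(tokens, sentence, sentenceFirst).size());
        System.out.println("subiterator, BaseToken first: " + subiteratorLike(tokens, sentence, tokenFirst).size());
        System.out.println("selectCovered-like:           " + selectCoveredLike(tokens, sentence).size());
    }
}
```

In this model the token is found only when Sentence is ranked ahead of BaseToken, which matches the empty-list symptom described above, while the offset-based selection finds it either way. The trade-off, again only in this sketch, is that the offset-based version scans every token instead of stopping early.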