UIMA's subiterator

Chen, Pei Thu, 09 Aug 2012 08:18:53 -0700

To get all the BaseTokens for a particular sentence using .subiterator, the 
types have to be stored in the FS indexes in a certain order; otherwise it 
could just return an empty list.  This would require users of the annotators 
to understand the ordering of types and have it preconfigured.


FSIterator<Annotation> tokensInSentenceIterator = 
jcas.getAnnotationIndex(BaseToken.type).subiterator(sentence);

uimaFIT already provides a convenience method that seems to do something 
similar and always returns the expected tokens.  Does anyone know if this was 
part of the motivation?  Is the performance hit (if any) worth the ease of use?
Ex:
List<BaseToken> tokens = org.uimafit.util.JCasUtil.selectCovered(jCas, 
BaseToken.class, sentence);
Another alternative is UIMA's FilteredIterator.
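
To make the ordering issue concrete, here is a small plain-Java sketch. It 
does not call UIMA or uimaFIT at all: Span, buildIndex, subiterator and 
selectCovered below are hypothetical stand-ins written only for illustration. 
It assumes, as I read the AnnotationIndex Javadoc, that the index sorts by 
begin ascending, then end descending, then type priority, and that the 
subiterator only sees annotations that come after the covering annotation in 
that order.  Under those assumptions, a one-token sentence with no type 
priorities configured can come back empty, while a plain offset check does not 
care about index order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SubiteratorModel {

    /** Toy annotation: a type name plus begin/end offsets (not a UIMA class). */
    public static final class Span {
        final String type;
        final int begin;
        final int end;

        public Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Types not listed in the priority list all share the lowest rank. */
    static int rank(String type, List<String> priorities) {
        int i = priorities.indexOf(type);
        return i < 0 ? priorities.size() : i;
    }

    /** Assumed index order: begin ascending, end descending, then type priority. */
    public static List<Span> buildIndex(List<Span> added, List<String> priorities) {
        List<Span> index = new ArrayList<>(added);
        Comparator<Span> order = (a, b) -> {
            if (a.begin != b.begin) return Integer.compare(a.begin, b.begin);
            if (a.end != b.end) return Integer.compare(b.end, a.end);
            return Integer.compare(rank(a.type, priorities), rank(b.type, priorities));
        };
        index.sort(order); // stable sort: full ties keep insertion order
        return index;
    }

    /** Order-dependent lookup: only looks at entries after 'cover' in the index. */
    public static List<Span> subiterator(List<Span> index, Span cover, String wantedType) {
        List<Span> out = new ArrayList<>();
        for (int i = index.indexOf(cover) + 1; i < index.size(); i++) {
            Span s = index.get(i);
            if (s.type.equals(wantedType) && s.begin >= cover.begin && s.end <= cover.end) {
                out.add(s);
            }
        }
        return out;
    }

    /** Order-independent lookup: checks offsets of every entry in the index. */
    public static List<Span> selectCovered(List<Span> index, Span cover, String wantedType) {
        List<Span> out = new ArrayList<>();
        for (Span s : index) {
            if (s != cover && s.type.equals(wantedType)
                    && s.begin >= cover.begin && s.end <= cover.end) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A one-token sentence: token and sentence share the same offsets.
        Span token = new Span("BaseToken", 0, 2);
        Span sentence = new Span("Sentence", 0, 2);
        List<Span> added = Arrays.asList(token, sentence); // token added first

        List<Span> noPriorities = buildIndex(added, Collections.<String>emptyList());
        List<Span> withPriorities = buildIndex(added, Arrays.asList("Sentence", "BaseToken"));

        System.out.println(subiterator(noPriorities, sentence, "BaseToken").size());   // 0
        System.out.println(subiterator(withPriorities, sentence, "BaseToken").size()); // 1
        System.out.println(selectCovered(noPriorities, sentence, "BaseToken").size()); // 1
    }
}
```

In this toy model the offset check visits every entry, which is where any 
performance hit would come from; whether that matters on real documents would 
need measuring against the actual uimaFIT implementation.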

There are a few places that use subiterator in cTAKES and it's tempting to use 
uimaFIT's JCasUtil.selectCovered() instead... What do others think?

Background: This issue surfaced when we used the cTAKES GUI (which uses 
uimaFIT to wire the components together instead of the Aggregate XML 
descriptor).

--Pei
