I think UIMA's subiterator makes it too easy to introduce bugs. I have never used org.uimafit.util.JCasUtil.selectCovered, but I prefer that approach to a subiterator, provided it is not too slow. For example, given the warning in the Javadoc about the speed of selectCovered(JCas, Class, int, int), I suggest we stay away from that overload but definitely try the one you suggested: selectCovered(JCas, Class, Annotation).
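
To make the behaviour I am hoping for concrete, here is a minimal plain-Java sketch of "select everything covered by this annotation". It is not the uimaFIT implementation and it does not touch UIMA at all: CoveredSketch, Span, and this selectCovered helper are hypothetical stand-ins that only compare begin/end offsets, which is why the result cannot depend on how types are ordered in an index.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of covered selection; a sketch, not uimaFIT code. */
final class CoveredSketch {

    /** Hypothetical stand-in for a UIMA Annotation: just begin/end offsets. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Returns every candidate whose offsets fall inside the covering span.
     * This is a linear scan over offsets only, so no index ordering is involved.
     */
    static List<Span> selectCovered(List<Span> candidates, Span covering) {
        List<Span> covered = new ArrayList<>();
        for (Span s : candidates) {
            if (s.begin >= covering.begin && s.end <= covering.end) {
                covered.add(s);
            }
        }
        return covered;
    }

    public static void main(String[] args) {
        Span sentence = new Span(0, 11);
        List<Span> tokens = new ArrayList<>();
        tokens.add(new Span(0, 5));   // inside the sentence
        tokens.add(new Span(6, 11));  // inside the sentence
        tokens.add(new Span(12, 16)); // starts after the sentence ends
        System.out.println(selectCovered(tokens, sentence).size());
    }
}
```

The linear scan is also the reason I would want to measure speed before switching everything over: it is simple and predictable, but it looks at every candidate.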

Regards,
James Masanz

> -----Original Message-----
> From: ctakes-dev-return-221-[email protected] [mailto:ctakes-dev-return-[email protected]] On Behalf Of Chen, Pei
> Sent: Thursday, August 09, 2012 10:18 AM
> To: [email protected]
> Subject: UIMA's subiterator
>
> To get all the BaseTokens for a particular sentence, if we use the .subiterator,
> the types have to be stored in the FSindexes in a certain order; otherwise it could
> just return an empty list. This would require the users of annotators to
> understand the ordering of types and have it preconfigured.
>
> FSIterator<Annotation> tokensInSentenceIterator =
>     jcas.getAnnotationIndex(BaseToken.type).subiterator(sentence);
>
> uimaFIT already created a convenience method that seems to do something
> similar which will always return the expected tokens. Does anyone know if
> this was part of the motivation? Is the performance hit (if any) worth the
> ease of use?
> Ex:
> List<BaseToken> tokens = org.uimafit.util.JCasUtil.selectCovered(jCas,
>     BaseToken.class, sentence);
>
> Another alternative is UIMA's FilteredIterator.
>
> There are a few places that use subiterator in cTAKES and it's tempting to use
> uimaFIT's JCasUtil.selectCovered() instead... What do others think?
>
> Background: This issue surfaced when we use the cTAKES GUI (which uses
> uimaFIT to wire the components together instead of the Aggregate XML
> descriptor).
>
> --Pei
