I think that regardless of the API used (uimaFIT, ClearTK, or
out-of-the-box UIMA), working with the annotations properly requires that
the type priorities ("the ordering of types") be defined.  In the
clinical documents pipeline (cdp), this is defined in the cdp XML analysis
engine descriptor.
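
To make "the ordering of types" concrete, here is a rough plain-Java
sketch. It is not UIMA or uimaFIT code: the Span class and the indexOrder
and coveredBy helpers are names I made up for illustration. It assumes the
usual annotation index order (begin ascending, then end descending, then
type priority) and shows that a Sentence and a BaseToken with identical
offsets only get a definite order once priorities are supplied, while a
plain offset check does not care about the order at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class TypePrioritySketch {

    /** Hypothetical stand-in for an annotation: a type name plus character offsets. */
    public static final class Span {
        public final String type;
        public final int begin;
        public final int end;

        public Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + "]";
        }
    }

    /** Position of a type in the priority list; unlisted types all share the last rank. */
    static int rank(List<String> priorities, String type) {
        int i = priorities.indexOf(type);
        return i < 0 ? priorities.size() : i;
    }

    /** Toy index order: begin ascending, then end descending, then type priority. */
    public static Comparator<Span> indexOrder(final List<String> priorities) {
        return (a, b) -> {
            if (a.begin != b.begin) {
                return Integer.compare(a.begin, b.begin);
            }
            if (a.end != b.end) {
                return Integer.compare(b.end, a.end); // longer span sorts first
            }
            return Integer.compare(rank(priorities, a.type), rank(priorities, b.type));
        };
    }

    /** Order-independent selection: spans of the given type inside the cover's offsets. */
    public static List<Span> coveredBy(List<Span> all, String type, Span cover) {
        List<Span> out = new ArrayList<>();
        for (Span s : all) {
            if (s != cover && s.type.equals(type)
                    && s.begin >= cover.begin && s.end <= cover.end) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Span sentence = new Span("Sentence", 0, 5);
        Span token = new Span("BaseToken", 0, 5); // one-token sentence: identical offsets
        List<Span> spans = Arrays.asList(token, sentence);

        Comparator<Span> withPriorities = indexOrder(Arrays.asList("Sentence", "BaseToken"));
        Comparator<Span> noPriorities = indexOrder(Collections.<String>emptyList());

        System.out.println("with priorities: " + withPriorities.compare(sentence, token));
        System.out.println("no priorities:   " + noPriorities.compare(sentence, token));
        System.out.println("covered tokens:  " + coveredBy(spans, "BaseToken", sentence));
    }
}
```

With priorities the Sentence compares as smaller than the BaseToken, so it
sorts first; without them the two tie, and anything that relies on the
Sentence coming first has nothing to go on.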

In uimaFIT, I think this is already supported. You just need to supply the
TypePriorities object in the following factory method (you're probably
already using this method and just need to add the right value for the
type priorities):

org.uimafit.factory.AnalysisEngineFactory.createAggregate()

http://uimafit.googlecode.com/svn/tags/uimafit-parent-1.2.0/apidocs/org/uimafit/factory/AnalysisEngineFactory.html


Matt


On 2012-08-09 11:18 , "Chen, Pei" <[email protected]> wrote:

>To get all the BaseTokens for a particular sentence, if we use the
>.subiterator, the types have to be stored in the FS indexes in a certain
>order; otherwise it could just return an empty list.  This would require
>the users of annotators to understand the ordering of types and have it
>preconfigured.
>
>FSIterator<Annotation> tokensInSentenceIterator =
>jcas.getAnnotationIndex(BaseToken.type).subiterator(sentence);
>
>uimaFIT already created a convenience method that seems to do something
>similar which will always return the expected tokens.  Does anyone know
>if this was part of the motivation?  Is the performance hit (if any)
>worth the ease of use?
>Ex:
>List<BaseToken> tokens = org.uimafit.util.JCasUtil.selectCovered(jCas,
>BaseToken.class, sentence);
>Another alternative is UIMA's FilteredIterator.
>
>There are a few places that use subiterator in cTAKES and it's tempting
>to use uimaFIT's JCasUtil.selectCovered() instead... What do others
>think?
>
>Background: This issue surfaced when we use the cTAKES GUI (which uses
>uimaFIT to wire the components together instead of the Aggregate XML
>descriptor).
>
>--Pei
>
