RE: UIMA's subiterator

Chen, Pei Thu, 09 Aug 2012 08:50:48 -0700

I agree with Matt. I tried both of the approaches below but still couldn't get 
consistent, expected results from UIMA's subiterator:
1) Having uimaFIT (tested with uimaFIT v1.3 / UIMA v2.4.0) create an 
Aggregate with a specific type order (even outputting the XML metadata to 
ensure that the Type Priorities are picked up).
2) Adding the Type Priorities to the individual annotator descriptor XML and 
having uimaFIT createAnalysisEngine from the XML path.

I think the cause is actually how the annotations are stored in the FS indexes 
internally within UIMA (which seem to have some known, documented limitations).
After an unfruitful and frustrating time debugging UIMA, I just decided to use 
uimaFIT, which seems to consistently produce the desired results :).  

Has anyone thought about contributing uimaFIT back to UIMA?

-----Original Message-----
From: Coarr, Matt [mailto:[email protected]] 
Sent: Thursday, August 09, 2012 11:34 AM
To: [email protected]
Subject: Re: UIMA's subiterator

I think that regardless of the APIs used (uimaFIT, ClearTK, out-of-the-box 
uima), in order to work with the annotations properly, it's important to have 
the type hierarchy defined ("the ordering of types").  In the clinical 
documents pipeline (cdp), this is defined in the cdp xml analysis engine 
descriptor.

In uimaFIT, I think this is already supported. You just need to supply the 
TypePriorities object in the following factory method (you're probably already 
using this method and just need to add the right value for the type priorities):

org.uimafit.factory.AnalysisEngineFactory.createAggregate()

http://uimafit.googlecode.com/svn/tags/uimafit-parent-1.2.0/apidocs/org/uimafit/factory/AnalysisEngineFactory.html

Matt

On 2012-08-09 11:18 , "Chen, Pei" <[email protected]> wrote:

>To get all the BaseTokens for a particular sentence, if we use the 
>.subiterator, the types have to be stored in the FS indexes in a certain 
>order; otherwise it could just return an empty list.  This would require 
>the users of annotators to understand the ordering of types and have it 
>preconfigured.
>
>FSIterator<Annotation> tokensInSentenceIterator = 
>jcas.getAnnotationIndex(BaseToken.type).subiterator(sentence);
>
>uimaFIT already created a convenience method that seems to do something 
>similar which will always return the expected tokens.  Does anyone know 
>if this was part of the motivation?  Is the performance hit (if any) 
>worth the ease of use?
>Ex:
>List<BaseToken> tokens = org.uimafit.util.JCasUtil.selectCovered(jCas,
>BaseToken.class, sentence);
>Another alternative is UIMA's FilteredIterator.
>
>There are a few places that use subiterator in cTAKES and it's tempting 
>to use uimaFIT's JCasUtil.selectCovered() instead... What do others 
>think?
>
>Background: This issue surfaced when we use the cTAKES GUI (which uses 
>uimaFIT to wire the components together instead of the Aggregate XML 
>descriptor).
>
>--Pei
>
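
The ordering dependence described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not UIMA's or uimaFIT's actual implementation: `Span`, `SubiteratorModel`, `subiteratorLike` and `selectCoveredLike` are hypothetical names, and the index is assumed to sort by begin ascending, then end descending, then type priority. It shows how a one-token sentence can yield an empty result when the token type sorts ahead of the sentence type, while a plain offset-containment check is unaffected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for an annotation: a type name plus character offsets. */
final class Span {
    final String type;
    final int begin;
    final int end;

    Span(String type, int begin, int end) {
        this.type = type;
        this.begin = begin;
        this.end = end;
    }
}

/** Simplified model only; it does not call any UIMA or uimaFIT API. */
final class SubiteratorModel {

    /** Assumed index order: begin ascending, end descending, then position in the priority list. */
    static Comparator<Span> indexOrder(List<String> typePriorities) {
        return (a, b) -> {
            if (a.begin != b.begin) return Integer.compare(a.begin, b.begin);
            if (a.end != b.end) return Integer.compare(b.end, a.end); // longer span first
            return Integer.compare(typePriorities.indexOf(a.type), typePriorities.indexOf(b.type));
        };
    }

    /** Order-dependent lookup: only spans that sort AFTER the covering span are visited. */
    static List<Span> subiteratorLike(List<Span> all, Span cover, String wantedType,
                                      List<String> typePriorities) {
        Comparator<Span> order = indexOrder(typePriorities);
        List<Span> sorted = new ArrayList<>(all);
        sorted.sort(order);
        List<Span> result = new ArrayList<>();
        for (Span s : sorted) {
            if (s == cover || order.compare(s, cover) <= 0) continue; // sorts before the cover: skipped
            if (s.begin >= cover.end) break;                          // past the covered region
            if (s.type.equals(wantedType) && s.end <= cover.end) result.add(s);
        }
        return result;
    }

    /** Order-independent lookup: pure offset containment. */
    static List<Span> selectCoveredLike(List<Span> all, Span cover, String wantedType) {
        List<Span> result = new ArrayList<>();
        for (Span s : all) {
            if (s != cover && s.type.equals(wantedType)
                    && s.begin >= cover.begin && s.end <= cover.end) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Span sentence = new Span("Sentence", 0, 5);
        Span token = new Span("BaseToken", 0, 5); // one-token sentence, identical offsets
        List<Span> all = Arrays.asList(token, sentence);

        List<String> sentenceFirst = Arrays.asList("Sentence", "BaseToken");
        List<String> tokenFirst = Arrays.asList("BaseToken", "Sentence");

        System.out.println(subiteratorLike(all, sentence, "BaseToken", sentenceFirst).size()); // 1
        System.out.println(subiteratorLike(all, sentence, "BaseToken", tokenFirst).size());    // 0
        System.out.println(selectCoveredLike(all, sentence, "BaseToken").size());              // 1
    }
}
```

In this model the type order only matters when two annotations share the same begin and end offsets, which is why the problem can appear intermittently rather than on every sentence.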
