Type Priorities (was: Retire UIMA C++ SDK)

Richard Eckart de Castilho Fri, 06 Jan 2023 06:47:20 -0800


> On 6. Jan 2023, at 14:53, Pablo Duboue <[email protected]> wrote:
> 
>> Note that Cassis does not support indices or type priorities. To be
>> honest, those always seemed to be more in the way than helpful anyway. The
>> UIMAv3 select API also ignores type priorities by default (they can be turned
>> on for a given select call, though).
>> 
> 
> Type priorities were indeed a rare bird. But type indices are mighty
> useful. So UIMAv3 has no indices at all? Getting an iterator over
> annotations that fall inside another annotation is a very common task
> (sentences within paragraphs, tokens within sentences, etc). It is one of
> the few constructs that other NLP frameworks provide.


UIMAv3 still has indices, but as in UIMAv2, one normally does not have to 
configure them.
UIMA (Java) automatically creates indices for all subtypes of Annotation. Also, 
there is a general index for all FeatureStructures. The same is true for Cassis.

MySentence and MyNER in your code appear to be subtypes of Annotation and you 
don't seem to define any keys in addition/other than begin/end, so an index 
definition should not be required.
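The following toy model in plain Python makes the role of the begin/end keys concrete, along with the type priorities this thread is about. It models the ordering of UIMA's built-in annotation index: begin ascending, then end descending, with type priority as a further tiebreaker when priorities are defined. It is a sketch, not UIMA or Cassis code. The `Annotation` class, `build_annotation_index`, and the `Token`/`Paragraph` types are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    # Hypothetical stand-in for an Annotation subtype instance.
    type_name: str
    begin: int
    end: int


def build_annotation_index(annotations, type_priorities=None):
    """Return annotations in a UIMA-like annotation-index order (toy model)."""
    # Rank each prioritized type by its position in the list; types not listed
    # share the lowest priority (an assumption of this sketch).
    ranks = {name: i for i, name in enumerate(type_priorities or [])}
    lowest = len(ranks)
    # begin ascending, end descending, then type priority. Python's sort is
    # stable, so without priorities same-span annotations keep input order.
    return sorted(
        annotations,
        key=lambda a: (a.begin, -a.end, ranks.get(a.type_name, lowest)),
    )


par = Annotation("Paragraph", 0, 20)
sent = Annotation("MySentence", 0, 10)
ner = Annotation("MyNER", 0, 10)
tok = Annotation("Token", 0, 3)

no_priorities = build_annotation_index([tok, ner, sent, par])
with_priorities = build_annotation_index(
    [tok, ner, sent, par], type_priorities=["MySentence", "MyNER"]
)
```

Covering annotations sort before the annotations they cover, which is what makes "everything inside X" queries cheap. Only same-span annotations such as `MySentence` and `MyNER` are affected by the priority key.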

Defining custom indices would only be required if, e.g., you need to set up 
different index keys. The select-API of UIMAv3 is aware of the 
automatically-created Annotation-subtype indices and uses them to perform fast 
seeks with respect to annotation begin/end. However, the select-API is not 
aware of custom indices and will not use them to speed up access. 
In my experience, though, most access is well-scoped via offsets (fast through 
the Annotation indices), and then a filter() statement can be used to narrow 
the results down further with O(n) complexity.
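The seek-then-filter pattern above can be illustrated with a toy model in plain Python. It is a sketch under stated assumptions, not the UIMA or Cassis implementation. `ToyAnnotationIndex`, `covered_by`, and the `Token` type are hypothetical names. A binary search over the sorted begin offsets stands in for the fast seek on the annotation index, and a list comprehension stands in for the O(n) filter() step.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    # Hypothetical stand-in for an Annotation subtype instance.
    type_name: str
    begin: int
    end: int


class ToyAnnotationIndex:
    """Hypothetical index: annotations sorted by begin asc, end desc."""

    def __init__(self, annotations):
        self._items = sorted(annotations, key=lambda a: (a.begin, -a.end))
        self._begins = [a.begin for a in self._items]

    def covered_by(self, cover, type_name=None):
        # Seek: O(log n) binary search to the first annotation whose begin
        # is at or after the covering annotation's begin.
        i = bisect_left(self._begins, cover.begin)
        result = []
        # Scan forward until annotations start at or beyond the cover's end.
        while i < len(self._items) and self._items[i].begin < cover.end:
            a = self._items[i]
            # Keep annotations that also end inside the cover (excluding the
            # cover itself), optionally restricted to one type.
            if a is not cover and a.end <= cover.end and (
                type_name is None or a.type_name == type_name
            ):
                result.append(a)
            i += 1
        return result


s1, s2 = Annotation("MySentence", 0, 10), Annotation("MySentence", 11, 20)
t1, t2 = Annotation("Token", 0, 3), Annotation("Token", 4, 10)
t3, t4 = Annotation("Token", 11, 15), Annotation("Token", 16, 20)
index = ToyAnnotationIndex([s2, t3, s1, t1, t4, t2])

tokens_in_s1 = index.covered_by(s1, "Token")
# Linear filter over the already-scoped result, like a filter() step.
long_tokens_in_s1 = [t for t in tokens_in_s1 if t.end - t.begin > 3]
```

The scan only touches annotations whose begin falls inside the covering span. The subsequent linear filter therefore runs over that small slice rather than over every annotation in the document.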

-- Richard
