Corrected the subject ;) ... and added a section

> On 6. Jan 2023, at 14:53, Pablo Duboue <pablo.dub...@gmail.com> wrote:
>
>> Note that Cassis does not support indices or type priorities. To be
>> honest, those always seemed to be more in the way than helpful anyway. The
>> UIMAv3 select API by also default ignores type priorities (can be turned on
>> though for a given select call).
>>
>
> Type priorities were indeed a rare bird. But type indices are mighty
> useful. So UIMAv3 has no indices at all? Getting an iterator over
> annotations that fall inside another annotation is a very common task
> (sentences within paragraphs, tokens within sentences, etc). It is one of
> the few constructs that other NLP frameworks provide.

UIMAv3 still has indices, but as in UIMAv2, one normally does not have to configure them. UIMA (Java) automatically creates indices for all subtypes of Annotation. There is also a general index for all FeatureStructures. The same is true for Cassis.

MySentence and MyNER in your code appear to be subtypes of Annotation, and you don't seem to define any index keys other than (or in addition to) begin/end, so an index definition should not be required. Defining custom indices is only necessary if, for example, you need to set up different index keys.

The select-API of UIMAv3 is aware of the automatically created Annotation-subtype indices and uses them to perform fast seeks with respect to annotation begin/end. However, the select-API is not aware of custom indices and will not use them to speed up access. In my experience though, most access is well-scoped via offsets (fast through the Annotation indices), and a filter() statement can then be used to narrow the result down further with O(n) complexity, where n is the number of annotations in the already-scoped range.

>> == Component concept
>>
>> The Python annotation with component metadata on the analysis engine class
>> looks interesting. I wonder if you need the indexes though. Can you not
>> work simply with the built-in annotation index?
>>
>
> Wouldn't that be slow? Iterate over thousands of annotations for only a few
> paragraph annotations? At any rate UIMA CPP has the indices so it'll go
> very fast.

See above. UIMA Java and Cassis do have indices. UIMA Java also has custom indices, but normally they are not needed. Cassis currently does not support custom indices.

-- Richard
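
P.S. To make the "seek via offsets, then filter" point concrete, here is a minimal sketch in plain Python. It is not the Cassis or UIMA API: the names Span, AnnotationIndex and covered_by are made up for illustration, and the sort order (begin ascending, end descending) is only assumed to resemble what the built-in annotation index does. The idea is that a sorted index lets you binary-search to the start of the covering annotation and stop as soon as you run past its end, so only the annotations in that window are filtered linearly.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for an annotation: a type name plus offsets."""
    type: str
    begin: int
    end: int


class AnnotationIndex:
    """Toy annotation index, sorted by begin ascending, then end descending."""

    def __init__(self, spans):
        self._spans = sorted(spans, key=lambda s: (s.begin, -s.end))
        # Parallel list of begin offsets so we can binary-search on it.
        self._begins = [s.begin for s in self._spans]

    def covered_by(self, cover, type_name=None):
        """Return spans lying fully inside `cover`, optionally of one type."""
        # Fast seek: jump to the first span starting at or after cover.begin.
        start = bisect_left(self._begins, cover.begin)
        result = []
        for s in self._spans[start:]:
            if s.begin > cover.end:
                break  # sorted by begin, so nothing further can be covered
            # Linear filter, but only over the spans inside the window.
            if s is cover or s.end > cover.end:
                continue
            if type_name is not None and s.type != type_name:
                continue
            result.append(s)
        return result


# Example text: "Ann met Bob. Bob left."
sentences = [Span("Sentence", 0, 12), Span("Sentence", 13, 22)]
tokens = [
    Span("Token", 0, 3), Span("Token", 4, 7), Span("Token", 8, 11),
    Span("Token", 11, 12), Span("Token", 13, 16), Span("Token", 17, 21),
    Span("Token", 21, 22),
]
index = AnnotationIndex(sentences + tokens)

for sentence in sentences:
    covered = index.covered_by(sentence, "Token")
    print(sentence, [(t.begin, t.end) for t in covered])
```

With thousands of annotations in a document and only a few paragraphs, the loop for each paragraph touches just the annotations that start inside it, which is why the built-in begin/end index is usually enough without any custom index definition.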