This area seems to be somewhat complex, with some design decision criteria which I don't know (perhaps others know).

UIMA supports several kinds of indexes: Bag (duplicates allowed, key == the feature structure itself), Set (no dups, defined keys used to establish "equality" for set exclusion), and Sorted (may have duplicates; there are two sub-kinds: those sorted by keys, and type-priority, which has no keys).

Internally, each index is over one type. When actually iterating over a type, the iteration includes subtypes of that type, so iterators actually work by using multiple (single-type) indexes at once, and give the effect of iterating over a type and all its subtypes. Defining an index for type T automatically defines it for all subtypes of T as well.

The add-to-index support was modified back in version 2.1 to automatically declare a bag index, lazily, if a feature structure of some type is being added to the indexes and no index exists (via the merged index definitions) for that type.

Now, the first bit of ambiguity for "getAllIndexedFSs" arises in this use case: a user declares a set index for type T. Because there's a set index, the auto-creation of a bag index for type T is not done. The set index could define a key which results in all feature structures of type T being mapped to the same key, with the result that only the first one will be in the index. So, even though multiple feature structures of type T were created, UIMA only has the first one indexed (but presumably, this was the intent), and only 1 will be returned.

The 2nd bit of ambiguity: if there is a sorted or bag index defined (or if no index is defined, meaning that there will be an automatically generated bag index defined lazily), then if the same identical feature structure is added to the indexes multiple times, it will be in the index multiple times (unless it is removed from the index). If a feature structure is added 5 times and removed 2 times, then it's still in the bag / sorted index 3 times.
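
To make the two ambiguities concrete, here is a small toy model in plain Java. It is only a sketch of the semantics described above and does not use the UIMA API at all: the class names (IndexSemanticsSketch, Fs, BagIndex, SetIndex) and the integer "key" are made up for illustration, and the real index implementation surely differs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Toy model of bag vs. set index behavior. NOT the UIMA API; all names are hypothetical. */
class IndexSemanticsSketch {

    /** Stand-in for a feature structure of some type T with one key feature. */
    static final class Fs {
        final String name;
        final int key;
        Fs(String name, int key) { this.name = name; this.key = key; }
    }

    /** Bag semantics: every add is recorded, so the same FS can be present several times. */
    static final class BagIndex {
        private final List<Fs> entries = new ArrayList<>();
        void add(Fs fs) { entries.add(fs); }
        /** Removes a single occurrence of the given FS (compared by identity). */
        void remove(Fs fs) {
            for (int i = 0; i < entries.size(); i++) {
                if (entries.get(i) == fs) { entries.remove(i); return; }
            }
        }
        int size() { return entries.size(); }
    }

    /** Set semantics: FSs whose keys compare equal count as duplicates; the first one added stays. */
    static final class SetIndex {
        private final TreeSet<Fs> entries;
        SetIndex(Comparator<Fs> keys) { entries = new TreeSet<>(keys); }
        void add(Fs fs) { entries.add(fs); } // no effect if an "equal" FS is already present
        int size() { return entries.size(); }
        Fs first() { return entries.first(); }
    }

    /** Same FS added 5 times and removed 2 times: 3 occurrences remain in the bag. */
    static int bagSizeAfterFiveAddsTwoRemoves() {
        BagIndex bag = new BagIndex();
        Fs fs = new Fs("only", 0);
        for (int i = 0; i < 5; i++) bag.add(fs);
        for (int i = 0; i < 2; i++) bag.remove(fs);
        return bag.size();
    }

    /** Three distinct FSs that all map to the same key, added to a set index. */
    private static SetIndex setWithConstantKey() {
        SetIndex set = new SetIndex(Comparator.comparingInt((Fs fs) -> fs.key));
        set.add(new Fs("first", 0));
        set.add(new Fs("second", 0));
        set.add(new Fs("third", 0));
        return set;
    }

    static int setSizeWithConstantKey() { return setWithConstantKey().size(); }

    static String setSurvivorWithConstantKey() { return setWithConstantKey().first().name; }

    public static void main(String[] args) {
        System.out.println(bagSizeAfterFiveAddsTwoRemoves()); // prints 3
        System.out.println(setSizeWithConstantKey());         // prints 1
        System.out.println(setSurvivorWithConstantKey());     // prints first
    }
}
```

In this model, a "get all indexed" operation over the set index would return one feature structure even though three were created, while over the bag it would return the same feature structure three times.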

Now, to answer your specific question:

>> Is there some default index for each type guaranteed to be always
>> there with a well-known label which can be used with
>> cas.getIndexRepository().getIndex(label); in analogy to
>> getAnnotationIndex(type)?

Not currently, because such an index doesn't necessarily exist. The AnnotationIndex is one specific index that seemed to be of interest sufficiently often that it was "built-in".

Here are some issues with dynamically constructing such an index (on first request, for instance).

It would be dependent on the set of existing indexes at the time. Since (currently) the default bag indexes are constructed lazily during the add-to-indexes operation, if no feature structure of type T had been added to the indexes (and no index over type T or any of its supertypes was defined), there would be no default index for type T, so it would not be included in the constructed index. If later, someone did an add-to-indexes operation on type T, this would change, and now the dynamically constructed index would be wrong, and need to be redone. This could be avoided by forcing the creation (early) of missing default indexes, of course, but that could be quite inefficient.

Next, this would be a strange new kind of index, with different kinds of indexes operating at different levels of the type hierarchy. For instance, the index for the type asked for may be a default bag index, whereas the index for a subtype might be a set index. I'm guessing the UIMA code may not support this kind of index without some rework.

Furthermore, the FSIterator returned by the getAllIndexedFS(type) method may behave in somewhat unexpected ways, depending on what indexes have been defined for the type and all of its subtypes.
If only "set" indexes have been defined for some subtype, ST, of <type>, then only the FSs of that ST which were "unique" in the set definition will be found; if *no* indexes were defined for <type>, then all FSs (possibly with duplicates, if the same FS was added to the indexes multiple times) would be returned.

I hope I've understood the code/design well enough for this note to be correct, but if not, others should please correct it!

-Marshall

On 4/23/2013 5:13 PM, Richard Eckart de Castilho wrote:
> Hi there,
>
> it's possible to get all indexed FSes of a type via
>
>   cas.getIndexRepository().getAllIndexedFS(type);
>
> however, this returns an FSIterator. For annotations, there
> is
>
>   cas.getAnnotationIndex(type)
>
> which returns an FSIndex.
>
> Is there some default index for each type guaranteed to be
> always there with a well-known label which can be used with
>
>   cas.getIndexRepository().getIndex(label);
>
> in analogy to getAnnotationIndex(type)?
>
> It'd be cool not to have to rely on the aggregate iterator
> that is returned by getAllIndexedFS().
>
> Cheers,
>
> -- Richard
