Thanks Marshall, that was exhaustive. Sounds like it would be good to drop everything that's using getAllIndexedFS() from uimaFIT for the time being.
At least for the annotations, there's a well-defined API with the built-in default indexes.

Thanks again!

-- Richard

On 24.04.2013 at 17:10, Marshall Schor <[email protected]> wrote:

> This area seems to be somewhat complex, with some design decision criteria
> which I don't know (perhaps others know).
>
> UIMA supports several kinds of indexes: Bag (duplicates allowed, key == the
> featureStructure itself), Set (no dups, defined keys used to establish
> "equality" for set exclusion), and Sorted (may have duplicates; there are
> two sub-kinds: those sorted by keys, and type-priority (no keys)).
>
> Internally, each index is over one type. When actually iterating over a
> type, the iteration includes subtypes of that type, so iterators actually
> work by using multiple (single-type) indexes at once, and give the effect
> of iterating over a type and all its subtypes.
>
> Defining an index for type T automatically defines it for all subtypes of
> T as well.
>
> The add-to-index support was modified back in version 2.1 to automatically
> declare a bag index, lazily, if a type is being added to an index and no
> index exists (via the merged index definitions) for that type.
>
> Now, the first bit of ambiguity for "getAllIndexedFS" arises in this use
> case: a user declares a set index for type T. Because there's a set index,
> the auto-creation of a bag index for type T is not done. The set index
> could define a key which results in all feature structures of type T being
> mapped to the same key, with the result that only the first one will be in
> the index. So, even though multiple feature structures of type T were
> created, UIMA only has the first one indexed (but presumably, this was the
> intent), and only 1 will be returned.
>
> The 2nd bit of ambiguity: if there is a sorted or bag index defined (or if
> no index is defined - meaning that there will be an automatically
> generated bag index defined lazily), then if the same identical feature
> structure is added to the indexes multiple times, it will be in the index
> multiple times (unless it is removed from the index). If a feature
> structure is added 5 times and removed 2 times, then it's still in the
> bag / sorted index 3 times.
>
> Now, to answer your specific question:
>
>>> Is there some default index for each type guaranteed to be always there
>>> with a well-known label which can be used with
>>> cas.getIndexRepository().getIndex(label); in analogy to
>>> getAnnotationIndex(type)?
>
> Not currently, because such an index doesn't actually necessarily exist.
> The AnnotationIndex is one specific index that seemed to be of interest
> sufficiently often that it was "built-in".
>
> Here are some issues with dynamically constructing such an index (on first
> request, for instance).
>
> It would be dependent on the set of existing indexes at the time. Since
> (currently) the default bag indexes are constructed lazily during the
> add-to-indexes operation, if no FS of type T was added to the indexes (and
> no index over type T or any of its supertypes was defined), there would be
> no default index of type T, so that would not be in the constructed index.
> If later, someone did an add-to-indexes operation on type T, this would
> change, and now the dynamically constructed index would be wrong, and need
> to be redone. This could be avoided by forcing the creation (early) of
> missing default indexes, of course, but that could be quite inefficient.
>
> Next, this would be a strange new kind of index, with different types of
> indexes operating at different levels of the type hierarchy. For instance,
> the type asked for may be a default-bag-index, whereas a subtype might be a
> set index.
> I'm guessing the UIMA code may not support this kind of index without some
> rework.
>
> Furthermore, the FSIterator returned by the getAllIndexedFS(type) method
> may behave in somewhat unexpected ways, depending on what indexes have
> been defined for the type and all of its subtypes. If only "set" indexes
> have been defined for some subtype, ST, of <type>, then only the FSs of
> that ST which were "unique" in the set definition will be found; if *no*
> indexes were defined for <type>, then all FSs (possibly with duplicates,
> if the same FS was added to indexes multiple times) would be returned.
>
> I hope I've understood the code/design well enough to have this note be
> correct, but if not, others should please correct it!
>
> -Marshall
>
> On 4/23/2013 5:13 PM, Richard Eckart de Castilho wrote:
>> Hi there,
>>
>> it's possible to get all indexed FSes of a type via
>>
>>   cas.getIndexRepository().getAllIndexedFS(type);
>>
>> however, this returns an FSIterator. For annotations, there
>> is
>>
>>   cas.getAnnotationIndex(type)
>>
>> which returns an FSIndex.
>>
>> Is there some default index for each type guaranteed to be
>> always there with a well-known label which can be used with
>>
>>   cas.getIndexRepository().getIndex(label);
>>
>> in analogy to getAnnotationIndex(type)?
>>
>> It'd be cool not to have to rely on the aggregate iterator
>> that is returned by getAllIndexedFS().
>>
>> Cheers,
>>
>> -- Richard
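
For anyone skimming the thread later: the two ambiguities Marshall describes (a set index collapsing FSs that share a key, and a bag index holding the same FS several times) can be illustrated with a small toy model. This is only a sketch in plain Java and deliberately does not use the UIMA API; the names IndexSemanticsSketch, BagIndex, SetIndex and Fs are made up for illustration, and the real index implementations will differ in detail.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Toy model of bag vs. set index behaviour. NOT UIMA code; all names are hypothetical. */
final class IndexSemanticsSketch {

    /** Hypothetical stand-in for a feature structure (identity equality, like a plain object). */
    static final class Fs {
        final String name;
        final int category;

        Fs(String name, int category) {
            this.name = name;
            this.category = category;
        }
    }

    /** Bag: the same element may be present several times. */
    static final class BagIndex<T> {
        private final List<T> entries = new ArrayList<>();

        void add(T fs) {
            entries.add(fs);
        }

        /** Removes a single occurrence, if one is present. */
        void remove(T fs) {
            entries.remove(fs);
        }

        List<T> all() {
            return new ArrayList<>(entries);
        }
    }

    /** Set: elements with an equal key collapse; the first one added is kept. */
    static final class SetIndex<T, K> {
        private final Function<T, K> key;
        private final Map<K, T> entries = new LinkedHashMap<>();

        SetIndex(Function<T, K> key) {
            this.key = key;
        }

        void add(T fs) {
            entries.putIfAbsent(key.apply(fs), fs);
        }

        List<T> all() {
            return new ArrayList<>(entries.values());
        }
    }

    public static void main(String[] args) {
        Fs a = new Fs("a", 1);
        Fs b = new Fs("b", 1);
        Fs c = new Fs("c", 1);

        // First ambiguity: three FSs share the key, so the set keeps only the first.
        SetIndex<Fs, Integer> set = new SetIndex<>(fs -> fs.category);
        set.add(a);
        set.add(b);
        set.add(c);
        System.out.println("set index size: " + set.all().size());

        // Second ambiguity: added 5 times, removed 2 times, still present 3 times.
        BagIndex<Fs> bag = new BagIndex<>();
        for (int i = 0; i < 5; i++) {
            bag.add(a);
        }
        bag.remove(a);
        bag.remove(a);
        System.out.println("bag index size: " + bag.all().size());
    }
}
```

In this model, an "all indexed FSs" view over the set index would report one element even though three were created, while the same view over the bag would report the same object three times, which is roughly why the result of getAllIndexedFS() depends on which index definitions happen to exist.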
