This area seems to be somewhat complex, with some design-decision criteria which
I don't know (perhaps others do).

UIMA supports several kinds of indexes: Bag (duplicates allowed, key == the
featureStructure itself), Set (no dups, defined keys used to establish
"equality" for set exclusion), and Sorted (may have duplicates; there are two
sub-kinds: those sorted by keys, and type-priority (no keys)).

Internally, each index is over one type.  When actually iterating over a type,
the iteration includes subtypes of that type, so iterators actually work by
using multiple (single-type) indexes at once, and give the effect of iterating
over a type and all its subtypes.
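
To make the "several single-type indexes behind one iterator" idea concrete,
here is a toy model in plain Java.  It is not the UIMA API and not how UIMA is
implemented; the class, the method names and the type names are all made up for
this sketch, and the order in which the per-type indexes are visited is
arbitrary here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only (hypothetical names, not UIMA classes).
class SubtypeIterationSketch {
    // child type name -> parent type name (null for the root)
    private final Map<String, String> parentOf = new LinkedHashMap<>();
    // one separate index per type; each holds only FSs of exactly that type
    private final Map<String, List<String>> indexByType = new LinkedHashMap<>();

    void declareType(String type, String parent) {
        parentOf.put(type, parent);
        indexByType.put(type, new ArrayList<>());
    }

    void addToIndex(String type, String fs) {
        indexByType.get(type).add(fs);
    }

    // Walk up the parent chain to see whether 'type' is 'ancestor' or below it.
    boolean isSameOrSubtype(String type, String ancestor) {
        for (String t = type; t != null; t = parentOf.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // "Iterating over a type" = visiting the single-type index of that type
    // and of every subtype, and concatenating the results.
    List<String> iterate(String type) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : indexByType.entrySet()) {
            if (isSameOrSubtype(e.getKey(), type)) {
                result.addAll(e.getValue());
            }
        }
        return result;
    }

    static List<String> demo(String type) {
        SubtypeIterationSketch repo = new SubtypeIterationSketch();
        repo.declareType("Annotation", null);
        repo.declareType("Token", "Annotation");
        repo.declareType("Sentence", "Annotation");
        repo.declareType("Word", "Token");
        repo.addToIndex("Annotation", "a1");
        repo.addToIndex("Token", "t1");
        repo.addToIndex("Sentence", "s1");
        repo.addToIndex("Word", "w1");
        return repo.iterate(type);
    }

    public static void main(String[] args) {
        System.out.println(demo("Token"));      // Token plus its subtype Word
        System.out.println(demo("Annotation")); // everything
    }
}
```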

Defining an index for type T automatically defines it for all subtypes of T as 
well.

The add-to-index support was modified back in version 2.1 to automatically
declare a bag index, lazily, if a type is being added to an index and no index
exists (via the merged index definitions) for that type. 

Now, the first bit of ambiguity for "getAllIndexedFS" arises in this use case:
a user declares a set index for type T.  Because there's a set index, the
auto-creation of a bag index for type T is not done.  The set index could define
a key which results in all feature structures of type T being mapped to the same
key, with the result that only the first one will be in the index.  So, even
though multiple feature structures of type T were created, UIMA only has the
first one indexed (but presumably, this was the intent), and only 1 will be
returned.
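
A small plain-Java sketch of that set-index case follows.  Again this is only a
toy model of the rule as I described it, not UIMA code; the Token class, its
"category" key and ToySetIndex are hypothetical names invented for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only (hypothetical names, not UIMA classes).
class SetIndexSketch {
    // Stand-in for a feature structure with one key feature.
    static final class Token {
        final int id;
        final String category;

        Token(int id, String category) {
            this.id = id;
            this.category = category;
        }
    }

    // Keeps an element only if nothing with an "equal" key is already present.
    static final class ToySetIndex<T> {
        private final Comparator<T> keyComparator;
        private final List<T> items = new ArrayList<>();

        ToySetIndex(Comparator<T> keyComparator) {
            this.keyComparator = keyComparator;
        }

        boolean add(T fs) {
            for (T existing : items) {
                if (keyComparator.compare(existing, fs) == 0) {
                    return false; // same key: the first one stays, this one is dropped
                }
            }
            items.add(fs);
            return true;
        }

        List<T> contents() {
            return new ArrayList<>(items);
        }
    }

    static List<Token> demo() {
        // The key is 'category', and every Token below has the same category.
        ToySetIndex<Token> index =
                new ToySetIndex<>(Comparator.comparing((Token t) -> t.category));
        index.add(new Token(1, "noun"));
        index.add(new Token(2, "noun"));
        index.add(new Token(3, "noun"));
        return index.contents();
    }

    public static void main(String[] args) {
        List<Token> indexed = demo();
        System.out.println(indexed.size() + " indexed, first id = " + indexed.get(0).id);
    }
}
```

Three Tokens were created, but only the one with id 1 ends up in the toy index,
which is the effect described above.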

The 2nd bit of ambiguity: if there is a sorted or bag index defined (or if no
index is defined - meaning that there will be an automatically generated bag
index defined lazily), then if the same identical feature structure is added to
the indexes multiple times, it will be in the index multiple times (unless it is
removed from the index).  If a feature structure is added 5 times and removed 2
times, then it's still in the bag / sorted index 3 times.
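
The counting can be sketched in plain Java like this.  It is a toy model of the
behaviour described above, not the UIMA implementation, and ToyBagIndex is a
made-up name.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only (hypothetical names, not UIMA classes).
class BagIndexSketch {
    // A bag keeps every add, including repeated adds of the identical object.
    static final class ToyBagIndex<T> {
        private final List<T> items = new ArrayList<>();

        void add(T fs) {
            items.add(fs);
        }

        // Removes a single occurrence of this exact object (identity, not equals).
        boolean remove(T fs) {
            for (int i = 0; i < items.size(); i++) {
                if (items.get(i) == fs) {
                    items.remove(i); // remove by position
                    return true;
                }
            }
            return false;
        }

        int size() {
            return items.size();
        }
    }

    static int demo() {
        ToyBagIndex<Object> bag = new ToyBagIndex<>();
        Object fs = new Object();
        for (int i = 0; i < 5; i++) {
            bag.add(fs);    // added 5 times
        }
        bag.remove(fs);     // removed 2 times
        bag.remove(fs);
        return bag.size();  // 3 occurrences remain
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```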

Now, to answer your specific question:

> Is there some default index for each type guaranteed to be always there with
> a well-known label which can be used with
> cas.getIndexRepository().getIndex(label); in analogy to
> getAnnotationIndex(type)?

Not currently, because such an index doesn't necessarily exist.  The
AnnotationIndex is one specific index that seemed to be of interest
sufficiently often that it was "built-in".

Here are some issues with dynamically constructing such an index (on first
request, for instance).

It would be dependent on the set of existing indexes at the time.  Since
(currently) the default bag indexes are constructed lazily during the
add-to-indexes operation, if no FS of type T had been added to the indexes (and
no index over type T or any of its supertypes was defined), there would be no
default index for type T, so it would not be part of the constructed index.  If
someone later did an add-to-indexes operation on type T, this would change: the
dynamically constructed index would now be wrong and would need to be redone.
This could be avoided by forcing the creation (early) of missing default
indexes, of course, but that could be quite inefficient.
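
The staleness problem can be shown with a plain-Java toy model.  This is only
an illustration of the reasoning, not UIMA code; LazyBagSketch and its methods
are hypothetical, and the "aggregate" here is simply a list of whichever
per-type indexes existed when it was built.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only (hypothetical names, not UIMA classes).
class LazyBagSketch {
    private final Map<String, List<Object>> indexes = new HashMap<>();

    // The per-type bag is created lazily, on the first add for that type.
    void addToIndexes(String type, Object fs) {
        indexes.computeIfAbsent(type, t -> new ArrayList<>()).add(fs);
    }

    // An aggregate built from whatever per-type indexes exist right now.
    List<List<Object>> buildAggregate() {
        return new ArrayList<>(indexes.values());
    }

    static int count(List<List<Object>> aggregate) {
        int n = 0;
        for (List<Object> index : aggregate) {
            n += index.size();
        }
        return n;
    }

    // Returns {count seen through the old aggregate, count after rebuilding}.
    static int[] demo() {
        LazyBagSketch repo = new LazyBagSketch();
        repo.addToIndexes("A", new Object());
        List<List<Object>> aggregate = repo.buildAggregate(); // no index for T yet

        repo.addToIndexes("T", new Object()); // T's bag index appears only now

        int stale = count(aggregate);              // misses the FS of type T
        int fresh = count(repo.buildAggregate());  // rebuilt, so it sees both
        return new int[] {stale, fresh};
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println("stale=" + r[0] + " fresh=" + r[1]);
    }
}
```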

Next, this would be a strange new kind of index, with different kinds of indexes
operating at different levels of the type hierarchy.  For instance, the type
asked for may have a default bag index, whereas a subtype might have a set
index.  I'm guessing the UIMA code may not support this kind of index without
some rework.

Furthermore, the FSIterator returned by the getAllIndexedFS(type) method may
behave in somewhat unexpected ways, depending on what indexes have been defined
for the type and all of its subtypes.  If only "set" indexes have been defined
for some subtype, ST, of <type>, then only those FSs of ST which were "unique"
under the set definition will be found; if *no* indexes were defined for <type>,
then all FSs (possibly with duplicates, if the same FS was added to the indexes
multiple times) would be returned.

I hope I've understood the code/design well enough to have this note be correct,
but if not, others should please correct it!

-Marshall

On 4/23/2013 5:13 PM, Richard Eckart de Castilho wrote:
> Hi there,
>
> it's possible to get all indexed FSes of a type via
>
>   cas.getIndexRepository().getAllIndexedFS(type);
>
> however, this returns an FSIterator. For annotations, there
> is 
>
>   cas.getAnnotationIndex(type)
>
> which returns an FSIndex.
>
> Is there some default index for each type guaranteed to be
> always there with a well-known label which can be used with
>
>   cas.getIndexRepository().getIndex(label);
>
> in analogy to getAnnotationIndex(type)?
>
> It'd be cool not to have to rely on the aggregate iterator
> that is returned by getAllIndexedFS().
>
> Cheers,
>
> -- Richard
>
