Thanks Marshall, that was exhaustive.

Sounds like it would be good to drop everything that's using getAllIndexedFS()
from uimaFIT for the time being.

At least for the annotations, there's a well defined API with the built-in
default indexes.

Thanks again!

-- Richard

On 24.04.2013 at 17:10, Marshall Schor <[email protected]> wrote:

> This area seems to be somewhat complex, with some design decision criteria
> which I don't know (perhaps others know).
> 
> UIMA supports several kinds of indexes: Bag (duplicates allowed, key == the
> featureStructure itself), Set (no dups, defined keys used to establish
> "equality" for set exclusion), and Sorted (may have duplicates; there are two
> sub-kinds: those sorted by keys, and type-priority (no keys)).
> 
> Internally, each index is over one type.  When actually iterating over a type,
> the iteration includes subtypes of that type, so iterators actually work by
> using multiple (single-type) indexes at once, and give the effect of iterating
> over a type and all its subtypes.
> 
> Defining an index for type T automatically defines it for all subtypes of T as
> well.
> 
> The add-to-index support was modified back in version 2.1 to automatically
> declare a bag index, lazily, if a type is being added to an index and no index
> exists (via the merged index definitions) for that type. 
> 
> Now, the first bit of ambiguity for "getAllIndexedFS" arises in this use case:
> a user declares a set index for type T.  Because there's a set index, the
> auto-creation of a bag index for type T is not done.  The set index could
> define a key which results in all feature structures of type T being mapped to
> the same key, with the result that only the first one will be in the index.
> So, even though multiple feature structures of type T were created, UIMA only
> has the first one indexed (but presumably, this was the intent), and only 1
> will be returned.
> 
> The 2nd bit of ambiguity: if there is a sorted or bag index defined (or if no
> index is defined - meaning that there will be an automatically generated bag
> index defined lazily), then if the same identical feature structure is added
> to the indexes multiple times, it will be in the index multiple times (unless
> it is removed from the index).  If a feature structure is added 5 times and
> removed 2 times, then it's still in the bag / sorted index 3 times.
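
(Editorial sketch, not UIMA code: the two ambiguities above can be modelled
with plain Java collections. The classes Fs, BagIndex and SetIndex below are
hypothetical stand-ins invented for illustration; they only mimic the
behaviour described in this mail and say nothing about how UIMA implements
its indexes internally.)

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IndexSemanticsSketch {

    /** Hypothetical stand-in for a feature structure; not a UIMA class. */
    static final class Fs {
        final String name;
        final String key; // the value a (toy) set index would compare on
        Fs(String name, String key) { this.name = name; this.key = key; }
    }

    /** Toy bag index: every add is recorded, so one object can occur several times. */
    static final class BagIndex {
        private final List<Fs> entries = new ArrayList<>();
        void add(Fs fs) { entries.add(fs); }
        /** Removes a single occurrence of the given object (identity comparison). */
        void remove(Fs fs) {
            for (int i = 0; i < entries.size(); i++) {
                if (entries.get(i) == fs) { entries.remove(i); return; }
            }
        }
        int size() { return entries.size(); }
    }

    /** Toy set index: the first FS seen for a key is kept, later equal-key FSs are dropped. */
    static final class SetIndex {
        private final Map<String, Fs> byKey = new LinkedHashMap<>();
        void add(Fs fs) { byKey.putIfAbsent(fs.key, fs); }
        int size() { return byKey.size(); }
    }

    /** Adds the same object 'adds' times, removes it 'removes' times, returns what is left. */
    static int bagSizeAfter(int adds, int removes) {
        BagIndex bag = new BagIndex();
        Fs fs = new Fs("fs", "k");
        for (int i = 0; i < adds; i++) bag.add(fs);
        for (int i = 0; i < removes; i++) bag.remove(fs);
        return bag.size();
    }

    /** Adds n distinct objects that all map to the same key, returns how many are indexed. */
    static int setSizeForSharedKey(int n) {
        SetIndex set = new SetIndex();
        for (int i = 0; i < n; i++) set.add(new Fs("fs" + i, "same"));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("bag after 5 adds, 2 removes: " + bagSizeAfter(5, 2));
        System.out.println("set with 4 FSs sharing a key: " + setSizeForSharedKey(4));
    }
}
```

In this toy model the bag still holds the object 3 times after 5 adds and 2
removes, and the set holds 1 entry no matter how many same-key objects are
added, which is why an iterator over "everything indexed" can return more or
fewer items than were created.
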
> 
> Now, to answer your specific question:
> 
>> Is there some default index for each type guaranteed to be always there with
>> a well-known label which can be used with
>> cas.getIndexRepository().getIndex(label); in analogy to
>> getAnnotationIndex(type)?
> 
> Not currently, because such an index doesn't actually necessarily exist.  The
> AnnotationIndex is one, specific index that seemed to be of interest
> sufficiently often that it was "built-in". 
> 
> Here are some issues with dynamically constructing such an index (on first
> request, for instance).
> 
> It would be dependent on the set of existing indexes at the time.  Since
> (currently) the default-bag-indexes are constructed lazily during the
> add-to-indexes operation, if no type T was added-to-indexes (and no index over
> type T or any of its supertypes was defined), there would be no default index
> of type T so that would not be in the constructed index.  If later, someone
> did an add-to-indexes operation on type T, this would change, and now the
> dynamically constructed index would be wrong, and need to be redone.  This
> could be avoided by forcing the creation (early) of missing default indexes,
> of course, but that could be quite inefficient.
> 
> Next, this would be a strange new kind of index, with different types of
> indexes operating at different levels of the type hierarchy.  For instance,
> the type asked for may be a default-bag-index, whereas a subtype might be a
> set index.  I'm guessing the UIMA code may not support this kind of index
> without some rework.
> 
> Furthermore, the FSIterator returned by the getAllIndexedFS(type) method may
> behave in somewhat unexpected ways, depending on what indexes have been
> defined for the type and all of its subtypes.  If only "set" indexes have been
> defined for some subtype, ST, of <type>, then only the FSs of that ST which
> were "unique" in the set definition will be found; if *no* indexes were
> defined for <type>, then all FSs (possibly with duplicates, if the same FS was
> added to indexes multiple times) would be returned.
> 
> I hope I've understood the code/design well enough to have this note be
> correct, but if not, others should please correct it!
> 
> -Marshall
> 
> On 4/23/2013 5:13 PM, Richard Eckart de Castilho wrote:
>> Hi there,
>> 
>> it's possible to get all indexed FSes of a type via
>> 
>>  cas.getIndexRepository().getAllIndexedFS(type);
>> 
>> however, this returns an FSIterator. For annotations, there
>> is 
>> 
>>  cas.getAnnotationIndex(type)
>> 
>> which returns an FSIndex.
>> 
>> Is there some default index for each type guaranteed to be
>> always there with a well-known label which can be used with
>> 
>>  cas.getIndexRepository().getIndex(label);
>> 
>> in analogy to getAnnotationIndex(type)?
>> 
>> It'd be cool not to have to rely on the aggregate iterator
>> that is returned by getAllIndexedFS().
>> 
>> Cheers,
>> 
>> -- Richard
>> 
> 
