An undocumented quirk of UIMA Set indexes

Marshall Schor Tue, 18 Jul 2017 07:02:34 -0700

While thinking through some updates to UIMA v3 indexes/iterators, I tried the
following experiment.


Configure UIMA with:

- a set index, indexed over the "begin" feature only.

- a type system: the built-in types plus a new subtype of Annotation, called "Token".

- make an instance of Annotation with begin=17.

- make an instance of Token with the same begin=17 value.

Add both to the indexes.  Because the set index defines equality in terms of the
begin feature alone, and both instances have the same begin value, you might
expect the set to contain just one entry.

But it has two entries (both of them).

It turns out that "set-ness" is enforced per type; the effect is as if the
Set index's equality comparator also included the type.
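To make the difference concrete, here is a small plain-Java model of the experiment. It does not use the UIMA API at all: `SetIndexModel` and the `Fs` record are hypothetical stand-ins (a type name plus a begin offset), and `java.util.TreeSet` plays the role of a set index, so this is only a sketch of the observable semantics, not of how UIMA actually stores its indexes. It contrasts the "expected" single set keyed on begin with a per-type arrangement, and shows that the per-type arrangement behaves like a single set whose comparator compares the type first.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class SetIndexModel {

    // Hypothetical stand-in for a feature structure: only a type name and a "begin" value.
    record Fs(String type, int begin) {}

    // The set index's key comparator: compares the "begin" feature only.
    static final Comparator<Fs> BY_BEGIN = Comparator.comparingInt(Fs::begin);

    // What one might expect: one set shared by all types, deduplicating on begin alone.
    static int singleSetSize(List<Fs> fss) {
        TreeSet<Fs> set = new TreeSet<>(BY_BEGIN);
        set.addAll(fss);
        return set.size();
    }

    // The observed behavior, modeled as a separate set per type, each using only BY_BEGIN.
    static int perTypeSetsSize(List<Fs> fss) {
        Map<String, TreeSet<Fs>> setsByType = new HashMap<>();
        for (Fs fs : fss) {
            setsByType.computeIfAbsent(fs.type(), t -> new TreeSet<>(BY_BEGIN)).add(fs);
        }
        int total = 0;
        for (TreeSet<Fs> s : setsByType.values()) {
            total += s.size();
        }
        return total;
    }

    // Equivalent "as-if" view: a single set whose comparator also includes the type.
    static int typeAwareSetSize(List<Fs> fss) {
        TreeSet<Fs> set = new TreeSet<>(Comparator.comparing(Fs::type).thenComparing(BY_BEGIN));
        set.addAll(fss);
        return set.size();
    }

    public static void main(String[] args) {
        // An Annotation and a Token, both with begin=17.
        List<Fs> fss = List.of(new Fs("Annotation", 17), new Fs("Token", 17));
        System.out.println("single set:     " + singleSetSize(fss));    // 1
        System.out.println("per-type sets:  " + perTypeSetsSize(fss));  // 2
        System.out.println("type-aware set: " + typeAwareSetSize(fss)); // 2
    }
}
```

The model ignores the fact that Token is a subtype of Annotation; what matters for the quirk is only that each type gets its own set, so two feature structures of different types never collide even when their key features are equal.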

-----------------------------

I don't propose to change this unless there's a consensus of requests from our
users.  I would guess that our users have gotten used to this behavior.
But I do propose to document it :-).

Other opinions?

-Marshall
