Hi,

These are some not-quite-thought-out thoughts on "type" use in UIMA iterators. 

When I first encountered the detailed design of these in UIMA, I was surprised
to find that, except for type priority ordering, types did not play a major role
in the UIMA iterator APIs. 

    In particular, a FS used as an argument in moveTo(fs) could be a supertype
    of the type the index was over, as long as the supertype had the key
    fields.  This is typically the case for, say, an AnnotationIndex over some
    type like "Token"; you can use an "Annotation" (a supertype of Token) as
    the argument in moveTo(fs).

The AnnotationIndex defines a typePriority key.  To explore this further, let's
think about cases where the index doesn't use a typePriority key.

Assume we define a type Foo, and some subtypes of Foo:

Foo
  -FooSub_a
     -- FooSub_a_a  (subtype of FooSub_a)
     -- FooSub_a_b
  -FooSub_b

Next, assume you define/create an **index over FooSub_a**, with no typePriority 
key.

Now you could get an iterator over that index, and do operations like
"moveTo(xxx)";  the type of xxx could be any type defining the sorting key(s)
for the index.  In particular, it could be a subtype or a supertype.  The type
itself plays no role in the moveTo operation.

===========
This was a surprise to me, when I first learned of it. 

I guess I had implicitly assumed that if I said
  - moveTo(aFooSub_b),
    -- where the index held a FooSub_b instance that was "equal" to it (using
       the index's compare operation),
then a subsequent "get" would return a FooSub_b instance.

Instead, I get the "left-most" FS in the index which compares "equal" with xxx,
which could be a FooSub_a instance
  - whose type is neither a subtype nor a supertype of the type of xxx
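
To make this concrete, here is a toy Python model (not UIMA code).  It assumes,
purely for illustration, an index that holds instances of several of the types
above, represented as (key, type name) pairs sorted by key only.  The names
`index` and `move_to` are made up for this sketch, and `bisect_left` stands in
for "move to the left-most equal FS".

```python
from bisect import bisect_left

# Toy stand-in for a sorted index with no typePriority key:
# (sort_key, type_name) pairs ordered by sort_key only.
index = [
    (1, "FooSub_a"),
    (5, "FooSub_a"),
    (5, "FooSub_b"),
    (5, "FooSub_a_a"),
    (9, "FooSub_b"),
]

def move_to(index, key):
    """Position of the left-most entry whose key is >= key.

    The type of the template plays no role here; only the key is compared.
    """
    keys = [k for k, _ in index]
    return bisect_left(keys, key)

# The template is (conceptually) a FooSub_b with key 5.
pos = move_to(index, 5)
found = index[pos]
print(found)  # (5, 'FooSub_a'): equal key, but not the template's type
```

Even though a FooSub_b with key 5 is in the toy index, the position returned is
that of the FooSub_a, because it is the left-most of the "equal" entries.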

===========
If the index is defined **with a typePriority key**, then in the above case, I
do get a FS of the type of xxx (assuming one exists in the index, of course).
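
A toy Python sketch of why this happens (again, not UIMA code): if type
priority is treated as one more sort key after the ordinary key, then entries of
different types no longer compare "equal".  The priority order in `PRIORITY` and
the helper names are assumptions made up for this illustration.

```python
from bisect import bisect_left

# Hypothetical type priority order; earlier in the list sorts first.
PRIORITY = ["FooSub_a", "FooSub_a_a", "FooSub_a_b", "FooSub_b"]

def sort_key(entry):
    """Compare by the ordinary key first, then by type priority rank."""
    key, type_name = entry
    return (key, PRIORITY.index(type_name))

entries = [
    (5, "FooSub_b"),
    (1, "FooSub_a"),
    (5, "FooSub_a"),
    (9, "FooSub_b"),
    (5, "FooSub_a_a"),
]
index = sorted(entries, key=sort_key)

def move_to(index, template):
    """Left-most entry that is >= template under the (key, priority) order."""
    ranks = [sort_key(e) for e in index]
    return bisect_left(ranks, sort_key(template))

pos = move_to(index, (5, "FooSub_b"))
found = index[pos]
print(found)  # (5, 'FooSub_b'): the type now takes part in the comparison
```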
===========

This is how UIMA v2 works.  It's mostly a "don't care" thing, I believe, because
of the prevalent use of the AnnotationIndex, which does define a typePriority 
key.

For UIMA v3, we could modify this behavior. 

One proposal is to change the meaning of "move-to-leftmost" in just the case
illustrated, where there is an "equal" match with xxx; the modification would
be to (temporarily) include the type in move-to-leftmost, so the move stops
when the type becomes unequal.  This guarantees that the next "get" returns a
FS of the same type as xxx, if an "equal" FS of that type exists.

    This proposal is for exact type matching, not for type/subtype matching.
    So if the moveTo(xxx) was for type FooSub_b, and there was no matching
    instance of that type, but there were matching instances of other types
    (subtypes, supertypes, and other (e.g. FooSub_a) types), the iterator
    would move to the leftmost one of all of these.  (Of course, other designs
    are possible, at the cost of more complexity.)
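
One possible reading of this proposal, as a toy Python sketch (not UIMA code,
and not a claim about how it would actually be implemented): among the entries
"equal" to the template, prefer the left-most one of exactly the template's
type, and fall back to the plain left-most one if there is none.  The function
name `move_to_type_aware` and the data are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Toy index with no typePriority key: sorted by key only.
index = [
    (1, "FooSub_a"),
    (5, "FooSub_a"),
    (5, "FooSub_b"),
    (5, "FooSub_a_a"),
    (9, "FooSub_b"),
]

def move_to_type_aware(index, key, type_name):
    """Left-most equal entry of exactly type_name, else left-most equal entry."""
    keys = [k for k, _ in index]
    lo = bisect_left(keys, key)    # start of the run of equal keys
    hi = bisect_right(keys, key)   # one past the end of that run
    for pos in range(lo, hi):
        if index[pos][1] == type_name:   # exact type match, no subtype logic
            return pos
    return lo                      # no entry of that type: plain left-most

exact = move_to_type_aware(index, 5, "FooSub_b")
fallback = move_to_type_aware(index, 5, "FooSub_a_b")
print(index[exact])     # (5, 'FooSub_b')
print(index[fallback])  # (5, 'FooSub_a'): no FooSub_a_b with key 5 exists
```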

    Issue: imagine there were multiple FSs "equal" to xxx, of FooSub_b and
    other types.  Nothing is said about what moveToNext would do.  It could well
    move to a FS of some other type, instead of first going through the
    remaining FooSub_b instances.
      - the proposal could be augmented to guarantee that all FooSub_b FSs
        "equal" to xxx would be returned first, if iterating forwards.
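
A toy Python sketch of what that augmented forward order might look like (not
UIMA code; `iterate_from` and the data are made up for illustration).  Within
the run of "equal" entries, those of the template's type are visited first,
then the rest of the run, then the remainder of the index.

```python
from bisect import bisect_left, bisect_right

# Toy index with no typePriority key; equal-key entries of mixed types
# are interleaved in index order.
index = [
    (1, "FooSub_a"),
    (5, "FooSub_a"),
    (5, "FooSub_b"),
    (5, "FooSub_a_a"),
    (5, "FooSub_b"),
    (9, "FooSub_b"),
]

def iterate_from(index, key, type_name):
    """Forward visiting order after a type-aware moveTo(key, type_name)."""
    keys = [k for k, _ in index]
    lo = bisect_left(keys, key)
    hi = bisect_right(keys, key)
    equal_run = index[lo:hi]
    same_type = [e for e in equal_run if e[1] == type_name]
    other_types = [e for e in equal_run if e[1] != type_name]
    # Same-type entries first, then the other equal entries, then the rest.
    return same_type + other_types + index[hi:]

order = iterate_from(index, 5, "FooSub_b")
for entry in order:
    print(entry)
```

In this sketch the (5, FooSub_a) entry, which sits to the left of the first
FooSub_b in the index, is visited after both FooSub_b entries, so the visiting
order no longer matches the index order.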

    Although this seems like the "least surprise" result, it starts to produce
    implementation complexity, and perhaps other surprises for other cases.

    So I'm not sure if any of these modifications are the right thing to do... 
    as compared to the simpler (more consistent, less special case, but with
    other surprises) approach that v2 has.

Just a note:

    Left-most is a concept applying only to FSs in the index which compare
    "equal" (using the keys specified for the index), and means the left-most
    one among the set of equal items.

Do others feel some sort of "improvement" in the moveTo(xxx) definition along
any of these lines is needed?  Or is it best to just keep things like v2 does
it, with the same "surprises"?

-Marshall
