After some more thinking, I'm now leaning toward making no change at all from how V2 does things, because the unexpected consequences seem complex to describe.

The way this will affect things is:

1) specifying a typePriority key for a search index, and having no type
priority defined, or using "select" APIs with no typePriority, will use equal
type compare when establishing a move-to-leftmost boundary (currently, in v3,
if no typePriority is defined, this is not done, so the move-to-leftmost might
move farther to the left)

2) if no type priority key is defined for a search index, move-to-left-most
will ignore the type. (no change from how it works now in v2).

3) The comment in UIMA-5536 (to reduce the "surprise" in one case) will not be
implemented.

-Marshall

On 8/18/2017 9:27 AM, Marshall Schor wrote:
> Hi,
>
> These are some not-quite-thought-out thoughts on "type" use in UIMA
> iterators.
>
> When I first encountered the detailed design of these in UIMA, I was
> surprised to find that, except for type priority ordering, types did not
> play a major role in the UIMA iterator APIs.
>
> In particular, a FS used as an argument in moveTo(fs) could be a supertype
> of the type the index was over, as long as the supertype had the key
> fields. This is, for example, typically the case for, say, an
> AnnotationIndex over some type like "Token"; you can use an "Annotation"
> (a supertype of Token) as the argument in moveTo(fs).
>
> The AnnotationIndex defines a typePriority key. To explore this further,
> let's think about cases where the index doesn't use a typePriority key.
>
> Assume we define a type Foo, and some subtypes of Foo:
>
>   Foo
>    -FooSub_a
>    -- FooSub_a_a (subtype of FooSub_a)
>    -- FooSub_a_b
>    -FooSub_b
>
> Next, assume you define/create an **index over FooSub_a**, with no
> typePriority key.
>
> Now you could get an iterator over that index, and do operations like
> "moveTo(xxx)"; the type of xxx could be any type defining the sorting key(s)
> for the index. In particular, it could be a subtype, or a supertype. The
> type, itself, plays no role in the moveTo operation.
>
> ===========
> This was a surprise to me, when I first learned of it.
>
> I guess I had implicitly assumed that if I said
>   -moveTo(aFooSub_b),
>   --where there was a type FooSub_b which was "equal" (using the index's
>     compare operation)
> a subsequent "get" would get a FooSub_b instance.
>
> Instead, I get the "left-most" FS in the index which compares "equal" with
> xxx, which could be a FooSub_a instance
>   - which is neither a sub or supertype of xxx
>
> ===========
> If the index is defined **with a typePriority key**, then in the above case,
> I do get a FS of the type of xxx (assuming it exists, of course).
> ===========
>
> This is how UIMA V2 works. It's mostly a "don't care" thing, I believe,
> because of the prevalent use of the AnnotationIndex, which does define a
> typePriority key.
>
> For UIMA v3, we could modify this behavior.
>
> One proposal is to change the meaning of "move-to-leftmost" in just the case
> illustrated, where there is an "equal" match with the xxx; the modification
> would be to (temporarily) include the type in move-to-leftmost, so the move
> stops when the type becomes unequal. This guarantees that the next "get"
> gets the same type as the key, if the key exists.
>
> This proposal is for type equal matching, not for type/subtype matching.
> So if the moveTo(xxx) was for type FooSub_b, but there was no matching
> instance of that type, but there were matching instances of other types
> (sub types, super types, and other (e.g. FooSub_a) types), the iterator
> would move to the leftmost one of all of these. (Of course, with more
> complexity, other designs could be done).
>
> Issue: imagine there were multiple FSs "equal" to xxx, of FooSub_b, and
> other types. Nothing is said about what moveToNext would do. It could
> well move to a FS of some other type, instead of first going among the
> FooSub_b types.
>   - the proposal could be augmented to guarantee all FSs "equal" to xxx of
>     FooSub_b, would be returned first, if iterating forwards.
>
> Although this seems like the "least surprise" result, it starts to produce
> implementation complexity, and perhaps other surprises for other cases.
>
> So I'm not sure if any of these modifications are the right thing to
> do... as compared to the simpler (more consistent, less special case, but
> with other surprises) approach that V2 has.
>
> Just a note:
>
> Left-most is a concept applying only to FSs in the index which compare
> "equal" (using the keys specified for the index), and means the left-most
> one among the set of equal items.
>
> Do others feel some sort of "improvement" in the moveTo(xxx) definition
> along any of these lines is needed? Or is it best to just keep things like
> v2 does it, with the same "surprises"?
>
> -Marshall
>
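
For anyone who wants to see the two behaviors discussed above side by side, here is a minimal sketch in plain Java. It does not use the UIMA API and is not how UIMA implements moveTo; the class `MoveToSketch`, the `Fs` stand-in, and both method names are hypothetical. It assumes a sorted index with a single integer key and no typePriority key, and (so that sibling types can appear together) an index over Foo rather than over FooSub_a. The second method models only the outcome of the proposal (land on an entry of the probe's exact type if one compares equal, otherwise fall back to the leftmost equal entry), not any particular mechanism for getting there.

```java
import java.util.ArrayList;
import java.util.List;

public class MoveToSketch {

    /** Hypothetical stand-in for a feature structure: a type name plus one sort key. */
    static final class Fs {
        final String type;
        final int key;

        Fs(String type, int key) {
            this.type = type;
            this.key = key;
        }
    }

    /** V2-style: position of the first entry whose key is >= the probe's key; type is ignored. */
    static int leftmostIgnoringType(List<Fs> index, Fs probe) {
        int lo = 0;
        int hi = index.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (index.get(mid).key < probe.key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /**
     * Proposal-style outcome: within the run of entries whose key equals the probe's key,
     * prefer the leftmost entry of the probe's exact type; if there is none, fall back to
     * the leftmost entry of the run.
     */
    static int leftmostPreferringType(List<Fs> index, Fs probe) {
        int start = leftmostIgnoringType(index, probe);
        for (int i = start; i < index.size() && index.get(i).key == probe.key; i++) {
            if (index.get(i).type.equals(probe.type)) {
                return i;
            }
        }
        return start;
    }

    /** A small index, already sorted by key, with three entries that compare "equal" at key 5. */
    static List<Fs> sampleIndex() {
        List<Fs> index = new ArrayList<>();
        index.add(new Fs("FooSub_a", 1));
        index.add(new Fs("FooSub_a", 5));
        index.add(new Fs("FooSub_b", 5));
        index.add(new Fs("FooSub_a_a", 5));
        index.add(new Fs("FooSub_b", 9));
        return index;
    }

    public static void main(String[] args) {
        List<Fs> index = sampleIndex();
        Fs probe = new Fs("FooSub_b", 5);
        // Type ignored: lands on the leftmost key-5 entry, which is a FooSub_a.
        System.out.println(index.get(leftmostIgnoringType(index, probe)).type);
        // Type preferred: lands on the FooSub_b entry with key 5.
        System.out.println(index.get(leftmostPreferringType(index, probe)).type);
    }
}
```

Note that after `leftmostPreferringType`, stepping forward from the returned position can still reach entries of other types before any further FooSub_b entries with the same key, which is the moveToNext issue raised in the quoted message.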
