Marshall Schor created UIMA-4111:
------------------------------------
Summary: Change how default bag indices are created
Key: UIMA-4111
URL: https://issues.apache.org/jira/browse/UIMA-4111
Project: UIMA
Issue Type: Improvement
Components: Core Java Framework
Reporter: Marshall Schor
Assignee: Marshall Schor
Fix For: 2.7.0SDK
UIMA-173 added the concept of a universal default bag index, which is created
for a type if no other index is defined for that type. That Jira links to the
motivation, which makes clear that this was intended to simplify how UIMA works
and to allow every feature structure that was added to the indexes (via
addToIndexes()) to be retrieved.
UIMA-297 corrected some anomalies in the original implementation.
This Jira corrects the edge cases that arise when only Set indices are defined
for a type. A Set index does not add the second or subsequent FSs whose key
values match, according to the Set's comparator, an FS already in the index.
As a result, the original motivation of the default bag index (that every FS
added to the indexes can be retrieved) is thwarted in this case. This has
caused several edge-case issues; among other things, a special note about this
surprising behavior had to be included in the UIMA documentation.
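To make the Set-index problem concrete, here is a minimal Java sketch of the
old rule. It is a hypothetical toy model, not the UIMA API. The names `Fs`,
`OldRuleRepo` and `OldRuleDemo` are invented for illustration. A feature
structure is reduced to an id plus one key feature, and a Set index is modeled
as a `TreeSet` ordered by that key, so `add` is a no-op for an equal key. The
default bag is created only when no index at all is defined for the type.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model, NOT the UIMA API: an FS reduced to an id and one key feature.
record Fs(int id, int key) {}

// Old rule: the default bag exists only if the type has no index at all.
final class OldRuleRepo {
    private final List<TreeSet<Fs>> setIndices = new ArrayList<>();
    private List<Fs> defaultBag; // created lazily, and only when no index is defined

    void defineSetIndex(Comparator<Fs> keyComparator) {
        // A Set index keeps at most one FS per comparator-equal key.
        setIndices.add(new TreeSet<>(keyComparator));
    }

    void addToIndexes(Fs fs) {
        if (setIndices.isEmpty()) {
            if (defaultBag == null) {
                defaultBag = new ArrayList<>();
            }
            defaultBag.add(fs);
            return;
        }
        for (TreeSet<Fs> index : setIndices) {
            index.add(fs); // silently returns false for a duplicate key
        }
    }

    // Counts the distinct FSs that can still be found in some index.
    int retrievableCount() {
        Set<Fs> all = new HashSet<>();
        for (TreeSet<Fs> index : setIndices) {
            all.addAll(index);
        }
        if (defaultBag != null) {
            all.addAll(defaultBag);
        }
        return all.size();
    }
}

class OldRuleDemo {
    // Adds 100 FSs whose keys are 0..49, each key appearing twice.
    static int simulate(boolean defineSetIndex) {
        OldRuleRepo repo = new OldRuleRepo();
        if (defineSetIndex) {
            repo.defineSetIndex(Comparator.comparingInt(Fs::key));
        }
        for (int i = 0; i < 100; i++) {
            repo.addToIndexes(new Fs(i, i % 50));
        }
        return repo.retrievableCount();
    }

    public static void main(String[] args) {
        System.out.println("no index defined: " + simulate(false)); // 100
        System.out.println("only a Set index: " + simulate(true));  // 50
    }
}
```

With no index defined, all 100 FSs land in the default bag. With only a Set
index, half of them are silently unreachable.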
More recently, another edge case has been discovered. Suppose an aggregate
contains enough index definitions to ensure a non-Set index for type T, and one
of its annotators is run remotely as a service whose own index definitions
include only a Set index for type T. Assume that the client has added 100
instances of type T to the indices. The CAS is serialized to the remote, which
deserializes it and performs 100 add-to-indices operations; perhaps 50 succeed
and the other 50 are no-ops (due to the Set equivalence). When the CAS is
returned, only 50 instances will appear in the index back at the client. This
goes against the UIMA principle that, where possible, remoting a component
should not affect the semantics. It is also quite surprising, and most users
won't expect it. The effect is also "unstable": if a pipeline "assembler"
(knowing little about the "internals" of the components) were to add a
component to the remote which included a non-Set index for type T, the remote
would start behaving differently and would no longer lose any indexed items.
The converse would also be true: if the remote had no indices defined for type
T, then add-to-indices operations for type T would be recorded in lazily
created default bag indices, and those events would be sent back to the
client. If an assembler were then to add a component which contained only a
Set index definition for type T, the remote would suddenly start dropping FSs
that the Set comparator treats as duplicates.
For all these reasons (discovered in discussions with Edward Epstein and Adam
Lally), and because of the original intent of the default bag index (described
in some detail in the mail archives pointed to by the two Jiras above), this
Jira changes the logic of when the default bag index is created: it is created
whenever an add-to-indices operation would otherwise fail to record an
addition, e.g., if there are no indices for the type, or only Set indices and
the FS matches an element already in the Sets.
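Below is a minimal Java sketch of the new rule, written as a hypothetical toy
model rather than the UIMA API. It assumes an event-level reading of the rule:
an add-to-indices call that no index records lazily creates the default bag and
is stored there. The names `FsItem`, `NewRuleRepo`, `NewRuleDemo` and
`allIndexed()` are invented. `allIndexed()` only stands in for
getAllIndexedFS(type). Sorted and bag indices always record an addition, so a
plain list models them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model, NOT the UIMA API: an FS reduced to an id and one key feature.
record FsItem(int id, int key) {}

final class NewRuleRepo {
    private final List<TreeSet<FsItem>> setIndices = new ArrayList<>();
    // Sorted and bag indices always record an addition, so a plain list stands in for them.
    private final List<List<FsItem>> nonSetIndices = new ArrayList<>();
    private List<FsItem> defaultBag; // created lazily, the first time an add goes unrecorded

    void defineSetIndex(Comparator<FsItem> keyComparator) {
        setIndices.add(new TreeSet<>(keyComparator));
    }

    void defineNonSetIndex() {
        nonSetIndices.add(new ArrayList<>());
    }

    void addToIndexes(FsItem fs) {
        boolean recorded = false;
        for (List<FsItem> index : nonSetIndices) {
            index.add(fs);
            recorded = true;
        }
        for (TreeSet<FsItem> index : setIndices) {
            recorded |= index.add(fs); // |= (not ||) so every Set index still sees the add
        }
        if (!recorded) {
            if (defaultBag == null) {
                defaultBag = new ArrayList<>();
            }
            defaultBag.add(fs);
        }
    }

    boolean hasDefaultBag() {
        return defaultBag != null;
    }

    // Stand-in for getAllIndexedFS(type): every distinct FS held by any index, default bag included.
    List<FsItem> allIndexed() {
        Set<FsItem> all = new LinkedHashSet<>();
        nonSetIndices.forEach(all::addAll);
        setIndices.forEach(all::addAll);
        if (defaultBag != null) {
            all.addAll(defaultBag);
        }
        return new ArrayList<>(all);
    }
}

class NewRuleDemo {
    // Same workload as the remote scenario: 100 FSs, keys 0..49 each used twice.
    static NewRuleRepo run(boolean withNonSetIndex) {
        NewRuleRepo repo = new NewRuleRepo();
        repo.defineSetIndex(Comparator.comparingInt(FsItem::key));
        if (withNonSetIndex) {
            repo.defineNonSetIndex();
        }
        for (int i = 0; i < 100; i++) {
            repo.addToIndexes(new FsItem(i, i % 50));
        }
        return repo;
    }

    public static void main(String[] args) {
        NewRuleRepo setOnly = run(false);
        System.out.println(setOnly.allIndexed().size() + " " + setOnly.hasDefaultBag()); // 100 true
        NewRuleRepo mixed = run(true);
        System.out.println(mixed.allIndexed().size() + " " + mixed.hasDefaultBag());     // 100 false
    }
}
```

In both configurations all 100 FSs stay retrievable. Creating the bag lazily
means a type that already has a sorted or bag index never pays for it.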
This change will affect documentation, so update that too. In particular, the
NOTE in this section
http://uima.apache.org/d/uimaj-2.6.0/tutorials_and_users_guides.html#ugr.tug.aae.reading_results_previous_annotators
will no longer apply.
The behavior of getAllIndexedFS(type) will change: it will no longer make an
exception for the special case where only Set indices are defined for the type.
Because it seems extremely unlikely that anyone depends on the previous
behavior, no global flag is provided to restore it.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)