On 11/19/2016 1:08 PM, Pablo Duboue wrote:
> Perhaps my main concern is the change with respect to the fact that
> now not all features are indexed. I can see that allows for
> significant memory savings and speed improvements. However, it
> introduces further dependencies between Annotator creators and users.
> A feature structure that in UIMA 2.0 would be indexed and available
> for users in ways the original author did not anticipate would still
> be indexed. If the result for this change is to obtain empty
> iterators, it will lead to bugs difficult to track.
We need to use more precise language here.  I think you mean to say that
"now not all Feature Structures (not features) are indexed". 
But in both version 2 and version 3, the choice to
"add FS to indexes" is up to the application/annotators, so not all Feature
Structures are indexed, even in version 2.

What is true is that certain serializations that do not depend on tracing all
indexed + reachable FSs (FS = Feature Structure), e.g. binary blob
serialization, serialize all Feature Structures in v2, but in v3 serialize only
those that are indexed or reachable from an indexed FS.  This could be a visible
change to an external application decoding UIMA's binary blob encoding.  UIMA
v2, if used to deserialize this blob format, would reconstitute all the FSs, but
those that are neither indexed nor reachable would still not be accessible.

Are you concerned only for the binary blob style of serialization changing in
this manner?  I'll note that most other forms of serialization already do
serialize only indexed + reachable FSs, even in V2.
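
To make the "all FSs" versus "indexed + reachable" distinction concrete, here is
a toy sketch in plain Java.  It is not UIMA code: the class, the Fs type and
both method names are made up for illustration, and it only models which
Feature Structures would end up in the output under each policy.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model only: none of these names come from the UIMA API.
class ReachabilitySketch {

    // Stand-in for a Feature Structure: a name plus references to other FSs.
    static final class Fs {
        final String name;
        final List<Fs> refs = new ArrayList<>();
        Fs(String name) { this.name = name; }
    }

    // Policy 1: write every FS that was ever created, indexed or not.
    static Set<String> serializeAll(Collection<Fs> heap) {
        Set<String> out = new LinkedHashSet<>();
        for (Fs fs : heap) {
            out.add(fs.name);
        }
        return out;
    }

    // Policy 2: start from the indexed FSs and follow references from there.
    static Set<String> serializeIndexedAndReachable(Collection<Fs> indexed) {
        Set<String> out = new LinkedHashSet<>();
        Deque<Fs> todo = new ArrayDeque<>(indexed);
        while (!todo.isEmpty()) {
            Fs fs = todo.pop();
            if (out.add(fs.name)) {      // first visit: also queue what it references
                todo.addAll(fs.refs);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Fs token = new Fs("token");    // added to the indexes
        Fs lemma = new Fs("lemma");    // not indexed, but referenced by token
        Fs orphan = new Fs("orphan");  // neither indexed nor referenced
        token.refs.add(lemma);

        System.out.println(serializeAll(List.of(token, lemma, orphan)));
        System.out.println(serializeIndexedAndReachable(List.of(token)));
    }
}
```

In this toy, the first call prints [token, lemma, orphan] and the second prints
[token, lemma]; the "orphan" is the kind of FS whose presence in the binary blob
could differ between v2 and v3.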
>
> My question is: does the current metadata in the UIMAfit java
> annotations or XML descriptors captures exactly what feature
> structures are being indexed (I believe so, just want to double
> check).
Not really.  This is because UIMA has the concept of a "default bag index":
if you do not define any indexes over type "Foo", but nevertheless create a Foo
and add it to the indexes, it will be indexed, serialized, etc., in the same
way in both v2 and v3.  And you can get an iterator that will return them.
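
The fallback behaviour can be pictured with another toy sketch in plain Java.
Again, this is not the UIMA API: the class and its methods are hypothetical, FSs
are reduced to strings, and type inheritance is ignored.  It only shows that
adding an FS of a type with no declared index still makes it retrievable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names come from the UIMA API.
class DefaultBagIndexSketch {
    // Indexes that some descriptor explicitly declared, keyed by type name.
    private final Map<String, List<String>> declaredIndexes = new HashMap<>();
    // Fallback "bag" indexes, created on demand for types with no declared index.
    private final Map<String, List<String>> defaultBagIndexes = new HashMap<>();

    void declareIndex(String typeName) {
        declaredIndexes.putIfAbsent(typeName, new ArrayList<>());
    }

    void addToIndexes(String typeName, String fs) {
        List<String> index = declaredIndexes.get(typeName);
        if (index == null) {
            // No declared index over this type: fall back to a default bag.
            index = defaultBagIndexes.computeIfAbsent(typeName, t -> new ArrayList<>());
        }
        index.add(fs);
    }

    Iterator<String> iterator(String typeName) {
        List<String> index = declaredIndexes.get(typeName);
        if (index == null) {
            index = defaultBagIndexes.getOrDefault(typeName, Collections.emptyList());
        }
        return index.iterator();
    }

    public static void main(String[] args) {
        DefaultBagIndexSketch repo = new DefaultBagIndexSketch();
        repo.addToIndexes("Foo", "foo-1");   // no index was ever declared for Foo
        Iterator<String> it = repo.iterator("Foo");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Running the toy prints foo-1, even though nothing declared an index over Foo.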
> This could be a good opportunity to include some sanity checking on
> aggregates: if no annotator upstream has declared as indexing a
> certain feature structure, then indicating it as input can produce a
> warning.
But no indexing declaration is needed, because of the default bag index
concept...  :-)
>
> A more erroneous situation relates to accessing types not declared in
> the metadata. This is a common UIMAfit new user bug: forget to add any
> annotation to the class. In simpler configurations, it will work but
> in more complex ones (custom flow controllers, etc) will "fail"
> without failing: the annotations will just not be there. As I am
> concerned the selective indexing will exacerbate this situation,
> giving an actual error might be helpful.
>
I'm not sure what this means.  Can you give an example?
Here are two made-up examples, both of which I think fail, so I'm missing
something I guess.

1) You are using the JCas (bad assumption?) and define a JCas class Foo for UIMA
type Foo, with feature foo_feature. 

2) In order to define this, you use the Component Descriptor Editor, and after
defining it you push the JCasGen button and it generates a JCas cover class
definition for this type.

3) You run your application with this class, but forget to put the Foo type
into the uimaFIT metadata.  When you try to do anything with the Foo type, you
get an error message about a mismatch between the JCas Foo class and the
current UIMA Type system.  This actually is a hard failure...

Another example: not using JCas.
You have code that creates an instance of the UIMA type "Foo" but you've
forgotten to define this in the uimaFIT metadata.  The code fails - saying it
can't create a Foo...

-Marshall
