On 7/20/2011 2:35 PM, Jörn Kottmann wrote:
> On 7/20/11 7:56 PM, Marshall Schor wrote:
>> The "normal" way of having annotators together is something that UIMA
>> supports, as a pipeline.  Part of this is setting up the pipeline at
>> initialization time by taking all the type systems declared by the
>> annotators in the pipeline, and merging them into one common type system.
>>
>> A CAS is generated using this one common type system, and then sent through
>> the pipeline.
>
> Yes, this of course works, but it is often problematic, because the merged
> type system needs to be suitable for all components.
>
> Let's say we have a tokenizer and a POS tagger; the POS tagger needs the output
> of the tokenizer as input. Therefore in UIMA you would declare a token type,
> and both AEs must use exactly the same token type.

Sort of true :-)

The Tokenizer could define a type: com.foo.Token, a subtype of
uima.tcas.Annotation, with features "stem", "begin" and "end".

Then the POS tagger could augment the com.foo.Token type with some additional
features, such as "POS", etc.

But of course you are right that there would need to be some cooperation,
because the type name itself would have to match (including the package name)
and the features must not conflict, that is, be declared with different range
types (POS declared as an integer in one and a String in another).  But it is
OK to add features to existing types.

So, if the POS Tagger was written after the Tokenizer, and its author knew what
the type was in the Tokenizer, they could intentionally "augment" it.
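
As a rough sketch of the rule above: two declarations of the same type name
merge as long as no feature appears with two different range types.  This is a
plain-Java model, not the UIMA API; the class TypeMergeSketch and the method
mergeFeatures are made up for the illustration, and only the names
com.foo.Token, uima.cas.Integer and uima.cas.String are real type names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model of the merge rule; hypothetical names, no UIMA classes used.
final class TypeMergeSketch {

    /**
     * Merges two declarations of the same type.
     * Each map goes from feature name to range type name.
     */
    static Map<String, String> mergeFeatures(String typeName,
                                             Map<String, String> first,
                                             Map<String, String> second) {
        Map<String, String> merged = new LinkedHashMap<>(first);
        for (Map.Entry<String, String> feature : second.entrySet()) {
            // putIfAbsent returns the range already declared, or null if the feature is new.
            String existingRange = merged.putIfAbsent(feature.getKey(), feature.getValue());
            if (existingRange != null && !existingRange.equals(feature.getValue())) {
                throw new IllegalStateException("Feature " + typeName + ":" + feature.getKey()
                        + " declared as both " + existingRange + " and " + feature.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // The Tokenizer's declaration of com.foo.Token.
        Map<String, String> tokenizer = new LinkedHashMap<>();
        tokenizer.put("begin", "uima.cas.Integer");
        tokenizer.put("end", "uima.cas.Integer");
        tokenizer.put("stem", "uima.cas.String");

        // The POS tagger augments the same type with one more feature.
        Map<String, String> tagger = new LinkedHashMap<>();
        tagger.put("POS", "uima.cas.String");

        System.out.println(mergeFeatures("com.foo.Token", tokenizer, tagger));
    }
}
```

If a third component declared POS with range uima.cas.Integer, the same call
would throw, which is the conflict case described above.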
>
>
> Now both AEs are made by different vendors, and both decide to declare their
> own token type. Then this type system merging doesn't work.
>
> As far as I know, the only commonly used workaround for this issue is not to
> use JCas and to define type system mappings, where the types the AE needs are
> mapped based on some configuration.

I don't understand why JCas cannot be used -- that seems to me to be independent
of the need for having type system mappings.  I'm thinking that one annotator
produces a.b.Token, and a downstream annotator needs c.d.Token with some
different kinds of meanings assigned to features.  In this case you introduce a
custom mapping annotator that iterates over the a.b.Token(s) and makes the
corresponding c.d.Token feature structures.  JCas can be used for both of these,
as desired.
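
A minimal sketch of what such a mapping step does, written as a plain-Java
model rather than against the UIMA API.  AbToken and CdToken are hypothetical
stand-ins for the a.b.Token and c.d.Token classes, and the "lemma" feature is
an assumption of the example; in a real annotator the loop would sit in the
annotator's process method and work on the CAS.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of a mapping annotator; all names here are hypothetical.
final class TokenMappingSketch {

    /** Stand-in for the upstream a.b.Token. */
    static final class AbToken {
        final int begin;
        final int end;
        final String stem;

        AbToken(int begin, int end, String stem) {
            this.begin = begin;
            this.end = end;
            this.stem = stem;
        }
    }

    /** Stand-in for the downstream c.d.Token, which names its feature "lemma". */
    static final class CdToken {
        final int begin;
        final int end;
        final String lemma;

        CdToken(int begin, int end, String lemma) {
            this.begin = begin;
            this.end = end;
            this.lemma = lemma;
        }
    }

    /** Iterates over the upstream tokens and creates one downstream token for each. */
    static List<CdToken> mapTokens(List<AbToken> upstream) {
        List<CdToken> mapped = new ArrayList<>();
        for (AbToken token : upstream) {
            // Offsets carry over unchanged; the stem is reinterpreted as the lemma.
            mapped.add(new CdToken(token.begin, token.end, token.stem));
        }
        return mapped;
    }

    public static void main(String[] args) {
        List<AbToken> upstream = new ArrayList<>();
        upstream.add(new AbToken(0, 7, "run"));
        for (CdToken token : mapTokens(upstream)) {
            System.out.println(token.begin + "-" + token.end + " lemma=" + token.lemma);
        }
    }
}
```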
>
>
> I think a solution to this problem is to stop doing this type system merging,
> and always map one common type system to every annotator's private type system.

The hard part is getting a community to agree to "one common type system", I
think.  But we have seen in large projects that this often can be done within
one project.

Other times, groups working collaboratively have gotten together and defined a
common type system for their work.

-Marshall

> This mapping could give the AEs more flexibility and might even be able to
> perform simple type transformations.
> That would also make using JCas attractive again.
>
> This issue is even amplified by the fact that our users like to define their
> own type system, and then they only work properly if the AE implementers do
> type system mapping or program against this type system. The latter case only
> works if the user and implementer are the same person/organization.
>
>> -----------
>>
>> In the case where each annotator is "bundled" as an OSGi bundle, that bundle
>> contains its own private copy of all the UIMA classes, including all of the
>> UIMA SDK, and any type system, etc.  Any JCas generated classes are also
>> private to that bundle.
>>
>> This might make sense for running one Annotator by itself.
>
> Exactly.
>>   But for running multiple annotators together, as separate OSGi components,
>> I don't see how it would "work" if each annotator were its own bundle.  How
>> would the type systems be combined at initialization time?  How would you
>> share the JCas generated classes?  (I'll admit that this is not *required*,
>> but is sometimes useful.)
>>
>> Does one of the Clerezza scenarios involve running multiple annotators, each
>> having its own bundle?  If so, how does that work?  (I'm guessing that there
>> is some "driver" code that uses UIMA Application APIs to separately
>> initialize each annotator, and then maybe does something like getting a type
>> system from all of them, and merging them, and then creating a CAS from that,
>> etc.  This is just duplicating what the UIMA framework would be doing if it
>> were "in charge" of the pipeline and its management.)
>>
>> Thanks for the clarifications.
>>
>
> These are all points which don't really work out
> in the end (with our current release).
>
> Jörn
>
>
