On 7/20/11 8:57 PM, Marshall Schor wrote:
>
>
> Now both AEs are made by different vendors, and both decide to declare their
> own token type. Then this type system merging doesn't work.
>
> As far as I know, the only commonly used workaround for this issue is not to
> use JCas and to define type system mappings, where the types the AE needs are
> mapped based on some configuration.
I don't understand why JCas cannot be used -- that seems to me to be independent
of the need for having type system mappings. I'm thinking that one annotator
produces a.b.Token, and a down-stream annotator needs c.d.Token with some
different kinds of meanings assigned to features - in this case you introduce a
custom mapping annotator, that iterates over the a.b.Token(s), and makes the
corresponding c.d.Token feature structures. JCas can be used for both of these,
as desired.
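
A minimal sketch of such a hand-written mapping step, in plain Java so it runs without UIMA on the classpath. AbToken and CdToken are hypothetical stand-ins for the JCas cover classes of a.b.Token and c.d.Token, and the feature names confidence and tokenConfidence are assumptions made only for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the JCas cover class of a.b.Token.
final class AbToken {
    final int begin;
    final int end;
    final float confidence; // assumed feature, not taken from a real type system

    AbToken(int begin, int end, float confidence) {
        this.begin = begin;
        this.end = end;
        this.confidence = confidence;
    }
}

// Hypothetical stand-in for the JCas cover class of c.d.Token.
final class CdToken {
    final int begin;
    final int end;
    final double tokenConfidence; // assumed feature with a different name and range type

    CdToken(int begin, int end, double tokenConfidence) {
        this.begin = begin;
        this.end = end;
        this.tokenConfidence = tokenConfidence;
    }
}

class TokenMappingSketch {
    // Iterate over the upstream tokens and create one downstream token for each.
    static List<CdToken> map(List<AbToken> source) {
        List<CdToken> target = new ArrayList<>();
        for (AbToken t : source) {
            target.add(new CdToken(t.begin, t.end, t.confidence));
        }
        return target;
    }

    public static void main(String[] args) {
        List<AbToken> upstream = List.of(new AbToken(0, 5, 0.5f), new AbToken(6, 11, 0.25f));
        for (CdToken t : map(upstream)) {
            System.out.println(t.begin + "-" + t.end + " " + t.tokenConfidence);
        }
    }
}
```

In a real annotator the loop would run over the annotation index of the CAS and add the new feature structures to the indexes, but the shape of the code stays the same.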
OK, that is possible, but this way you start writing code for something the
framework could do. And maintaining all kinds of type system mapping AEs isn't
really fun either.
>
>
> I think a solution to this problem is to stop doing this type system merging,
> and always map one common type system to every annotator's private type system.
The hard part is getting a community to agree to "one common type system", I
think. But we have seen in large projects that this often can be done within
one project.
Other times, groups working collaboratively have gotten together and defined a
common type system for their work.
My point is that a user defines his own type system, and a mapping which
translates parts of this type system to the annotator type system.
So in the sample above a user defines this type system:
Type: com.foo.Token
Feature: double tokenConfidence
Feature: String posTag
Feature: double posConfidence
The tokenizer also defines its own type system:
Type: opennlp.Token
Feature: float confidence
And one more type system for the pos tagger:
Type: opennlp.POSToken
Feature: float confidence
Feature: String tag
The user-defined AAE only knows the user type system and needs to
define "rules" which tell it how to transform opennlp.Token annotations
to com.foo.Token annotations, and then it needs a rule to transform
a com.foo.Token into an opennlp.POSToken, and back.
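
To make the idea of such "rules" a bit more concrete, here is a rough sketch in plain Java. Nothing in it is an existing UIMA API: Fs, MappingRule and RuleMappingSketch are hypothetical names, and a feature structure is modeled simply as a type name plus a map of feature values. Only the type and feature names come from the sample type systems above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, framework-free model of a feature structure.
final class Fs {
    final String type;
    final Map<String, Object> features;

    Fs(String type, Map<String, Object> features) {
        this.type = type;
        this.features = features;
    }
}

// Hypothetical rule: source type, target type, and feature renames (source name -> target name).
final class MappingRule {
    final String sourceType;
    final String targetType;
    final Map<String, String> featureNames;

    MappingRule(String sourceType, String targetType, Map<String, String> featureNames) {
        this.sourceType = sourceType;
        this.targetType = targetType;
        this.featureNames = featureNames;
    }

    // Returns the mapped feature structure, or null if the rule does not apply to the input type.
    Fs apply(Fs in) {
        if (!in.type.equals(sourceType)) {
            return null;
        }
        Map<String, Object> out = new HashMap<>();
        for (Map.Entry<String, String> e : featureNames.entrySet()) {
            if (in.features.containsKey(e.getKey())) {
                Object value = in.features.get(e.getKey());
                // The annotator types declare float, the user type declares double.
                if (value instanceof Float) {
                    value = ((Float) value).doubleValue();
                }
                out.put(e.getValue(), value);
            }
        }
        return new Fs(targetType, out);
    }
}

class RuleMappingSketch {
    static final MappingRule TOKEN_IN = new MappingRule(
            "opennlp.Token", "com.foo.Token",
            Map.of("confidence", "tokenConfidence"));

    static final MappingRule POS_IN = new MappingRule(
            "opennlp.POSToken", "com.foo.Token",
            Map.of("confidence", "posConfidence", "tag", "posTag"));

    public static void main(String[] args) {
        Fs pos = new Fs("opennlp.POSToken", Map.of("confidence", 0.5f, "tag", "NN"));
        Fs mapped = POS_IN.apply(pos);
        System.out.println(mapped.type + " " + mapped.features);
    }
}
```

The sketch only covers the direction from the annotator types to the user type. Merging the POS features into an already existing com.foo.Token, and creating the opennlp.POSToken input for the tagger, are left out.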
Sure, this is also already possible today by writing these type mapping AEs,
as you would need to do for JCas. But I think having better framework support
for this would make it easier.
Jörn