Hi,
I don't think that providing a standard type system will enforce its usage.
Ruta already provides some type systems and it does not hurt at all, e.g.,
normal UIMA users do not care about it.

If there is no standard type system, then people have two options: create
their own one or reuse an existing type system of a component repository,
e.g., DKPro Core. As far as I know, LiMoSINe [1] moved from their own type
system to DKPro Core (I am waiting for some text to put on our external
resources page - in case they read this). I was also thinking about switching
our NLP components to the DKPro Core type system, but there are several issues
preventing that, first of all that I cannot build it :-/

A standard type system will never fulfill all requirements of a special
interest group, but it could be a start. Even if only a small part is shared,
it could increase interoperability.

There are two main questions:

- Can the community agree on what it should contain and how it is defined?
  Only basic stuff like Tokens and Sentences? What about POS tags? Should
  coarse and fine-grained tags be represented at the feature level or the
  type level? Which variant of a universal tagset: UD, Google, ...? What
  about inter-linkage of annotations?

- Will it be adopted by the community? Changing the type system is really a
  lot of work, especially if you have to support everything that you did
  before. I wonder if it can survive if DKPro Core does not adopt it. I could
  imagine that we (Averbis) are somewhat open to adopting parts of a standard
  type system, as I am planning to change our type system anyway.

Btw, in my experience, converting annotations between type systems within a
pipeline can easily become a performance bottleneck.

Best,

Peter

[1] https://aclweb.org/anthology/P/P16/P16-4027.pdf

On 30.08.2016 at 15:59, Richard Eckart de Castilho wrote:
> While I think that an endorsed type system is a good idea, I still wonder...
>
> As far as I understood, UIMA has always been advertised as an "empty"
> framework that explicitly does not prescribe a particular type system -
> probably to underline its flexibility. Would that not suffer if UIMA itself
> provided a standard type system?
>
> Cheers,
>
> -- Richard
>
>> On 30.08.2016, at 15:56, Marshall Schor <[email protected]> wrote:
>>
>> This is a great idea. The key will be in discovering and using a workable
>> "crowd-sourced" (?) process (and perhaps supporting tooling :-) ) that lets
>> a diverse set of people with somewhat aligned interests converge on a
>> shared definition.
>>
>> -Marshall
>>
>> On 8/30/2016 5:40 AM, Jens Grivolla wrote:
>>> Hi all,
>>>
>>> at the LREC conference there were some brief discussions about pushing
>>> for a "standard" type system (and maybe some more things) to make
>>> combining UIMA annotators from different sources easier.
>>>
>>> While it is great that UIMA itself is a generic framework that is
>>> completely agnostic to the tasks it is used for, there are many users
>>> that want to be able to use existing analysis engines. Currently they are
>>> forced to either choose a specific component collection (DKpro, cTakes,
>>> JCORE, OpenNLP, ...) or write adapters to convert type systems.
>>>
>>> There was agreement between some of us (Richard, Peter, etc.) that it
>>> would be very helpful to guide component developers towards a shared type
>>> system to make adoption of UIMA easier and avoid fragmentation.
>>>
>>> Here are some suggestions on how to proceed:
>>>
>>> - go all in and have the UIMA project provide a type system (in the UIMA
>>>   namespace)
>>> - develop an independent (unofficial) type system that is recommended on
>>>   the UIMA web site
>>> - develop an unofficial type system and gather endorsements from a
>>>   variety of institutions (UPF, UKP, JulieLab, Averbis, ...) so as to
>>>   promote this type system.
>>>
>>> I think (and there was initial agreement on this) that DKpro's type
>>> system would be a good starting point (with some fixes).
>>>
>>> So, how does everybody feel about this, and how do we get started?
>>>
>>> Best,
>>> Jens
