On Thu, Dec 29, 2022 at 7:38 AM Richard Eckart de Castilho <r...@apache.org>
wrote:

>
> > On 29. Dec 2022, at 13:01, Pablo Duboue <pablo.dub...@gmail.com> wrote:
> >
> > Here is some dream concept code:
> > https://gist.github.com/DrDub/9413410626b5a77d8f1f576f6447d64e  (getting
> > the syntax and approach right will take a lot of iterations and
> > consultations of course)
>
> Thanks for the example code :) It has some interesting ideas. I'll
> consider them based on my background with Cassis and UIMA-J.
>
>
> == Type system
>
> I can see that you imagine defining types in a natural pythonic way here.
> For Cassis, we chose a different approach that is based on a type system
> definition (either programmatically [1] created or loaded from XML [2]) and
> then uses factory methods to generate type classes (comparable to JCas
> classes).
>
> We needed the type classes to have special properties and we wanted to be
> able to handle UIMA features like type system merging - so we couldn't go
> with simple Python classes.
>

I tried to make it similar to the ORM frameworks in Python, which address a
similar concern. Python is a very dynamic language, so it should be possible
to do all the type system merging, etc. over plain Python classes.
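
To make that concrete, here is a minimal sketch of merging two type
definitions over plain Python classes. Every name in it (Feature,
features_of, merge_types, TokenA, TokenB) is made up for illustration; none
of it is Cassis or UIMA API, and real merging would also have to deal with
supertypes and namespaces.

```python
# Hypothetical sketch: none of these names exist in Cassis or UIMA.


class Feature:
    """Declares a feature (field) of a type, ORM-style."""

    def __init__(self, range_type):
        self.range_type = range_type


def features_of(cls):
    """Collect the Feature declarations of a class, including inherited ones."""
    found = {}
    for klass in reversed(cls.__mro__):
        for name, value in vars(klass).items():
            if isinstance(value, Feature):
                found[name] = value
    return found


def merge_types(name, *classes):
    """Build a new class holding the union of the features of `classes`.

    Raises TypeError when the same feature is declared with two different
    range types, since such definitions cannot be merged.
    """
    merged = {}
    for cls in classes:
        for feat_name, feat in features_of(cls).items():
            previous = merged.get(feat_name)
            if previous is not None and previous.range_type != feat.range_type:
                raise TypeError(f"conflicting range for feature {feat_name!r}")
            merged[feat_name] = feat
    # type() creates the merged class at runtime from the collected features.
    return type(name, (object,), dict(merged))


class TokenA:
    pos = Feature(str)


class TokenB:
    lemma = Feature(str)


Token = merge_types("Token", TokenA, TokenB)
```

The merged Token class carries both pos and lemma, while two definitions that
disagree on a feature's range are rejected instead of silently overwritten.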


> == Access to CAS contents
>
> Your python code seems inspired by the UIMAv2 CAS index API.
>

Well, that's what UIMA CPP supports.


> UIMAv3 introduces a new "select" API for retrieving FSes from the CAS [3].
> This was inspired by the popular "select" methods of uimaFIT. In cassis, a
> simple version of select has been implemented [4] which feels more like the
> uimaFIT methods than like the V3 select API.
>

Yes, I'm familiar with uimaFIT select.


> Note that Cassis does not support indices or type priorities. To be
> honest, those always seemed to be more in the way than helpful anyway. The
> UIMAv3 select API also ignores type priorities by default (can be turned on
> though for a given select call).
>

Type priorities were indeed a rare bird. But type indices are mighty
useful. So UIMAv3 has no indices at all? Getting an iterator over
annotations that fall inside another annotation is a very common task
(sentences within paragraphs, tokens within sentences, etc). It is one of
the few constructs that other NLP frameworks provide.
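
As a rough illustration of why a per-type index helps here, the sketch below
keeps annotations grouped by type and sorted by begin offset, so a "covered
by" query only touches annotations of the requested type and finds its
starting point by binary search. Annotation, TypeIndex, select and
select_covered are hypothetical names for this sketch, not the UIMA CPP,
UIMA-J or Cassis API.

```python
from bisect import bisect_left
from collections import namedtuple

# Hypothetical annotation record; not a Cassis or UIMA class.
Annotation = namedtuple("Annotation", ["type", "begin", "end"])


class TypeIndex:
    """Keeps annotations grouped by type name and sorted by begin offset."""

    def __init__(self, annotations):
        self._by_type = {}
        for ann in sorted(annotations, key=lambda a: (a.begin, a.end)):
            self._by_type.setdefault(ann.type, []).append(ann)
        # Parallel lists of begin offsets, used for binary search.
        self._begins = {
            t: [a.begin for a in anns] for t, anns in self._by_type.items()
        }

    def select(self, type_name):
        """Iterate over all annotations of one type, in document order."""
        return iter(self._by_type.get(type_name, []))

    def select_covered(self, type_name, covering):
        """Yield annotations of `type_name` that lie inside `covering`."""
        anns = self._by_type.get(type_name, [])
        begins = self._begins.get(type_name, [])
        # First candidate that starts at or after the covering annotation.
        start = bisect_left(begins, covering.begin)
        for i in range(start, len(anns)):
            ann = anns[i]
            if ann.begin > covering.end:
                break  # sorted by begin, so nothing further can be covered
            if ann.end <= covering.end:
                yield ann


index = TypeIndex([
    Annotation("Sentence", 0, 10),
    Annotation("Sentence", 11, 20),
    Annotation("Token", 0, 4),
    Annotation("Token", 5, 10),
    Annotation("Token", 11, 15),
])
first_sentence = next(index.select("Sentence"))
tokens = list(index.select_covered("Token", first_sentence))
```

Here tokens holds only the two Token annotations inside the first sentence,
and the Sentence annotations are never scanned while looking for tokens.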


> == Component concept
>
> The Python annotation with component metadata on the analysis engine class
> looks interesting. I wonder if you need the indexes though. Can you not
> work simply with the built-in annotation index?
>

Wouldn't that be slow, iterating over thousands of annotations to find only
a few paragraph annotations? At any rate, UIMA CPP has the indices, so it
will be very fast.


> == Data mapping
>
> The `wrap` code in there looks very interesting, e.g.
>
> -----
>         SetFeature({MyNER.Source: "spaCy"}).wrap(
>             TypeMapper(output={spacy.Sentence: MySentence, spacy.NER: MyNER}).wrap(
>                 SpacyAnnotator({"load": "en"})
>             )
>         )
> -----
>

Thanks :-)

The need for type mapping code arose at a customer site [1], and I have
always found it to be a missing piece in the framework.
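
For what it's worth, the wrapping itself needs very little machinery. The
sketch below is only an assumption about how it could work: annotations are
plain dicts, FakeNER stands in for a third-party annotator such as the spaCy
one, and Wrapper, TypeMapper and SetFeature are illustrative classes, not the
code from the gist or from the Java TypeMapper in [1].

```python
# Hypothetical sketch of the decorator-style `wrap` idea; not a real UIMA API.


class Wrapper:
    """Base class: wrap(inner) returns self so that calls can be nested."""

    def wrap(self, inner):
        self.inner = inner
        return self

    def process(self, text):
        # Run the wrapped annotator first, then post-process its output.
        return self.post(self.inner.process(text))

    def post(self, annotations):
        return annotations


class TypeMapper(Wrapper):
    """Renames the types produced by the wrapped annotator."""

    def __init__(self, output):
        self.output = output  # {external type name: local type name}

    def post(self, annotations):
        return [
            dict(a, type=self.output.get(a["type"], a["type"]))
            for a in annotations
        ]


class SetFeature(Wrapper):
    """Sets fixed feature values on annotations of the given types."""

    def __init__(self, values):
        self.values = values  # {(type name, feature name): value}

    def post(self, annotations):
        out = []
        for a in annotations:
            a = dict(a)  # copy, so the inner annotator's output is untouched
            for (type_name, feature), value in self.values.items():
                if a["type"] == type_name:
                    a[feature] = value
            out.append(a)
        return out


class FakeNER:
    """Stand-in for a third-party annotator with its own type names."""

    def process(self, text):
        return [{"type": "ext.NER", "begin": 0, "end": len(text)}]


pipeline = SetFeature({("MyNER", "source"): "fake"}).wrap(
    TypeMapper(output={"ext.NER": "MyNER"}).wrap(FakeNER())
)
result = pipeline.process("Ottawa")
```

The inner annotator never learns about the local type system: the mapper
renames its output types, and the outer wrapper then stamps the source
feature on the renamed annotations.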

P

[1]
https://www.javatips.net/api/type-mapper-for-uima-master/src/main/java/com/radialpoint/uima/typemapper/TypeMapper.java
