Hi,

> On 29. Dec 2022, at 13:01, Pablo Duboue <pablo.dub...@gmail.com> wrote:
> 
> I was not aware of dkpro-cassis and it looks really nice. If the objective
> is to wrap annotators, it seems the way to go.

Cassis was built for two reasons:

1) being able to create/load/manipulate/write CAS data in Python
2) being able to use those capabilities to interface with Python-based ML code 
(e.g. BERT).

The development was done in the context of the INCEpTION annotation tool [1] 
and there is a basic framework for wrapping Python-based ML code as simple 
services that INCEpTION can invoke [2].

The goal of cassis is not to provide a full pipeline architecture, although 
somebody might use it as a component in building such an architecture.

Cassis tends to be slower when parsing or serializing XMI (or JSON CAS [3] 
data) than UIMA-J. That may be a factor for somebody to prefer a 
UIMA-CPP-Python-binding over using cassis. Also, Cassis does not support XML 
1.1 - only XML 1.0 (we did not find an XML-1.1-capable XML parser package for 
Python).

Btw. the (new) JSON CAS [3] format was created because working with JSON is 
much easier these days than working with XML. It might be preferable to some 
over the traditional CAS XMI format.

> Also, by using offsets, a client can send 20Mb of text for chapter
> segmentation and return back the chapter offsets without having to receive
> the 20Mb back. (Such behaviour is quite common with simple JSON APIs.) This
> is the type of thing I do for the products of my company, by the way [1].

The Python-ML-services implementation [2] is pretty basic and not optimized. It 
currently does not even support *not* sending back the document text to save 
space ;) That should probably be quite easy to add, though. There is also no 
message queue involved, only basic HTTP requests. While the current 
implementation is useful, there are many things that could be improved.
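
To make the offsets-only idea concrete, here is a minimal sketch in plain 
Python (standard library only). It is not how the external recommender [2] 
works today; the "Chapter <number>" heading pattern, the function names and 
the shape of the JSON reply are all assumptions made up for illustration.

```python
import json
import re

# Hypothetical heading pattern: a line that starts with "Chapter <number>".
CHAPTER_HEADING = re.compile(r"^Chapter \d+", re.MULTILINE)


def chapter_offsets(text):
    """Return (begin, end) character offsets for each chapter in text."""
    starts = [m.start() for m in CHAPTER_HEADING.finditer(text)]
    # Each chapter ends where the next one begins; the last ends at EOF.
    ends = starts[1:] + [len(text)]
    return list(zip(starts, ends))


def build_response(text):
    """Build a JSON reply that carries offsets only, never the text itself."""
    spans = [{"begin": b, "end": e} for b, e in chapter_offsets(text)]
    return json.dumps({"chapters": spans})


document = "Chapter 1\nIt began.\nChapter 2\nIt ended.\n"
response = build_response(document)

# The client still holds the text, so it can slice it locally.
chapters = json.loads(response)["chapters"]
first = document[chapters[0]["begin"]:chapters[0]["end"]]
```

The reply stays small no matter how large the document is, because the client 
resolves the offsets against its own copy of the text.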

Also, outside of INCEpTION, there is currently no (Java) implementation of 
client code that can talk to these services. The protocol is documented though 
and not very complex [4].

Cheers,

-- Richard

[1] https://github.com/inception-project/inception#readme
[2] https://github.com/inception-project/inception-external-recommender#readme
[3] https://github.com/apache/uima-uimaj-io-jsoncas#readme
[4] https://inception-project.github.io/releases/26.3/docs/developer-guide.html#_external_recommender
