Wrt. Eddie's question about roles.

For the sake of discussion let's split UIMA into two parts:

a) the CAS data structure and related serialization formats
b) the rest, in particular annotators and means of running them

The UIMA Java SDK supports both a and b.

DKPro Cassis (Python) supports only a; it has no concept of annotators or
pipelines. Cassis is not part of Apache UIMA, but I still list it here because
it is currently probably the best (if not the only) UIMA-esque option available
for Python.

I am not sure what exactly UIMA-C++ supports. E.g. does the UIMA C++ SDK allow
building aggregate annotators? I believe the UIMA Java SDK can call out to
UIMA-C++-based annotators and use them via JNI.

Currently, we do not have an option to call out to Python-based annotators from
the UIMA Java SDK. In particular at the point when all the deep-learning
frameworks were pouring in, there was the question of if/how to invoke these
mostly Python-based frameworks from within UIMA pipelines. Meanwhile, there are
Java bindings for TensorFlow, plus DeepLearning4J and other Java-friendly DL
tools, so this gap has somewhat closed. However, few data scientists would at
present build a Java-based pipeline that calls out to Python. Engineers may do
it, in particular when trying to integrate new methods into existing systems,
but because Python is notoriously annoying to deploy (unless one Dockerizes
everything), they may prefer the native Java DL frameworks.

Currently, we also do not have an option to build UIMA pipelines in Python.
This might be interesting for data scientists to some degree, in particular if
they like the offset-based annotation approach of UIMA. They could use the CAS
implementation of DKPro Cassis and implement their own annotator/pipeline
conventions on top of it.
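
To make the "own annotator/pipeline conventions" idea a bit more concrete, here
is a rough sketch of what such a convention could look like. Everything in it
is an assumption for illustration: SimpleCas is a hypothetical stand-in, not
the DKPro Cassis API, and the "an annotator is a callable that mutates the CAS"
convention is just one possible choice. In practice one would presumably use
the Cassis CAS in place of the stand-in.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    # Offset-based annotation: it points at text[begin:end] instead of
    # copying the text, which is the core idea of UIMA annotations.
    type: str
    begin: int
    end: int


@dataclass
class SimpleCas:
    # Hypothetical minimal stand-in for a CAS (NOT the DKPro Cassis API).
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, annotation: Annotation) -> None:
        self.annotations.append(annotation)

    def select(self, type_name: str) -> List[Annotation]:
        return [a for a in self.annotations if a.type == type_name]

    def covered_text(self, annotation: Annotation) -> str:
        return self.text[annotation.begin:annotation.end]


# Assumed convention: an annotator is any callable that mutates a CAS in place.
Annotator = Callable[[SimpleCas], None]


def tokenizer(cas: SimpleCas) -> None:
    # Adds one "Token" annotation per run of word characters.
    for match in re.finditer(r"\w+", cas.text):
        cas.add(Annotation("Token", match.start(), match.end()))


def capitalized_annotator(cas: SimpleCas) -> None:
    # Depends on "Token" annotations produced by an earlier annotator.
    for token in cas.select("Token"):
        if cas.covered_text(token)[0].isupper():
            cas.add(Annotation("Capitalized", token.begin, token.end))


def run_pipeline(cas: SimpleCas, annotators: List[Annotator]) -> SimpleCas:
    # A "pipeline" here is simply a fixed sequence of annotators run in order.
    for annotate in annotators:
        annotate(cas)
    return cas


cas = run_pipeline(
    SimpleCas("Richard uses UIMA in Python"),
    [tokenizer, capitalized_annotator],
)
capitalized = [cas.covered_text(a) for a in cas.select("Capitalized")]
```

The point of the sketch is only that the CAS is the shared data structure and
the rest is convention; things like descriptors, type systems, or aggregate
annotators would have to be designed on top of that.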

Would UIMA-CPP help Pythonistas to build pipelines in Python? 

I suppose UIMA-CPP brings its own CAS implementation, which may be faster (?)
or more memory-efficient (?) than the pure-Python implementation provided by
Cassis. If that is correct, a Python-friendly UIMA-CPP might be something like
NumPy, where we have a Python API on top of a fast C++ implementation
underneath?

Would having UIMA-CPP with Python bindings allow us to implement e.g. some
Python/Hugging Face annotator and then call it from the Java SDK?

Vice versa, would it be possible to build a Python/UIMA-C++ aggregate annotator
that calls a Java-based UIMA component?

Any thoughts?

Cheers,

-- Richard
