With respect to Eddie's question about roles: for the sake of discussion, let's split UIMA into two parts:

a) the CAS data structure and related serialization formats
b) the rest, in particular annotators and means of running them

The UIMA Java SDK supports both a and b.

DKPro Cassis (Python) supports only a; it has no concept of annotators or pipelines. Cassis is not part of Apache UIMA, but I still list it here because it is currently probably the best/only UIMA-esque option available for Python.

I am not sure what exactly UIMA-C++ supports. I believe the UIMA Java SDK can call out to UIMA-C++-based annotators and use them via JNI. But does the UIMA C++ SDK, for example, allow building aggregate annotators?

Currently, we do not have an option to call out to Python-based annotators from the UIMA Java SDK. In particular at the point when all the deep-learning frameworks were pouring in, there was a question of if/how to invoke these mostly Python-based frameworks from within UIMA pipelines. In the meantime, Java bindings for TensorFlow, Deeplearning4j and other Java-friendly DL tools have appeared, so this gap has somewhat closed. However, few data scientists would at present build a Java-based pipeline that calls out to Python. Engineers may do it, in particular when trying to integrate new methods into existing systems, but because Python is notoriously annoying to deploy (unless one Dockerizes things), they may prefer the native Java DL frameworks.

Currently, we also do not have an option to build UIMA pipelines in Python. This might be interesting for data scientists to some degree, in particular if they like the offset-based annotation approach of UIMA. They could use the CAS implementation of DKPro Cassis and implement their own annotator/pipeline conventions from there.

Would UIMA-CPP help Pythonistas build pipelines in Python? I suppose UIMA-CPP brings its own CAS implementation, which may be faster (?) or more memory-efficient (?) than the pure-Python implementation provided by Cassis.
So if that is correct, a Python-friendly UIMA-CPP might be something like a NumPy-style library, where we have a Python API on top of a fast C++ implementation underneath?

Would having UIMA-CPP with Python bindings allow implementing, e.g., some Python/Hugging Face annotator and then calling it from the Java SDK? Vice versa, would it be possible to build a Python/UIMA-C++ aggregate annotator that calls a Java-based UIMA component?

Any thoughts?

Cheers,

-- Richard
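
P.S. To make the "implement their own annotator/pipeline conventions" idea a bit more concrete, here is a rough sketch in plain Python. It is only an illustration under assumptions: MiniCas is a hypothetical stand-in for a real CAS (it is not the DKPro Cassis API), and Annotation, token_annotator, capitalized_annotator and run_pipeline are names made up for this example. The convention sketched is simply "an annotator is a callable that takes a CAS and adds offset-based annotations to it, and a pipeline is a list of such callables".

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Annotation:
    """Offset-based annotation: a type name plus begin/end character offsets."""
    type: str
    begin: int
    end: int


@dataclass
class MiniCas:
    """Hypothetical stand-in for a CAS; NOT the DKPro Cassis API."""
    text: str
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, type_: str, begin: int, end: int) -> Annotation:
        ann = Annotation(type_, begin, end)
        self.annotations.append(ann)
        return ann

    def select(self, type_: str) -> List[Annotation]:
        # Returns a new list, so callers may add annotations while iterating.
        return [a for a in self.annotations if a.type == type_]

    def covered_text(self, ann: Annotation) -> str:
        return self.text[ann.begin:ann.end]


# Assumed convention: an annotator is any callable that mutates the CAS.
Annotator = Callable[[MiniCas], None]


def token_annotator(cas: MiniCas) -> None:
    """Adds a 'Token' annotation for every run of word characters."""
    for match in re.finditer(r"\w+", cas.text):
        cas.add("Token", match.start(), match.end())


def capitalized_annotator(cas: MiniCas) -> None:
    """Adds a 'Capitalized' annotation over tokens starting with an uppercase letter."""
    for token in cas.select("Token"):
        if cas.covered_text(token)[0].isupper():
            cas.add("Capitalized", token.begin, token.end)


def run_pipeline(cas: MiniCas, annotators: List[Annotator]) -> MiniCas:
    """Runs the annotators in order on the same CAS (a fixed-flow 'aggregate')."""
    for annotator in annotators:
        annotator(cas)
    return cas


cas = run_pipeline(
    MiniCas("Eddie asked about UIMA roles"),
    [token_annotator, capitalized_annotator],
)
capitalized = [cas.covered_text(a) for a in cas.select("Capitalized")]
```

With a real CAS implementation underneath (Cassis, or a hypothetical Python binding to UIMA-CPP), only the MiniCas part would change; the annotator and pipeline convention could stay the same.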