UIMA on top of Spark example

Benedict Holland Wed, 27 Sep 2017 10:27:28 -0700

Hello all,

My name is Ben Holland and I am a data scientist at Abt Associates. We are
working to develop a scalable NLP engine and selected UIMA with OpenNLP as
our technology stack. We also want to run it on Spark.


The huge draw for me was the awesome set of examples and documentation that
UIMA provided so that I could easily get up and running. With that in mind,
I am working with my company to put together code that I can give to the
UIMA team using only open source libraries (specifically UIMA, Hadoop,
Spark, and OpenNLP). I want to provide you with a fully functional example
developed in Eclipse.

I will need a contact within the UIMA team at Apache. If someone could
please get back to me on this, I would be most grateful.

The goal of this process is to entirely mimic the CPE using the UIMA
XML descriptor files over a Spark cluster. I do not rely on uimaFIT or any
third-party libraries apart from the JDBC driver. For bonus points, I hooked
this up to a database: the example reads text from it, populates N CAS
objects with database values, processes the text, and saves particularly
interesting text back to the database. In this example, the interesting
text is names.
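
To make the flow described above concrete, here is a minimal sketch in plain
Java. It is only an illustration of the shape of the pipeline, not the code
being offered: it uses no UIMA, Spark, or JDBC classes. Doc, ToyNameEngine,
batches, and processPartition are hypothetical stand-ins, the in-memory list
stands in for database rows, and the toy regex stands in for the OpenNLP name
finder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: plain-Java stand-ins, not UIMA or Spark classes.
class CpeFlowSketch {

    // Stand-in for a CAS: a row id, its text, and the names found in it.
    static final class Doc {
        final int id;
        final String text;
        final List<String> names = new ArrayList<>();
        Doc(int id, String text) { this.id = id; this.text = text; }
    }

    // Stand-in for an analysis engine. This toy version treats two adjacent
    // capitalized words as a "name"; it is an assumption made for the sketch.
    static final class ToyNameEngine {
        private static final Pattern NAME =
                Pattern.compile("\\b[A-Z][a-z]+ [A-Z][a-z]+\\b");
        void process(Doc doc) {
            Matcher m = NAME.matcher(doc.text);
            while (m.find()) doc.names.add(m.group());
        }
    }

    // Reader step: group rows into batches of at most n documents.
    static List<List<Doc>> batches(List<String> rows, int n) {
        List<List<Doc>> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += n) {
            List<Doc> batch = new ArrayList<>();
            for (int j = i; j < Math.min(i + n, rows.size()); j++) {
                batch.add(new Doc(j, rows.get(j)));
            }
            out.add(batch);
        }
        return out;
    }

    // Per-partition step: create one engine, run every document through it,
    // and keep only the documents that would be written back.
    static List<Doc> processPartition(List<Doc> partition) {
        ToyNameEngine engine = new ToyNameEngine();
        List<Doc> interesting = new ArrayList<>();
        for (Doc d : partition) {
            engine.process(d);
            if (!d.names.isEmpty()) interesting.add(d);
        }
        return interesting;
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
                "Ben Holland wrote this.", "no names here", "Thanks to Jane Doe.");
        for (List<Doc> partition : batches(rows, 2)) {
            for (Doc d : processPartition(partition)) {
                System.out.println(d.id + " -> " + d.names);
            }
        }
    }
}
```

On a cluster, the per-partition step is the part that would run on each
worker (for instance inside Spark's mapPartitions), so that an engine is
created once per partition rather than once per document.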

Why am I coming to you? This is a very simple application. It really is a
proof of concept example but it is enough to get the architecture in place
to expand on it.

I hope this interests you. I found it fascinating to work on this.

BTW, you should all feel extremely proud of your work. I don't make these
offers often but the UIMA documentation, architecture, and code
readability/stability are incredible. Within a few months, we were able to
get an NLP engine into a process chain. I am very impressed.

Thank you all so much,
~Ben
