This sounds like it would be a valuable addition. I would be happy to help you with it where I can; others on the team may also be interested.

-Marshall Schor

On 9/27/2017 1:27 PM, Benedict Holland wrote:
> Hello all,
>
> My name is Ben Holland and I am a data scientist at Abt Associates. We are
> working to develop a scalable NLP engine and selected UIMA with OpenNLP as
> our tech. We wanted to run this over Spark as well.
>
> The huge draw for me was the awesome set of examples and documentation that
> UIMA provided so that I could easily get up and running. With that in mind,
> I am working with my company to put together code that I can give to the
> UIMA team using only open source libraries (specifically UIMA, Hadoop,
> Spark, and OpenNLP). I want to provide you with a fully functional example
> developed in eclipse.
>
> I will need a contact within the UIMA team at Apache. If someone could
> please get back to me on this, I would be most grateful.
>
> The goal of this process is to entirely mimic the CPE using the UIMA
> xml descriptor files over a spark cluster. I do not rely on UIMAfit or any
> 3rd party libraries apart from the JDBC driver. For bonus points, I hooked
> this up to a database that reads text, populates N cas objects with
> database values, processes the text, and saves particularly interesting
> text to the database. I pull out names.
>
> Why am I coming to you? This is a very simple application. It really is a
> proof of concept example but it is enough to get the architecture in place
> to expand on it.
>
> I hope this interests you. I found it fascinating to work on this.
>
> BTW, you should all feel extremely proud of your work. I don't make these
> offers often but the UIMA documentation, architecture, and code
> readability/stability is incredible. Within a few months, we were able to
> get a NLP engine into a process chain. I am very impressed.
>
> Thank you all so much,
> ~Ben
