+1 to Tommaso's comment. This would be very nice to have in the project.

R

On Wed, Jul 5, 2017 at 9:19 AM, Tommaso Teofili <tommaso.teof...@gmail.com> wrote:

> Thanks, Thamme, for bringing this to the list!
>
> On Wed, Jul 5, 2017 at 03:49, Thamme Gowda <tgow...@gmail.com> wrote:
>
>> Hello OpenNLP Devs,
>>
>> I am working on text classification using word embeddings such as
>> GloVe/Word2Vec and LSTM networks.
>> It would be interesting to see whether we can use this as a document
>> categorizer, especially for sentiment analysis in OpenNLP.
>>
>> I have already raised a PR against the sandbox repo:
>> https://github.com/apache/opennlp-sandbox/pull/3
>>
>> This is the first version, and I expect to receive feedback from the dev
>> community to make it work for everyone.
>>
>> Here are the design choices I have made for the initial version:
>>
>> - Using pre-trained GloVe vectors: I felt the GloVe vector format is clean
>>   and easily customizable in terms of dimensions and vocabulary size (I have
>>   also been reading a lot about them from the Stanford NLP group).
>> - Training GloVe vectors isn't hard either; we can do it with the original
>>   C library as well as with DL4J.
>> - Using DL4J's multi-layer networks with LSTM instead of reinventing this
>>   on the JVM for OpenNLP.
>>
>> Please share your feedback here or on the GitHub page:
>> https://github.com/apache/opennlp-sandbox/pull/3
>>
>
> I think the approach outlined here sounds good, and we could incorporate
> the PR as soon as it implements the Doccat API.
> Then we may see whether and how it makes sense to adjust it to use other
> types of embeddings (e.g. paragraph vectors) and/or different network
> setups (e.g. more hidden layers, bidirectional LSTM, etc.).
>
> Looking forward to seeing this move forward,
> Regards,
> Tommaso
>
>>
>> Thanks,
>> TG
>>
>> --
>> *Thamme Gowda*
>> @thammegowda <https://twitter.com/thammegowda> |
>> http://scf.usc.edu/~tnarayan/
>> ~Sent via somebody's Webmail server
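
For anyone unfamiliar with the GloVe text format discussed in the thread: each line is a word followed by its vector components, separated by spaces. Below is a minimal Java sketch, not the code from PR #3, that loads such a file and turns a tokenized document into a [timesteps][dimension] sequence of the kind an LSTM consumes. The class name `GloveSequenceEncoder`, lowercasing tokens, and skipping out-of-vocabulary words are all assumptions made for illustration. Feeding the result into DL4J would still require copying it into an `INDArray`, which is not shown here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical helper: GloVe text-format loader plus a sequence encoder for LSTM input. */
public class GloveSequenceEncoder {
    private final Map<String, float[]> vectors = new HashMap<>();
    private int dimension = -1;

    /** Reads lines of the form "word v1 v2 ... vD"; all vectors must share the same D. */
    public GloveSequenceEncoder(BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2) continue; // skip blank or malformed lines
            int dim = parts.length - 1;
            if (dimension == -1) {
                dimension = dim;
            } else if (dim != dimension) {
                throw new IOException("inconsistent vector size for word: " + parts[0]);
            }
            float[] v = new float[dim];
            for (int i = 0; i < dim; i++) {
                v[i] = Float.parseFloat(parts[i + 1]);
            }
            vectors.put(parts[0], v);
        }
    }

    public int dimension() {
        return dimension;
    }

    /**
     * Maps tokens to vectors in order, truncating at maxLength. Assumption: tokens are
     * lowercased and out-of-vocabulary tokens are skipped rather than zero-padded.
     */
    public float[][] encode(List<String> tokens, int maxLength) {
        return tokens.stream()
                .map(t -> vectors.get(t.toLowerCase()))
                .filter(Objects::nonNull)
                .limit(maxLength)
                .map(float[]::clone) // copy so callers cannot mutate the vocabulary
                .toArray(float[][]::new);
    }

    public static void main(String[] args) throws IOException {
        // Toy 3-dimensional vectors for illustration only; real GloVe files use far more dims.
        String toy = "good 0.1 0.2 0.3\nmovie 0.4 0.5 0.6\n";
        GloveSequenceEncoder enc = new GloveSequenceEncoder(new BufferedReader(new StringReader(toy)));
        float[][] seq = enc.encode(Arrays.asList("A", "good", "movie"), 100);
        System.out.println(seq.length + " x " + enc.dimension()); // prints "2 x 3"
    }
}
```

Skipping unknown words keeps the sketch short; a real categorizer might instead map them to a shared "unknown" vector so that sequence length reflects the original document.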