Hello again, @Thamme, out of curiosity, do you have evaluation numbers on the Stanford Large Movie Review dataset?
Best,
Rodrigo

On Wed, Jul 5, 2017 at 9:25 AM, Rodrigo Agerri <[email protected]> wrote:
> +1 to Tommaso's comment. This would be very nice to have in the project.
>
> R
>
> On Wed, Jul 5, 2017 at 9:19 AM, Tommaso Teofili
> <[email protected]> wrote:
>> Thanks, Thamme, for bringing this to the list!
>>
>> On Wed, Jul 5, 2017 at 03:49, Thamme Gowda <[email protected]> wrote:
>>
>>> Hello OpenNLP Devs,
>>>
>>> I am working on text classification using word embeddings such as
>>> GloVe/Word2Vec and LSTM networks.
>>> It would be interesting to see whether we can use this as a document
>>> categorizer, especially for sentiment analysis, in OpenNLP.
>>>
>>> I have already raised a PR to the sandbox repo -
>>> https://github.com/apache/opennlp-sandbox/pull/3
>>>
>>> This is the first version, and I expect to receive feedback from the dev
>>> community to make it work for everyone.
>>>
>>> Here are the design choices I have made for the initial version:
>>>
>>> - Using pre-trained GloVe vectors - I felt the GloVe vector format is clean
>>> and easily customizable in terms of dimensions and vocabulary size (also, I
>>> have been reading a lot about them from the Stanford NLP group).
>>> - Training GloVe vectors isn't hard either; we can do it using the original C
>>> library as well as with DL4J.
>>> - Using DL4J's multi-layer networks with LSTM instead of reinventing
>>> this on the JVM for OpenNLP.
>>>
>>> Please share your feedback here or on the GitHub page
>>> https://github.com/apache/opennlp-sandbox/pull/3 .
>>>
>> I think the approach outlined here sounds good; we could
>> incorporate the PR as soon as it implements the Doccat API.
>> Then we may see whether and how it makes sense to adjust it to use other
>> types of embeddings (e.g. paragraph vectors) and/or different network
>> setups (e.g. more hidden layers, bidirectional LSTM, etc.).
>>
>> Looking forward to seeing this move forward,
>> Regards,
>> Tommaso
>>
>>>
>>> Thanks,
>>> TG
>>>
>>> --
>>> *Thamme Gowda*
>>> @thammegowda <https://twitter.com/thammegowda> |
>>> http://scf.usc.edu/~tnarayan/
>>> ~Sent via somebody's Webmail server
