+1 to merging this once it implements the Document Categorizer; then we
can also use the Document Categorizer tools to train and evaluate it.

Jörn

On Wed, Jul 5, 2017 at 9:28 AM, Rodrigo Agerri <[email protected]> wrote:
> Hello again,
>
> @Thamme, out of curiosity, do you have evaluation numbers on the
> Stanford Large Movie Review dataset?
>
> Best,
>
> Rodrigo
>
> On Wed, Jul 5, 2017 at 9:25 AM, Rodrigo Agerri <[email protected]> wrote:
>> +1 to Tommaso's comment. This would be very nice to have in the project.
>>
>> R
>>
>> On Wed, Jul 5, 2017 at 9:19 AM, Tommaso Teofili
>> <[email protected]> wrote:
>>> Thanks, Thamme, for bringing this to the list!
>>>
>>>
>>> On Wed, Jul 5, 2017 at 03:49, Thamme Gowda <[email protected]> wrote:
>>>
>>>> Hello OpenNLP Devs,
>>>>
>>>> I am working on text classification using word embeddings such as
>>>> GloVe/Word2Vec and LSTM networks.
>>>> It would be interesting to see whether we can use this as a document
>>>> categorizer, especially for sentiment analysis, in OpenNLP.
>>>>
>>>> I have already raised a PR to the sandbox repo -
>>>> https://github.com/apache/opennlp-sandbox/pull/3
>>>>
>>>> This is the first version, and I hope to receive feedback from the dev
>>>> community to make it work for everyone.
>>>>
>>>> Here are the design choices I have made for the initial version:
>>>>
>>>>    - Using pre-trained GloVe vectors - I felt the GloVe vector format is
>>>>    clean and easily customizable in terms of dimensions and vocabulary
>>>>    size (also, I have been reading a lot about them from the Stanford
>>>>    NLP group).
>>>>       - Training GloVe vectors isn't hard either; we can do it using the
>>>>       original C library as well as DL4J.
>>>>    - Using DL4J's multi-layer networks with LSTM instead of reinventing
>>>>    this stuff again on the JVM for OpenNLP.
>>>>
>>>>
>>>> Please share your feedback here or on the github page
>>>> https://github.com/apache/opennlp-sandbox/pull/3 .
>>>>
>>>>
>>> The approach outlined here sounds good; I think we could incorporate
>>> the PR as soon as it implements the Doccat API.
>>> Then we can see whether and how it makes sense to adjust it to use other
>>> types of embeddings (e.g. paragraph vectors) and/or different network
>>> setups (e.g. more hidden layers, bidirectional LSTM, etc.).
>>>
>>> Looking forward to seeing this move forward,
>>> Regards,
>>> Tommaso
>>>
>>>
>>>>
>>>> Thanks,
>>>> TG
>>>>
>>>>
>>>> --
>>>> *Thamme Gowda *
>>>> @thammegowda <https://twitter.com/thammegowda> |
>>>> http://scf.usc.edu/~tnarayan/
>>>> ~Sent via somebody's Webmail server
>>>>
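
For readers unfamiliar with the GloVe text format mentioned in the quoted
design choices: each line holds a token followed by its space-separated
vector components. The following is only a rough sketch of reading that
format with the plain JDK; it is not the code from the PR, and the class
name GloveReader and its read method are hypothetical names made up for
illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of OpenNLP, DL4J, or the PR.
public class GloveReader {

    // Reads GloVe-style text: each non-empty line is a token followed by
    // whitespace-separated vector components.
    public static Map<String, float[]> read(BufferedReader reader) throws IOException {
        Map<String, float[]> vectors = new LinkedHashMap<>();
        int dimension = -1; // taken from the first vector, then enforced for the rest
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split("\\s+");
            if (parts.length < 2) {
                throw new IllegalArgumentException("No vector components in line: " + line);
            }
            float[] vector = new float[parts.length - 1];
            for (int i = 1; i < parts.length; i++) {
                vector[i - 1] = Float.parseFloat(parts[i]);
            }
            if (dimension == -1) {
                dimension = vector.length;
            } else if (vector.length != dimension) {
                throw new IllegalArgumentException("Inconsistent dimension for token: " + parts[0]);
            }
            vectors.put(parts[0], vector);
        }
        return vectors;
    }

    public static void main(String[] args) throws IOException {
        // Tiny made-up sample; real GloVe files have many more rows and dimensions.
        String sample = "good 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\n";
        Map<String, float[]> vectors = read(new BufferedReader(new StringReader(sample)));
        System.out.println(vectors.size() + " words, dimension " + vectors.get("good").length);
    }
}
```

The vocabulary size is simply the number of lines and the dimension is the
number of values per line, which is what makes the format easy to trim or
swap out. The sketch assumes tokens contain no whitespace.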
