[ 
https://issues.apache.org/jira/browse/OPENNLP-760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rodrigo Agerri updated OPENNLP-760:
-----------------------------------
    Description: 
The current SimpleLemmatizer is dictionary-based. A probabilistic lemmatizer 
works better for unknown words. There is already an open source tool which we 
could build on to implement this in OpenNLP.

https://code.google.com/p/mate-tools

This is the algorithm. The first paper describes the general idea and the
second presents experiments in a realistic setting.

http://grzegorz.chrupala.me/papers/chrupala-2006/paper.pdf

http://grzegorz.chrupala.me/papers/chrupala-etal-2008a/paper.pdf
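
To make the idea concrete, here is a rough sketch of why a learned approach
can generalize to unknown words: each training pair (word form, lemma) is
turned into a transformation label, and a classifier would then predict that
label for unseen words. The sketch below is an assumption-laden
simplification, not the method from the papers or from mate-tools: it uses a
plain suffix rule (strip N characters, append a string) as the label, and it
leaves the classifier out entirely. The class name, method names and the
"N|suffix" label format are hypothetical and not part of OpenNLP.

```java
public class SuffixRuleSketch {

    // Derive a suffix rule from a (form, lemma) training pair.
    // Hypothetical label format: "<chars to strip>|<suffix to append>".
    public static String induceRule(String form, String lemma) {
        int common = 0;
        int max = Math.min(form.length(), lemma.length());
        // Length of the longest common prefix of form and lemma.
        while (common < max && form.charAt(common) == lemma.charAt(common)) {
            common++;
        }
        return (form.length() - common) + "|" + lemma.substring(common);
    }

    // Apply a rule to a (possibly unseen) word form.
    // Falls back to the form itself if the rule strips more than is there.
    public static String applyRule(String form, String rule) {
        int sep = rule.indexOf('|');
        int strip = Integer.parseInt(rule.substring(0, sep));
        if (strip > form.length()) {
            return form;
        }
        return form.substring(0, form.length() - strip) + rule.substring(sep + 1);
    }

    public static void main(String[] args) {
        // Rule learned from a known pair...
        String rule = induceRule("studies", "study");
        System.out.println(rule);                       // 3|y
        // ...applied to a word that is not in any dictionary entry we saw.
        System.out.println(applyRule("parties", rule)); // party
    }
}
```

In a real implementation the rule would not be picked by hand as in main();
a statistical model trained on features of the word and its context would
choose among the rules seen in training.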

> probabilistic lemmatizer
> ------------------------
>
>                 Key: OPENNLP-760
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-760
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Lemmatizer
>            Reporter: Rodrigo Agerri
>            Priority: Minor
>
> The current SimpleLemmatizer is dictionary-based. A probabilistic lemmatizer 
> works better for unknown words. There is already an open source tool which we 
> could build on to implement this in OpenNLP.
> https://code.google.com/p/mate-tools
> This is the algorithm. The first paper describes the general idea and the
> second presents experiments in a realistic setting.
> http://grzegorz.chrupala.me/papers/chrupala-2006/paper.pdf
> http://grzegorz.chrupala.me/papers/chrupala-etal-2008a/paper.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)