[
https://issues.apache.org/jira/browse/OPENNLP-760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153094#comment-15153094
]
Rodrigo Agerri commented on OPENNLP-760:
----------------------------------------
The statistical lemmatizer has now been added. It is trained on (word, postag,
lemma) triples from a corpus and induces the lemma classes by calculating the
permutations (edit operations) required to transform the word form into the
lemma. This is performed on the reversed strings. The resulting set of
permutations is the class that the statistical lemmatizer learns. Once
predicted, the lemma class is decoded back into the lemma.
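To illustrate the idea, here is a minimal sketch in Python. It is not the
OpenNLP implementation: the function names induce_lemma_class and
decode_lemma_class are hypothetical, the edit operations are computed with
difflib from the standard library rather than with the algorithm used in the
committed code, and the postag (which would be a feature for the classifier)
is left out. It only shows why working on reversed strings makes a class
induced from one word reusable for another word with the same ending.

```python
from difflib import SequenceMatcher


def induce_lemma_class(word, lemma):
    """Return the edits that turn reversed `word` into reversed `lemma`.

    Each edit is (start, deleted, inserted), with `start` counted from the
    END of the original word because both strings are reversed first.
    """
    rev_word, rev_lemma = word[::-1], lemma[::-1]
    matcher = SequenceMatcher(None, rev_word, rev_lemma, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((i1, rev_word[i1:i2], rev_lemma[j1:j2]))
    return tuple(edits)


def decode_lemma_class(word, lemma_class):
    """Apply a lemma class to `word`; return `word` unchanged if it does not fit."""
    rev_word = word[::-1]
    pieces = []
    pos = 0
    for start, deleted, inserted in lemma_class:
        # The class only applies if the expected characters are present.
        if rev_word[start:start + len(deleted)] != deleted:
            return word
        pieces.append(rev_word[pos:start])
        pieces.append(inserted)
        pos = start + len(deleted)
    pieces.append(rev_word[pos:])
    return "".join(pieces)[::-1]


# The class induced from one training pair ...
ing_class = induce_lemma_class("walking", "walk")
# ... can be decoded against a different word with the same ending.
lemma = decode_lemma_class("talking", ing_class)
```

In this sketch the class induced from "walking"/"walk" says "delete 'gni' at
offset 0 of the reversed word", so it carries over to "talking" regardless of
how long the stem is. A class that does not fit the input word simply leaves
the word unchanged, which is an assumption of this sketch and not a statement
about the fallback behaviour of the committed code.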
For better API management, the DictionaryLemmatizer API has been modified to
reflect the interface of other tools in OpenNLP.
Once this issue is closed, the following remains to be done:
- Add a cmdline component for the new learnable lemmatizer.
- Add unit tests.
- Update the lemmatizer section in the documentation.
> probabilistic lemmatizer
> ------------------------
>
> Key: OPENNLP-760
> URL: https://issues.apache.org/jira/browse/OPENNLP-760
> Project: OpenNLP
> Issue Type: New Feature
> Components: Lemmatizer
> Reporter: Rodrigo Agerri
> Assignee: Rodrigo Agerri
> Priority: Minor
>
> The current SimpleLemmatizer is dictionary-based. A probabilistic lemmatizer
> works better for unknown words and can be combined with dictionaries.
> The method we will implement here is based on:
> Grzegorz Chrupała. 2008. Towards a Machine-Learning Architecture for Lexical
> Functional Grammar Parsing. PhD dissertation, Dublin City University.
> http://grzegorz.chrupala.me/papers/phd-single.pdf