[ https://issues.apache.org/jira/browse/OPENNLP-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rodrigo Agerri updated OPENNLP-582:
-----------------------------------
Attachment: lemmatizer-prelim.patch
Hi Jörn,
I attach a first patch for the lemmatizer functionality. For now I have only
included the classes required for the API to work. As we discussed by email
some time ago, I have included a DictionaryLemmatizer interface and three
implementations of it to perform lemmatization:
1. JWNL-based.
2. HashMap-based (loads the dictionary into RAM); it uses en-lemmas.dict.
3. Morfologik-based (binary dictionary format, for large dictionaries); it
uses english.dict.
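
To make the HashMap-based variant concrete, here is a minimal sketch of the
idea, not the code in the attached patch. The lemmatize(word, postag)
signature, the class name HashMapLemmatizer, and the tab-separated
"word, postag, lemma" line format are assumptions made for illustration; the
real en-lemmas.dict layout may differ.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Assumed shape of the interface; the signature in the patch may differ.
interface DictionaryLemmatizer {
  String lemmatize(String word, String postag);
}

// Hypothetical in-memory implementation: the whole dictionary is held in a HashMap.
class HashMapLemmatizer implements DictionaryLemmatizer {

  private final Map<String, String> lemmas = new HashMap<>();

  // Reads the dictionary once. Assumed line format: word<TAB>postag<TAB>lemma.
  // Lines that do not have exactly three fields are skipped.
  HashMapLemmatizer(Reader dictionary) {
    try (BufferedReader in = new BufferedReader(dictionary)) {
      String line;
      while ((line = in.readLine()) != null) {
        String[] fields = line.split("\t");
        if (fields.length == 3) {
          lemmas.put(key(fields[0], fields[1]), fields[2]);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // The word form and POS tag together identify a dictionary entry.
  private static String key(String word, String postag) {
    return word + "\t" + postag;
  }

  // Returns the lemma for (word, postag), or the word itself when there is no entry.
  @Override
  public String lemmatize(String word, String postag) {
    String lemma = lemmas.get(key(word, postag));
    return lemma != null ? lemma : word;
  }
}
```

With a dictionary containing the line "houses<TAB>NNS<TAB>house",
lemmatize("houses", "NNS") returns "house". Falling back to the word form for
unknown entries is a design choice of this sketch; returning null or a marker
value would be an equally reasonable alternative.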
I have tested it in several tools, lemmatizing several languages, and it works
as expected. The only requirement is to provide the dictionaries for a given
language in the required format. The exception is the JWNL implementation,
whose API works only for English; to be honest, I included it because I
already had it implemented, but I do not think it is that useful.
I know that many other things need to be done before this package can be
included in the project for v1.6.0, but let me know first whether the
developers agree with the current structure before I carry on.
If you agree, please point out what I need to do next (CLI issues, tests,
etc.).
Cheers,
Rodrigo
> Add lemmatizer functionality
> ----------------------------
>
> Key: OPENNLP-582
> URL: https://issues.apache.org/jira/browse/OPENNLP-582
> Project: OpenNLP
> Issue Type: New Feature
> Components: POS Tagger
> Affects Versions: 1.6.0
> Reporter: Rodrigo Agerri
> Attachments: lemmatizer-prelim.patch
>
>
> This will add new functionality to perform dictionary-based lemmatization.
> It will look up a word form and POS tag in a dictionary and produce the
> corresponding lemma.