[ 
https://issues.apache.org/jira/browse/OPENNLP-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rodrigo Agerri updated OPENNLP-582:
-----------------------------------

    Attachment: lemmatizer-prelim.patch

Hi Jörn, 

I attach a first patch for the lemmatizer functionality. For now I have only 
included the classes required for the API to work. As we discussed by email 
some time ago, I have included a DictionaryLemmatizer interface and three 
implementations of it to perform lemmatization: 

1. JWNL-based. 
2. HashMap-based (loads the dictionary into RAM); it uses en-lemmas.dict. 
3. Morfologik-based (binary method for large dictionaries); it uses 
english.dict. 
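
To make the proposed structure concrete, here is a minimal sketch of how the 
HashMap-based variant could look. It is only an illustration and not the code 
in the patch: the lemmatize(word, posTag) signature, the HashMapLemmatizer 
class name, the add() helper, the tab-joined key and the fallback to the word 
form itself are all assumptions. 

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface; the signature in the actual patch may differ.
interface DictionaryLemmatizer {
    String lemmatize(String word, String posTag);
}

// Hypothetical in-memory implementation backed by a HashMap.
class HashMapLemmatizer implements DictionaryLemmatizer {

    // Maps "word<TAB>posTag" to the lemma (the key layout is an assumption).
    private final Map<String, String> dictionary = new HashMap<>();

    private static String key(String word, String posTag) {
        return word + "\t" + posTag;
    }

    // Registers one dictionary entry; a real implementation would
    // fill the map by reading a dictionary file instead.
    void add(String word, String posTag, String lemma) {
        dictionary.put(key(word, posTag), lemma);
    }

    // Returns the lemma for the (word, posTag) pair, or the word form
    // itself when the pair is not in the dictionary (assumed fallback).
    @Override
    public String lemmatize(String word, String posTag) {
        return dictionary.getOrDefault(key(word, posTag), word);
    }
}
```

With an entry added via add("houses", "NNS", "house"), a call to 
lemmatize("houses", "NNS") returns "house", while a pair that is not in the 
map is returned unchanged. 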

I have tested it in several tools, lemmatizing several languages, and it works 
as expected. The only requirement is to provide the dictionaries in the 
required format for the given language. The exception is the JWNL 
implementation, whose API works only for English; to be honest, I included it 
because I already had it implemented, but I do not think it is that useful. 

I know that many other things need to be done before this package can be 
included in the project for v1.6.0, but please let me know whether the 
developers agree with the current structure before I carry on. 

If you agree, please point out what I need to do next (CLI issues, tests, 
etc.).  

Cheers, 

Rodrigo 

> Add lemmatizer functionality
> ----------------------------
>
>                 Key: OPENNLP-582
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-582
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: POS Tagger
>    Affects Versions: 1.6.0
>            Reporter: Rodrigo Agerri
>         Attachments: lemmatizer-prelim.patch
>
>
> Will add new functionality to perform dictionary-based lemmatization. It will 
> look up a word form and POS tag in a dictionary and produce the corresponding 
> lemma. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)
