[ 
https://issues.apache.org/jira/browse/OPENNLP-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Wiesner updated OPENNLP-660:
-----------------------------------
    Summary: Include list of stop words for various languages  (was: Stoplist)

> Include list of stop words for various languages
> ------------------------------------------------
>
>                 Key: OPENNLP-660
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-660
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Parser, Stemmer
>    Affects Versions: tools-1.5.3
>         Environment: all
>            Reporter: Martin Wunderlich
>            Priority: Minor
>              Labels: features, language, model
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> This feature request is for the inclusion of lists of stop words for various 
> languages. These stop word lists can be used to reduce the noise caused by 
> frequent but irrelevant words, e.g. when tokenizing texts. A first iteration 
> could be a simple list of words, but later versions could also include 
> multi-word stop words that apply to n-grams (i.e. a word in the list will 
> serve to "stop" a multi-word n-gram). 
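
To make the request concrete, here is a minimal sketch of the two behaviours described above: dropping stop words from a token list, and discarding any n-gram that contains a stop word. It is not part of the OpenNLP API. The class name StopWordFilterSketch, both method names, and the tiny English stop list are hypothetical, and "a word in the list stops the n-gram" is only one possible reading of the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; OpenNLP does not ship this class.
final class StopWordFilterSketch {

    // Returns the tokens that are not in the stop list (case-insensitive match,
    // assuming the stop list itself is stored in lower case).
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Builds word n-grams and skips every n-gram that contains a stop word.
    // This is an assumed interpretation of "stopping" a multi-word n-gram.
    static List<String> nGramsWithoutStopWords(List<String> tokens, int n, Set<String> stopWords) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            List<String> window = tokens.subList(i, i + n);
            boolean stopped = false;
            for (String token : window) {
                if (stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                    stopped = true;
                    break;
                }
            }
            if (!stopped) {
                result.add(String.join(" ", window));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative stop list; a real one would be loaded per language.
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "of", "a"));
        List<String> tokens = Arrays.asList("The", "list", "of", "stop", "words");

        System.out.println(removeStopWords(tokens, stopWords));
        System.out.println(nGramsWithoutStopWords(tokens, 2, stopWords));
    }
}
```

A plain Set<String> keeps the sketch independent of any particular file format. A per-language word list with one entry per line could be loaded into such a set.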



--
This message was sent by Atlassian Jira
(v8.20.10#820010)