[
https://issues.apache.org/jira/browse/OPENNLP-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Martin Wiesner updated OPENNLP-660:
-----------------------------------
Summary: Include list of stop words for various languages (was: Stoplist)
> Include list of stop words for various languages
> ------------------------------------------------
>
> Key: OPENNLP-660
> URL: https://issues.apache.org/jira/browse/OPENNLP-660
> Project: OpenNLP
> Issue Type: New Feature
> Components: Parser, Stemmer
> Affects Versions: tools-1.5.3
> Environment: all
> Reporter: Martin Wunderlich
> Priority: Minor
> Labels: features, language, model
> Original Estimate: 0.05h
> Remaining Estimate: 0.05h
>
> This feature request is for the inclusion of stop word lists for various
> languages. These lists can be used to reduce the noise caused by
> frequent but irrelevant words, e.g. when tokenizing texts. For a first
> iteration, each list could be a simple list of words, but it could also
> include multi-stopwords, which will apply to n-grams (i.e. a word in the
> list will serve to "stop" a multi-word n-gram).
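
To make the request concrete, here is a minimal sketch of how such a list might be applied, first to single tokens and then to n-grams. It is not part of the OpenNLP API: the class StopWordFilter, its method names, and the tiny English stop list are hypothetical and chosen only for illustration. It also assumes one reading of the description, namely that an n-gram is dropped as soon as any of its words is in the stop list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not an OpenNLP class.
final class StopWordFilter {

    // Returns the tokens that are not in the stop list (case-insensitive match).
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Builds word n-grams and drops every n-gram that contains a stop word.
    // Assumption: this is what "stopping" a multi-word n-gram means.
    static List<String> nGramsWithoutStopWords(List<String> tokens, int n, Set<String> stopWords) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            List<String> window = tokens.subList(i, i + n);
            boolean stopped = false;
            for (String token : window) {
                if (stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                    stopped = true;
                    break;
                }
            }
            if (!stopped) {
                grams.add(String.join(" ", window));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Illustrative stop list; a real list would be loaded per language.
        Set<String> stopWords = Set.of("the", "of", "a");
        List<String> tokens =
                List.of("The", "history", "of", "natural", "language", "processing");

        // prints [history, natural, language, processing]
        System.out.println(removeStopWords(tokens, stopWords));
        // prints [natural language, language processing]
        System.out.println(nGramsWithoutStopWords(tokens, 2, stopWords));
    }
}
```

The stop list is expected in lower case here, and tokens are lower-cased before the lookup. A per-language implementation would likely need locale-aware normalization instead of Locale.ROOT.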