[ https://issues.apache.org/jira/browse/OPENNLP-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Martin Wiesner updated OPENNLP-660:
-----------------------------------
    Summary: Include list of stop words for various languages  (was: Stoplist)

> Include list of stop words for various languages
> ------------------------------------------------
>
>                 Key: OPENNLP-660
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-660
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Parser, Stemmer
>    Affects Versions: tools-1.5.3
>         Environment: all
>            Reporter: Martin Wunderlich
>            Priority: Minor
>              Labels: features, language, model
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> This feature request is for the inclusion of lists of stop words for various
> languages. These stop word lists can be used to reduce the noise caused by
> frequent but irrelevant words, e.g. when tokenizing texts. The list could be
> a simple list of words for a first iteration, but could also include
> multi-stopwords, which will apply to n-grams (i.e. a word in the list will
> serve to "stop" a multi-word n-gram).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
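
For illustration only, the two uses described in the request could look roughly like the Java sketch below. Everything in it is an assumption: the class StopWordFilterSketch, its methods, and the tiny English word list are hypothetical and are not part of the OpenNLP API. The n-gram part shows one possible reading of "multi-stopwords", namely that a stop word ends the current run of tokens so that no n-gram spans it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not an OpenNLP class.
class StopWordFilterSketch {

    // Tiny illustrative list; a real list would be loaded per language.
    static final Set<String> ENGLISH_STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // Drops every token whose lower-cased form is in the stop list.
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Builds word n-grams that never span a stop word:
    // each stop word "stops" the current run of tokens.
    static List<String> nGramsWithoutStopWords(List<String> tokens,
                                               Set<String> stopWords, int n) {
        List<String> result = new ArrayList<>();
        List<String> run = new ArrayList<>();
        for (String token : tokens) {
            if (stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                run.clear(); // the stop word breaks the run
                continue;
            }
            run.add(token);
            if (run.size() >= n) {
                // Emit the n-gram ending at the current token.
                result.add(String.join(" ", run.subList(run.size() - n, run.size())));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
                "The", "parser", "of", "the", "toolkit", "tokenizes", "text");
        System.out.println(removeStopWords(tokens, ENGLISH_STOP_WORDS));
        System.out.println(nGramsWithoutStopWords(tokens, ENGLISH_STOP_WORDS, 2));
    }
}
```

Matching is done on the lower-cased token here, which is a simplification; a per-language list would probably need locale-aware normalization.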