Martin Wunderlich created OPENNLP-660:
-----------------------------------------
Summary: Stoplist
Key: OPENNLP-660
URL: https://issues.apache.org/jira/browse/OPENNLP-660
Project: OpenNLP
Issue Type: New Feature
Components: Parser, Stemmer
Affects Versions: tools-1.5.3
Environment: all
Reporter: Martin Wunderlich
Priority: Minor
This feature request is for the inclusion of stop word lists for various
languages. These lists can be used to reduce the noise caused by frequent
but irrelevant words, e.g. when tokenizing texts. For a first iteration, each
list could be a simple list of single words, but it could later also include
multi-stopwords that apply to n-grams (i.e. a word in the list would serve to
"stop" a multi-word n-gram).
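
To make the request more concrete, here is a minimal sketch of how such a
stop list might be used. It is only an illustration: the class name
StopListSketch and its methods are hypothetical and are not part of the
OpenNLP API. It also assumes one particular reading of the n-gram case,
namely that an n-gram is dropped as soon as any one of its tokens is in the
stop list, and that matching is case-insensitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch for illustration only; this class is not part of OpenNLP.
final class StopListSketch {

    // Stop words, stored lower-cased so that lookups are case-insensitive.
    private final Set<String> stopWords = new HashSet<>();

    StopListSketch(Iterable<String> words) {
        for (String w : words) {
            stopWords.add(w.toLowerCase(Locale.ROOT));
        }
    }

    boolean isStopWord(String token) {
        return stopWords.contains(token.toLowerCase(Locale.ROOT));
    }

    // Simple case: drop every token that appears in the stop list.
    List<String> filterTokens(String[] tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!isStopWord(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    // n-gram case (assumed reading): build all n-grams of size n and drop
    // any n-gram that contains at least one stop word.
    List<String> filterNGrams(String[] tokens, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        List<String> kept = new ArrayList<>();
        for (int i = 0; i + n <= tokens.length; i++) {
            boolean stopped = false;
            for (int j = i; j < i + n; j++) {
                if (isStopWord(tokens[j])) {
                    stopped = true;
                    break;
                }
            }
            if (!stopped) {
                kept.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        StopListSketch stopList = new StopListSketch(Arrays.asList("the", "of", "a"));
        String[] tokens = {"The", "list", "of", "stop", "words"};
        System.out.println(stopList.filterTokens(tokens));
        System.out.println(stopList.filterNGrams(tokens, 2));
    }
}
```

A per-language word list would presumably be loaded from a resource file
rather than hard-coded as in main above; the file format and loading
mechanism are left open here.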
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)