Martin Wunderlich created OPENNLP-660:
-----------------------------------------

             Summary: Stoplist
                 Key: OPENNLP-660
                 URL: https://issues.apache.org/jira/browse/OPENNLP-660
             Project: OpenNLP
          Issue Type: New Feature
          Components: Parser, Stemmer
    Affects Versions: tools-1.5.3
         Environment: all
            Reporter: Martin Wunderlich
            Priority: Minor


This feature request is for the inclusion of stop word lists for various 
languages. These lists can be used to reduce the noise caused by frequent 
but irrelevant words, e.g. when tokenizing texts. For a first iteration, each 
list could be a simple list of words, but it could also include 
multi-stopwords, which apply to n-grams (i.e. a word in the list will 
serve to "stop" a multi-word n-gram). 
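
To make the two variants concrete, here is a minimal sketch in plain Java. 
It is not part of the OpenNLP API: the class name StoplistSketch and its 
methods are hypothetical, the stop words are passed in by the caller instead 
of being loaded from per-language files, and matching is assumed to be 
case-insensitive. The n-gram method follows one possible reading of "stop", 
namely that any n-gram containing a stop word is discarded.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not an OpenNLP class.
class StoplistSketch {

    private final Set<String> stopWords = new HashSet<>();

    // Assumption: stop words are supplied directly and compared in lower case.
    StoplistSketch(String... words) {
        for (String word : words) {
            stopWords.add(word.toLowerCase(Locale.ROOT));
        }
    }

    boolean isStopWord(String token) {
        return stopWords.contains(token.toLowerCase(Locale.ROOT));
    }

    // First iteration: drop every token that appears in the stop list.
    List<String> filterTokens(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!isStopWord(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // N-gram variant (one reading of the request): build n-grams over the
    // unfiltered tokens and discard any n-gram that contains a stop word.
    List<String> ngrams(List<String> tokens, int n) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            List<String> window = tokens.subList(i, i + n);
            boolean stopped = false;
            for (String token : window) {
                if (isStopWord(token)) {
                    stopped = true;
                    break;
                }
            }
            if (!stopped) {
                result.add(String.join(" ", window));
            }
        }
        return result;
    }
}
```

With the stop words "the" and "over" and the tokens "the quick brown fox 
jumps over the lazy dog", filterTokens keeps "quick brown fox jumps lazy dog", 
while ngrams with n = 2 keeps "quick brown", "brown fox", "fox jumps" and 
"lazy dog". Building the n-grams before filtering avoids creating spurious 
n-grams such as "jumps lazy" from words that were never adjacent.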



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
