[
https://issues.apache.org/jira/browse/OPENNLP-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547334#comment-14547334
]
Varun Varadarajan commented on OPENNLP-660:
-------------------------------------------
Hi,
Has this issue been addressed yet?
If not, I have compiled stop word lists for Bulgarian, Danish, Dutch,
English, Finnish, French, German, Italian, Portuguese, Russian and Spanish,
based on the lists found in Apache Lucene [1].
I am new to OpenNLP and was hoping somebody could help me address this
issue (if it still needs to be addressed). I have a few questions:
1. Can I contribute to this project?
2. Can I submit a pull request on GitHub, or are changes only committed via SVN?
Thanks,
Varun
[1] ->
https://github.com/apache/lucene-solr/tree/20f9303f5e2378e2238a5381291414881ddb8172/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball
> Stoplist
> --------
>
> Key: OPENNLP-660
> URL: https://issues.apache.org/jira/browse/OPENNLP-660
> Project: OpenNLP
> Issue Type: New Feature
> Components: Parser, Stemmer
> Affects Versions: tools-1.5.3
> Environment: all
> Reporter: Martin Wunderlich
> Priority: Minor
> Labels: features, language, model
> Original Estimate: 0.05h
> Remaining Estimate: 0.05h
>
> This feature request is for the inclusion of stop word lists for various
> languages. These stop word lists can be used to reduce the noise caused by
> frequent but irrelevant words, e.g. when tokenizing texts. The list could be
> a simple list of words for a first iteration, but could also include
> multi-stopwords, which will apply to n-grams (i.e. a word in the list will
> serve to "stop" a multi-word n-gram).
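
To make the request above concrete, here is a minimal sketch of the two uses
it describes: dropping stop words from a token stream, and letting a stop word
"stop" an n-gram. It is not OpenNLP API. The class name StopwordSketch, its
method names, and the four-word English list are hypothetical, chosen only for
illustration. The n-gram behaviour is one possible reading of the description
(any n-gram containing a listed word is discarded).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration only; not part of OpenNLP.
final class StopwordSketch {

    // Tiny made-up list; a real list would be loaded per language from a resource file.
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("the", "of", "a", "is"));

    // Case-insensitive membership test against the stop word list.
    static boolean isStopword(String token) {
        return STOPWORDS.contains(token.toLowerCase(Locale.ROOT));
    }

    // Returns the tokens that are not stop words, in their original order.
    static List<String> removeStopwords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (!isStopword(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    // Builds contiguous n-grams and discards every n-gram that contains
    // at least one stop word (assumed meaning of "stopping" an n-gram).
    static List<String> ngramsWithoutStopwords(List<String> tokens, int n) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            List<String> window = tokens.subList(i, i + n);
            boolean stopped = false;
            for (String token : window) {
                if (isStopword(token)) {
                    stopped = true;
                    break;
                }
            }
            if (!stopped) {
                result.add(String.join(" ", window));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens =
                Arrays.asList("The", "parser", "of", "OpenNLP", "is", "fast");
        System.out.println(removeStopwords(tokens));
        System.out.println(ngramsWithoutStopwords(
                Arrays.asList("natural", "language", "processing", "is", "fun"), 2));
    }
}
```

A hash set keeps the lookup constant-time per token, and lower-casing with
Locale.ROOT avoids locale-dependent surprises when lists for several languages
are used side by side.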