[ 
https://issues.apache.org/jira/browse/OPENNLP-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547334#comment-14547334
 ] 

Varun Varadarajan commented on OPENNLP-660:
-------------------------------------------

Hi,

Has this issue been addressed yet?
If not, I have compiled a list of stop words for Bulgarian, Danish, Dutch, 
English, Finnish, French, German, Italian, Portuguese, Russian and Spanish 
using lists found in Apache Lucene[1].

I am new to OpenNLP and was hoping somebody could help me address this 
issue (if it still needs to be addressed). I have a few questions.

1. Can I contribute to this project?
2. Can I submit a pull request on GitHub, or are changes accepted only via SVN?

Thanks,
Varun

[1] -> 
https://github.com/apache/lucene-solr/tree/20f9303f5e2378e2238a5381291414881ddb8172/lucene/analysis/common/src/resources/org/apache/lucene/analysis/snowball

> Stoplist
> --------
>
>                 Key: OPENNLP-660
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-660
>             Project: OpenNLP
>          Issue Type: New Feature
>          Components: Parser, Stemmer
>    Affects Versions: tools-1.5.3
>         Environment: all
>            Reporter: Martin Wunderlich
>            Priority: Minor
>              Labels: features, language, model
>   Original Estimate: 0.05h
>  Remaining Estimate: 0.05h
>
> This feature request is for the inclusion of stop word lists for various 
> languages. These stop word lists can be used to reduce the noise caused by 
> frequent but irrelevant words, e.g. when tokenizing texts. The list could be 
> a simple list of words for a first iteration, but could also include 
> multi-stopwords, which will apply to n-grams (i.e. a word in the list will 
> serve to "stop" a multi-word n-gram). 
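
For illustration only, here is a minimal sketch of how such a stop list might be applied. It does not use any OpenNLP API: the class name StopWordFilter and its methods are hypothetical, the tokens are assumed to come from any tokenizer, and dropping every n-gram that contains a stop word is just one possible reading of the "multi-stopwords" idea described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper; not part of OpenNLP. */
public class StopWordFilter {

    private final Set<String> stopWords = new HashSet<>();

    /** Stores the stop words in lower case so that lookups ignore case. */
    public StopWordFilter(Set<String> words) {
        for (String w : words) {
            stopWords.add(w.toLowerCase(Locale.ROOT));
        }
    }

    public boolean isStopWord(String token) {
        return stopWords.contains(token.toLowerCase(Locale.ROOT));
    }

    /** First iteration: drop every token that appears in the stop list. */
    public List<String> removeStopWords(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (!isStopWord(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    /**
     * Assumed n-gram variant: build all n-grams over the token sequence and
     * discard any n-gram in which at least one token is a stop word.
     */
    public List<String> ngramsWithoutStopWords(List<String> tokens, int n) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            List<String> window = tokens.subList(i, i + n);
            boolean stopped = false;
            for (String t : window) {
                if (isStopWord(t)) {
                    stopped = true;
                    break;
                }
            }
            if (!stopped) {
                result.add(String.join(" ", window));
            }
        }
        return result;
    }
}
```

A caller would construct the filter from one of the per-language lists (for example, words read line by line from a file), pass in the tokenizer output, and use either removeStopWords for single tokens or ngramsWithoutStopWords for n-grams.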



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
