[
https://issues.apache.org/jira/browse/NUTCH-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15119782#comment-15119782
]
Hudson commented on NUTCH-2206:
-------------------------------
FAILURE: Integrated in Nutch-trunk #3342 (See
[https://builds.apache.org/job/Nutch-trunk/3342/])
Added missing stopword file for NUTCH-2206 (sujen:
[http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1727126])
* trunk/conf/stopwords.txt.template
NUTCH-2206 Provide example scoring.similarity.stopword.file (sujen:
[http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1727122])
* trunk/CHANGES.txt
* trunk/conf/nutch-default.xml
> Provide example scoring.similarity.stopword.file
> ------------------------------------------------
>
> Key: NUTCH-2206
> URL: https://issues.apache.org/jira/browse/NUTCH-2206
> Project: Nutch
> Issue Type: Bug
> Components: plugin, scoring
> Affects Versions: 1.11
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Fix For: 1.12
>
> Attachments: NUTCH-2206.patch, NUTCH-2206.patch
>
>
> The scoring-similarity plugin does not provide an example file for the
> property scoring.similarity.stopword.file.
> This is an issue for two reasons, namely:
> * A user does not know what the file is meant to look like, and
> * We always check for this file and will [throw an exception if it is not
> found|https://github.com/apache/nutch/blob/trunk/src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/cosine/DocumentVector.java#L79-L80];
> a user may not discover the problem until much later.
> I suggest a simple fix: include the [standard English stop words
> taken from Lucene's
> StopAnalyzer|https://github.com/apache/lucene-solr/blob/3f38aba02ce37c6422875d8824ee034d42d635b9/solr/contrib/morphlines-core/src/test-files/solr/collection1/conf/lang/stopwords_en.txt].
> The comments in that file make it easy for people to customize the list to
> whatever they require.
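The failure mode and the proposed fix can be illustrated with a minimal Java sketch. It is not the code in DocumentVector: the class `StopwordFile` and its `load` method are hypothetical, the configured path is passed in as a plain string rather than read through Nutch's configuration, and the sketch assumes the one-word-per-line format with `#` comment lines used by the Lucene/Solr stopword file linked above. Checking for the file up front, with the property name in the error message, surfaces a missing file at startup rather than much later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not the scoring-similarity plugin's code.
class StopwordFile {

    // Property name from the issue; how Nutch actually resolves it is not modeled here.
    static final String PROPERTY = "scoring.similarity.stopword.file";

    /**
     * Fails fast if the configured file is unset or missing, then reads one
     * stop word per line. Blank lines and lines starting with '#' are skipped
     * as comments; words are trimmed and lower-cased.
     */
    static Set<String> load(String configuredPath) throws IOException {
        if (configuredPath == null || configuredPath.trim().isEmpty()) {
            throw new IllegalStateException(PROPERTY + " is not set");
        }
        Path path = Paths.get(configuredPath.trim());
        if (!Files.isRegularFile(path)) {
            throw new IllegalStateException(
                PROPERTY + " points to a missing file: " + path.toAbsolutePath());
        }
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                    continue; // blank line or comment
                }
                words.add(trimmed.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Write a small commented sample to stand in for a real stopword file.
        Path sample = Files.createTempFile("stopwords", ".txt");
        Files.write(sample, Arrays.asList(
            "# Standard English stop words (excerpt)",
            "# Add or remove entries to suit your crawl.",
            "a", "an", "", "The", "and"), StandardCharsets.UTF_8);
        System.out.println(load(sample.toString())); // prints [a, an, the, and]
        Files.delete(sample);

        try {
            load("no-such-stopwords.txt");
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

Keeping the comment convention in the loader is what lets a shipped template carry explanatory comments while users trim or extend the word list without touching code.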
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)