[
https://issues.apache.org/jira/browse/SOLR-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013629#comment-13013629
]
Robert Muir commented on SOLR-2450:
-----------------------------------
Just to expand on hossman's point, there are a variety of ways someone could be
setting up stopwords:
* with StopFilterFactory
* by configuring their analyzer with <analyzer class=....>, where the Analyzer
actually uses a stopword list internally. In this case, if it's a supplied
Lucene analyzer, you can check: if (analyzer instanceof StopwordAnalyzerBase) ...
and then invoke getStopwordSet() on the analyzer. But it's true that someone
could write a custom one that uses stopwords but extends Analyzer directly.
* by using stopword-like components such as CommonGramsFilter, which still have
the concept of stopwords but just work differently.
* by using a custom filter/analyzer of their own that acts like StopFilter.
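
To illustrate the instanceof check from the second case, here is a minimal,
self-contained sketch. The nested Analyzer and StopwordAnalyzerBase classes are
simplified stand-ins written for this example, not the real Lucene classes (the
real getStopwordSet() returns a CharArraySet, not a Set<String>), and
StopwordProbe, BuiltInStyleAnalyzer, CustomAnalyzer and stopwordsOf are
hypothetical names. It only shows why the check finds stopwords for supplied
analyzers but not for a custom one that extends Analyzer directly.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

class StopwordProbe {
    // Stand-in for Lucene's Analyzer (assumption: simplified for this sketch).
    static abstract class Analyzer {}

    // Stand-in for Lucene's StopwordAnalyzerBase; exposes the stopword set.
    static abstract class StopwordAnalyzerBase extends Analyzer {
        private final Set<String> stopwords;

        StopwordAnalyzerBase(Set<String> stopwords) {
            this.stopwords = Collections.unmodifiableSet(new HashSet<>(stopwords));
        }

        Set<String> getStopwordSet() {
            return stopwords;
        }
    }

    // Models a supplied analyzer that extends the stopword-aware base class.
    static final class BuiltInStyleAnalyzer extends StopwordAnalyzerBase {
        BuiltInStyleAnalyzer(Set<String> stopwords) {
            super(stopwords);
        }
    }

    // Models a custom analyzer that may use stopwords internally
    // but extends Analyzer directly, so nothing is discoverable from outside.
    static final class CustomAnalyzer extends Analyzer {}

    // Returns the stopword set if the analyzer exposes one, otherwise empty.
    static Optional<Set<String>> stopwordsOf(Analyzer analyzer) {
        if (analyzer instanceof StopwordAnalyzerBase) {
            return Optional.of(((StopwordAnalyzerBase) analyzer).getStopwordSet());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Analyzer builtIn = new BuiltInStyleAnalyzer(new HashSet<>(Arrays.asList("the", "a")));
        Analyzer custom = new CustomAnalyzer();
        System.out.println("built-in exposes stopwords: " + stopwordsOf(builtIn).isPresent());
        System.out.println("custom exposes stopwords: " + stopwordsOf(custom).isPresent());
    }
}
```

The Optional return makes the "cannot tell" case explicit, which is the gap the
other list items (CommonGramsFilter, custom filters) fall into as well.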
> Carrot2 clustering should use both its own and Solr's stop words
> ----------------------------------------------------------------
>
> Key: SOLR-2450
> URL: https://issues.apache.org/jira/browse/SOLR-2450
> Project: Solr
> Issue Type: Improvement
> Components: contrib - Clustering
> Reporter: Stanislaw Osinski
> Assignee: Stanislaw Osinski
> Priority: Minor
> Fix For: 3.2, 4.0
>
>
> While using only Solr's stop words for clustering isn't a good idea (compared
> to indexing, clustering needs more aggressive stop word removal to get
> reasonable cluster labels), it would be good if Carrot2 used both its own and
> Solr's stop words.
> I'm not sure what the best way to implement this would be though. My first
> thought was to simply load {{stopwords.txt}} from Solr config dir and merge
> them with Carrot2's. But then, maybe a better approach would be to get the
> stop words from the StopFilter being used? Ideally, we should also consider
> the per-field stop filters configured on the fields used for clustering.
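
The first idea in the description above (load stopwords.txt and merge it with
Carrot2's list) could be sketched roughly as follows. This is only an
illustration under stated assumptions: it uses no Solr or Carrot2 API, the
class and method names (StopwordMerge, parseStopwords, merge) are hypothetical,
the file is assumed to hold one word per line with '#' starting a comment line,
and lowercasing is an assumption about how the two lists should be compared.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopwordMerge {
    // Parses stopword lines, skipping blank lines and '#' comment lines.
    // Assumption: one word per line.
    static Set<String> parseStopwords(List<String> lines) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            words.add(trimmed.toLowerCase(Locale.ROOT));
        }
        return words;
    }

    // Returns the union of both sets, keeping the clustering words first.
    static Set<String> merge(Set<String> clusteringStopwords, Set<String> solrStopwords) {
        Set<String> merged = new LinkedHashSet<>(clusteringStopwords);
        merged.addAll(solrStopwords);
        return merged;
    }

    public static void main(String[] args) {
        // Placeholder data standing in for the clustering engine's own list.
        Set<String> clustering = new LinkedHashSet<>(Arrays.asList("the", "about", "however"));
        // Placeholder lines standing in for the contents of stopwords.txt.
        List<String> solrLines = Arrays.asList("# sample stopwords", "the", "", "An", "of");
        System.out.println(merge(clustering, parseStopwords(solrLines)));
    }
}
```

Note that this covers only the single-file case; it does not address the
per-field stop filters mentioned in the description.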