[ https://issues.apache.org/jira/browse/SOLR-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013629#comment-13013629 ]

Robert Muir commented on SOLR-2450:
-----------------------------------

Just to expand on hossman's point, there are a variety of ways someone could be
setting up stopwords:

* with StopFilterFactory
* by configuring their analyzer with <analyzer class=....>, where the Analyzer
actually uses a stopword list internally (in this case, if it's a supplied
Lucene analyzer, you can check if (analyzer instanceof StopwordAnalyzerBase) ...
and then invoke StopwordAnalyzerBase.getStopwordSet() on the analyzer; but it's
true that someone could write a custom one that uses stopwords yet extends
Analyzer directly).
* by using stopword-like components such as CommonGramsFilter, which still have
the concept of stopwords but just use them differently.
* by using a custom filter/analyzer of their own that acts like StopFilter.
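The capability check from the second bullet can be sketched roughly as below. This is a minimal illustration, not Solr or Lucene code: StopwordSource, Component and StopComponent are hypothetical stand-ins so the sketch runs without Lucene on the classpath (in Lucene the check would be against StopwordAnalyzerBase and the call would be getStopwordSet()). Components that cannot be inspected, such as custom analyzers extending Analyzer directly, CommonGramsFilter-style filters, or home-grown stop filters, can only be reported, which is the gap the list above describes.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopwordCollector {

    // Hypothetical stand-in for "a component we can ask for its stop words"
    // (in Lucene, roughly StopwordAnalyzerBase.getStopwordSet()).
    interface StopwordSource {
        Set<String> stopwords();
    }

    // Hypothetical stand-in for any analysis component (tokenizer, filter, analyzer).
    static class Component {
        final String name;
        Component(String name) { this.name = name; }
    }

    // Hypothetical component that exposes its stop word list.
    static class StopComponent extends Component implements StopwordSource {
        private final Set<String> words;
        StopComponent(String name, Set<String> words) {
            super(name);
            this.words = words;
        }
        public Set<String> stopwords() { return words; }
    }

    // The stop words we could find, plus the components we could not inspect.
    static class Result {
        final Set<String> words = new LinkedHashSet<>();
        final List<String> uninspected = new ArrayList<>();
    }

    static Result collect(List<Component> chain) {
        Result result = new Result();
        for (Component c : chain) {
            if (c instanceof StopwordSource) {
                // Capability check, analogous to "instanceof StopwordAnalyzerBase".
                for (String w : ((StopwordSource) c).stopwords()) {
                    result.words.add(w.toLowerCase(Locale.ROOT));
                }
            } else {
                // May or may not use stop words internally; nothing to read here.
                result.uninspected.add(c.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Component> chain = List.of(
                new Component("tokenizer"),
                new StopComponent("stop", Set.of("The", "a")),
                new Component("customFilter"));
        Result r = collect(chain);
        System.out.println(r.words);
        System.out.println(r.uninspected);
    }
}
```

Reporting the uninspected components, rather than silently ignoring them, keeps the merged list honest about what it could not see.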


> Carrot2 clustering should use both its own and Solr's stop words
> ----------------------------------------------------------------
>
>                 Key: SOLR-2450
>                 URL: https://issues.apache.org/jira/browse/SOLR-2450
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Clustering
>            Reporter: Stanislaw Osinski
>            Assignee: Stanislaw Osinski
>            Priority: Minor
>             Fix For: 3.2, 4.0
>
>
> While using only Solr's stop words for clustering isn't a good idea (compared 
> to indexing, clustering needs more aggressive stop word removal to get 
> reasonable cluster labels), it would be good if Carrot2 used both its own and 
> Solr's stop words.
> I'm not sure what the best way to implement this would be though. My first 
> thought was to simply load {{stopwords.txt}} from the Solr config dir and merge 
> them with Carrot2's. But then, maybe a better approach would be to get the 
> stop words from the StopFilter being used? Ideally, we should also consider 
> the per-field stop filters configured on the fields used for clustering.
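The "first thought" above (load stopwords.txt and merge it with Carrot2's list) could look roughly like this. It is a sketch under assumptions: the file is taken to have one word per line with blank lines and '#' comment lines skipped, words are lowercased, and the Carrot2 words in main() are made-up sample data. Where the file is read from in a real Solr core is left out.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class StopwordMerge {

    // Parses a stopwords.txt-style list: one word per line; blank lines and
    // lines starting with '#' are skipped (assumed format).
    static Set<String> parse(Reader in) throws IOException {
        Set<String> words = new TreeSet<>();
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                String w = line.trim();
                if (w.isEmpty() || w.startsWith("#")) continue;
                words.add(w.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    // Union of the clustering engine's own (more aggressive) list and Solr's list.
    static Set<String> merge(Set<String> clusteringWords, Set<String> solrWords) {
        Set<String> merged = new TreeSet<>();
        for (String w : clusteringWords) merged.add(w.toLowerCase(Locale.ROOT));
        merged.addAll(solrWords);
        return merged;
    }

    public static void main(String[] args) throws IOException {
        String solrFile = "# Solr stop words\nthe\nAnd\n\nof\n";
        Set<String> solr = parse(new StringReader(solrFile));
        Set<String> carrot2 = Set.of("the", "using", "based"); // sample data only
        System.out.println(merge(carrot2, solr)); // prints [and, based, of, the, using]
    }
}
```

A plain union keeps Carrot2's more aggressive list intact, as the description asks, while still dropping whatever Solr already treats as a stop word.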

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira