[
https://issues.apache.org/jira/browse/SOLR-2450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stanislaw Osinski updated SOLR-2450:
------------------------------------
Attachment: SOLR-2450.patch
Patch for the use of stop words from the field's {{StopFilterFactory}} and
{{CommonGramsFilterFactory}} in addition to Carrot2's built-in stop words.
Requires the SOLR-2448 and SOLR-2449 patches to be applied first.
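
To illustrate the "in addition to" part, here is a minimal sketch of the merge
step only. It is not the code in the attached patch: the class name, the method
signature, and the map from field name to stop words are hypothetical, and
lower-casing the words before merging is an assumption made for the example.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper; not part of Solr, Carrot2, or the attached patch. */
final class StopWordMerger {
    private StopWordMerger() {}

    /**
     * Returns the union of the built-in stop words and the stop words
     * configured on each of the fields used for clustering.
     */
    static Set<String> merge(Set<String> builtIn,
                             Map<String, Set<String>> stopWordsByField,
                             Collection<String> clusteringFields) {
        Set<String> merged = new LinkedHashSet<>();
        addLowerCased(merged, builtIn);
        for (String field : clusteringFields) {
            // A field without a stop filter simply contributes nothing.
            Set<String> fieldWords = stopWordsByField.get(field);
            if (fieldWords != null) {
                addLowerCased(merged, fieldWords);
            }
        }
        return merged;
    }

    // Assumption: stop words are compared case-insensitively.
    private static void addLowerCased(Set<String> target, Collection<String> words) {
        for (String word : words) {
            target.add(word.toLowerCase(Locale.ROOT));
        }
    }
}
```

Only the fields actually used for clustering contribute, so a stop word
configured on an unrelated field does not affect the cluster labels.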
> Carrot2 clustering should use both its own and Solr's stop words
> ----------------------------------------------------------------
>
> Key: SOLR-2450
> URL: https://issues.apache.org/jira/browse/SOLR-2450
> Project: Solr
> Issue Type: Improvement
> Components: contrib - Clustering
> Reporter: Stanislaw Osinski
> Assignee: Stanislaw Osinski
> Priority: Minor
> Fix For: 3.2, 4.0
>
> Attachments: SOLR-2450.patch
>
>
> While using only Solr's stop words for clustering isn't a good idea (compared
> to indexing, clustering needs more aggressive stop word removal to get
> reasonable cluster labels), it would be good if Carrot2 used both its own and
> Solr's stop words.
> I'm not sure what the best way to implement this would be, though. My first
> thought was to simply load {{stopwords.txt}} from Solr's config dir and merge
> its entries with Carrot2's. But then, maybe a better approach would be to get
> the stop words from the StopFilter being used? Ideally, we should also
> consider the per-field stop filters configured on the fields used for
> clustering.
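
For comparison, the first idea from the description (reading a stop word file
directly) could look roughly like the sketch below. The class and method names
are hypothetical, and the file format is an assumption: one word per line, with
blank lines and lines starting with {{#}} ignored.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper; not part of Solr or Carrot2. */
final class StopWordFileSketch {
    private StopWordFileSketch() {}

    /** Parses stop words from a reader, assuming one word per line. */
    static Set<String> parse(Reader source) {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                // Skip blank lines and comment lines (assumed to start with '#').
                if (word.isEmpty() || word.startsWith("#")) {
                    continue;
                }
                words.add(word);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }
}
```

The parsed set could then be added to the clustering engine's own stop words
with a plain {{addAll}}. The drawback is the one raised in the description: a
single file says nothing about which stop filter each field really uses.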