[
https://issues.apache.org/jira/browse/SOLR-3085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13677709#comment-13677709
]
Naomi Dushay commented on SOLR-3085:
------------------------------------
We avoided this by adding stopwords to our string fields (and simultaneously
dealing with whitespace around punctuation marks). It's dumb, but it worked
fine in dismax. We no longer use stopwords in general.
<!-- single token with punctuation terms removed so dismax doesn't look for
     punctuation terms in these fields -->
<!-- On client side, Lucene query parser breaks things up by whitespace
     *before* field analysis for dismax -->
<!-- so punctuation terms (& : ;) are stopwords to allow results from other
     fields when these chars are surrounded by spaces in query -->
<fieldType name="string_punct_stop" class="solr.TextField" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc"
            mode="compose" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.ICUNormalizer2FilterFactory" name="nfkc"
            mode="compose" />
    <!-- removing punctuation for Lucene query parser issues -->
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords_punctuation.txt" enablePositionIncrements="true" />
  </analyzer>
</fieldType>
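
To illustrate why the query-side stop filter helps, here is a rough toy model in Python of how whitespace splitting before per-field analysis interacts with mm=100%. It is only a sketch and not Solr's actual implementation: the field names (title_text, title_string), the analyzer functions, and the assumed contents of stopwords_punctuation.txt (& : ;) are made up for the example.

```python
# Toy model of dismax clause building with mm=100%. NOT Solr's real code.
# Assumed contents of stopwords_punctuation.txt, taken from the XML comment above.
PUNCT_STOPWORDS = {"&", ":", ";"}


def analyze_text(chunk):
    """Hypothetical tokenized text field: lowercases, drops punctuation-only chunks."""
    token = chunk.lower().strip("&:;")
    return [token] if token else []


def analyze_string(chunk):
    """Hypothetical plain string field: keeps every chunk as a single token."""
    return [chunk]


def analyze_string_punct_stop(chunk):
    """Models the string_punct_stop query analyzer: one token, punctuation stopped."""
    return [] if chunk in PUNCT_STOPWORDS else [chunk]


def build_clauses(query, field_analyzers):
    """Split on whitespace first, then analyze each chunk per field.

    A chunk yields a clause only if at least one field keeps a token for it.
    """
    clauses = []
    for chunk in query.split():
        per_field = {}
        for name, analyzer in field_analyzers.items():
            tokens = analyzer(chunk)
            if tokens:
                per_field[name] = tokens
        if per_field:
            clauses.append(per_field)
    return clauses


def matches(doc, clauses):
    """mm=100%: every clause must match in at least one of its fields."""
    return all(
        any(tok in doc.get(field, set())
            for field, toks in clause.items() for tok in toks)
        for clause in clauses
    )


# A made-up document: tokenized title plus the whole title as one string token.
DOC = {"title_text": {"crime", "punishment"},
       "title_string": {"Crime & Punishment"}}
QUERY = "crime & punishment"

PLAIN = {"title_text": analyze_text, "title_string": analyze_string}
STOPPED = {"title_text": analyze_text, "title_string": analyze_string_punct_stop}

print(matches(DOC, build_clauses(QUERY, PLAIN)))    # False: "&" survives only in the string field
print(matches(DOC, build_clauses(QUERY, STOPPED)))  # True: the "&" clause disappears entirely
```

In this model the plain string field keeps "&" as a required clause that no document can satisfy, while the stopped variant drops the clause for every field, so the remaining terms can match in the text field.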
> Fix the dismax/edismax stopwords mm issue
> -----------------------------------------
>
> Key: SOLR-3085
> URL: https://issues.apache.org/jira/browse/SOLR-3085
> Project: Solr
> Issue Type: Bug
> Components: query parsers
> Reporter: Jan Høydahl
> Labels: MinimumShouldMatch, dismax, stopwords
>
> As discussed here http://search-lucene.com/m/Wr7iz1a95jx and here
> http://search-lucene.com/m/Yne042qEyCq1 and here
> http://search-lucene.com/m/RfAp82nSsla DisMax has an issue with stopwords if
> not all fields used in QF have exactly the same stopword lists.
> The typical solution is to not use stopwords, to harmonize stopword lists across
> all fields in your QF, or to relax the MM to a lower percentage. Sometimes these
> are not acceptable workarounds, and we should find a better solution.