[
https://issues.apache.org/jira/browse/SOLR-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
mosh updated SOLR-12958:
------------------------
Affects Version/s: master (8.0)
7.5
> Statistical Phrase Identifier should return phrases in single field
> -------------------------------------------------------------------
>
> Key: SOLR-12958
> URL: https://issues.apache.org/jira/browse/SOLR-12958
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Affects Versions: 7.5, master (8.0)
> Reporter: mosh
> Priority: Major
> Labels: phrase, phrasequery
> Attachments: SOLR-12958.patch
>
>
> It has come to my attention that the phrase identifier introduced in
> SOLR-9418 does not return phrases that are found in only one of the fields
> specified by phrases.fields.
> This has proved troublesome for our use case.
> The offending line seems to be
> {code:java}
> final List<Phrase> validScoringPhrasesSorted = contextData.allPhrases.stream()
>     .filter(p -> 0.0D < p.getTotalScore())
>     .sorted(Comparator.comparing((p -> p.getTotalScore()),
>                                  Collections.reverseOrder()))
>     .collect(Collectors.toList());{code}
> Since fields where the phrase is not present return -1.0, and fields that
> contain the phrase return a score in the range 0.0 <= score <= 1.0, the
> total score turns out negative, and the phrase gets filtered.
> I separated the filter into 2 distinct cases:
> # Filter out single word phrases (*phrases.singleWordPhrases* is set to
> false)
> # Include single word phrases (*phrases.singleWordPhrases* is set to true)
> This can be seen in this change to the component's logic:
> {code:java}
> if (!rb.req.getParams().getBool(PHRASE_MATCH_SINGLE_WORD, false)) {
>   // filter single word phrases
>   phraseStream = contextData.allPhrases.stream()
>       .filter(p -> p.fieldScores.values().stream()
>           .anyMatch(fieldScore -> fieldScore > 0.0D));
> } else {
>   // include single word phrases, which return a constant score of 0.0
>   phraseStream = contextData.allPhrases.stream()
>       .filter(p -> p.fieldScores.values().stream()
>           .anyMatch(fieldScore -> fieldScore >= 0.0D));
> }{code}
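
For readers who want to try the difference between the two filters outside of Solr, here is a minimal standalone sketch. It is not the component's code: the `Phrase` class, the field names `title` and `body`, the sample scores, and the `includeSingleWord` flag are hypothetical stand-ins, and the total score is modeled as a plain sum of per-field scores, which is an assumption (the description above only says the total turns out negative).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PhraseFilterSketch {

    // Hypothetical stand-in for the component's Phrase: a text plus per-field scores.
    static final class Phrase {
        final String text;
        final Map<String, Double> fieldScores;

        Phrase(String text, Map<String, Double> fieldScores) {
            this.text = text;
            this.fieldScores = fieldScores;
        }

        // ASSUMPTION: the total is modeled as a plain sum of the field scores.
        double totalScore() {
            return fieldScores.values().stream().mapToDouble(Double::doubleValue).sum();
        }
    }

    // Mirrors the original filter: keep a phrase only if its total score is positive.
    static List<String> filterByTotal(List<Phrase> phrases) {
        return phrases.stream()
            .filter(p -> 0.0D < p.totalScore())
            .map(p -> p.text)
            .collect(Collectors.toList());
    }

    // Mirrors the proposed filter: keep a phrase if any single field scores high enough.
    // includeSingleWord plays the role of phrases.singleWordPhrases.
    static List<String> filterByAnyField(List<Phrase> phrases, boolean includeSingleWord) {
        return phrases.stream()
            .filter(p -> p.fieldScores.values().stream()
                .anyMatch(s -> includeSingleWord ? s >= 0.0D : s > 0.0D))
            .map(p -> p.text)
            .collect(Collectors.toList());
    }

    static Phrase phrase(String text, double titleScore, double bodyScore) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("title", titleScore);
        scores.put("body", bodyScore);
        return new Phrase(text, scores);
    }

    // Made-up sample data; -1.0 marks a field where the phrase is absent.
    static List<Phrase> samplePhrases() {
        return List.of(
            phrase("brown fox", 0.4, -1.0),    // present in one field only
            phrase("lazy dog", 0.7, 0.5),      // present in both fields
            phrase("fox", 0.0, -1.0),          // single word: constant score of 0.0
            phrase("purple cat", -1.0, -1.0)); // absent from every field
    }

    public static void main(String[] args) {
        List<Phrase> phrases = samplePhrases();
        System.out.println(filterByTotal(phrases));           // [lazy dog]
        System.out.println(filterByAnyField(phrases, false)); // [brown fox, lazy dog]
        System.out.println(filterByAnyField(phrases, true));  // [brown fox, lazy dog, fox]
    }
}
```

Under these assumptions, "brown fox" is dropped by the total-score filter because 0.4 + (-1.0) is negative, while the per-field check keeps it. A phrase that is absent from every field is still excluded in both modes.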
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)