mosh created SOLR-12958:
---------------------------
Summary: Statistical Phrase Identifier should return phrases in
single field
Key: SOLR-12958
URL: https://issues.apache.org/jira/browse/SOLR-12958
Project: Solr
Issue Type: Improvement
Security Level: Public (Default Security Level. Issues are Public)
Reporter: mosh
It has come to my attention that the phrase identifier introduced in SOLR-9418
does not return phrases that are found only in one of the fields specified in
phrases.fields.
This has proved troublesome for our use case.
The offending line seems to be
{code:java}
final List<Phrase> validScoringPhrasesSorted = contextData.allPhrases.stream()
    .filter(p -> 0.0D < p.getTotalScore())
    .sorted(Comparator.comparing((p -> p.getTotalScore()),
        Collections.reverseOrder()))
    .collect(Collectors.toList());
{code}
Since fields where the phrase is not present return -1.0, and fields that
contain the phrase return a score in the range 0.0 <= score <= 1.0, the
total score turns out negative, and the phrase gets filtered out.
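To make the arithmetic concrete, here is a minimal sketch of the effect. It assumes the total score is a plain sum of the per-field scores, which is a simplification for illustration and may not match the component's actual aggregation; the class name, method names, and field names are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TotalScoreSketch {
    // ASSUMPTION: the total is a plain sum of per-field scores.
    // This is an illustrative simplification, not the component's real formula.
    static double totalScore(Map<String, Double> fieldScores) {
        return fieldScores.values().stream()
            .mapToDouble(Double::doubleValue)
            .sum();
    }

    // Same predicate as the quoted filter: keep only positive total scores.
    static boolean passesTotalScoreFilter(Map<String, Double> fieldScores) {
        return 0.0D < totalScore(fieldScores);
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("title", 0.6D);  // phrase found in this (hypothetical) field
        scores.put("body", -1.0D);  // phrase absent from this (hypothetical) field

        // The -1.0 from the field without the phrase outweighs the 0.6,
        // so the phrase is dropped even though one field matched it.
        System.out.println("total = " + totalScore(scores));
        System.out.println("kept  = " + passesTotalScoreFilter(scores));
    }
}
```

Under that assumption, a phrase that scores well in one field is still discarded as soon as another configured field does not contain it.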
I separated the filter into 2 distinct cases:
# Filter out single word phrases (*phrases.singleWordPhrases* is set to false)
# Include single word phrases (*phrases.singleWordPhrases* is set to true)
This can be seen in this change to the component's logic:
{code}
if (!rb.req.getParams().getBool(PHRASE_MATCH_SINGLE_WORD, false)) {
  // filter single word phrases
  phraseStream = contextData.allPhrases.stream()
      .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore ->
          fieldScore > 0.0D));
} else {
  // include single word phrases, which return a constant score of 0.0
  phraseStream = contextData.allPhrases.stream()
      .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore ->
          fieldScore >= 0.0D));
}
{code}
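The behaviour of the two branches can be checked in isolation with a small standalone sketch. The Phrase class below is a hypothetical stand-in that carries only a text and a map of per-field scores, and the sample phrases and field names are made up for illustration; it is not the component's real class.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FieldScoreFilterSketch {
    // Hypothetical stand-in for the component's Phrase class.
    static final class Phrase {
        final String text;
        final Map<String, Double> fieldScores = new LinkedHashMap<>();

        Phrase(String text, double titleScore, double bodyScore) {
            this.text = text;
            fieldScores.put("title", titleScore);
            fieldScores.put("body", bodyScore);
        }
    }

    // Keep a phrase if at least one field score clears the threshold:
    // > 0.0 when single word phrases are excluded, >= 0.0 when they are included.
    static List<String> keep(List<Phrase> phrases, boolean includeSingleWord) {
        return phrases.stream()
            .filter(p -> p.fieldScores.values().stream()
                .anyMatch(s -> includeSingleWord ? s >= 0.0D : s > 0.0D))
            .map(p -> p.text)
            .collect(Collectors.toList());
    }

    // Made-up sample data covering the three interesting cases.
    static List<Phrase> sample() {
        return Arrays.asList(
            new Phrase("solr cloud", 0.6D, -1.0D),     // present in one field only
            new Phrase("solr", 0.0D, 0.0D),            // single word, constant 0.0
            new Phrase("not indexed", -1.0D, -1.0D));  // present in no field
    }

    public static void main(String[] args) {
        System.out.println(keep(sample(), false));
        System.out.println(keep(sample(), true));
    }
}
```

With single word phrases excluded, only the phrase found in one field survives; with them included, the single word phrase is kept as well, and the phrase found in no field is dropped in both cases.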
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)