mosh created SOLR-12958:
---------------------------

             Summary: Statistical Phrase Identifier should return phrases in single field
                 Key: SOLR-12958
                 URL: https://issues.apache.org/jira/browse/SOLR-12958
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
            Reporter: mosh


It has come to my attention that the phrase identifier introduced in SOLR-9418 
does not return phrases that are found only in one of the fields specified in 
phrases.fields.
This has proved troublesome for our use case.
The offending line seems to be
{code:java}
final List<Phrase> validScoringPhrasesSorted = contextData.allPhrases.stream()
  .filter(p -> 0.0D < p.getTotalScore())
  .sorted(Comparator.comparing((p -> p.getTotalScore()), Collections.reverseOrder()))
  .collect(Collectors.toList());{code}
Since fields where the phrase is not present return -1.0, and fields that 
contain the phrase return a score in the range 0.0 <= score <= 1.0, the 
total score turns out to be non-positive, and the phrase gets filtered out.
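
As a minimal sketch of this failure mode, assume the total score is the plain sum of the per-field scores (the aggregation is not spelled out above, so this is an assumption). The class, method, and field names below are hypothetical and are not Solr's own:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TotalScoreSketch {
  // Assumption: the total score is the sum of the per-field scores.
  static double totalScore(Map<String, Double> fieldScores) {
    return fieldScores.values().stream().mapToDouble(Double::doubleValue).sum();
  }

  // Mirrors the predicate of the original filter: 0.0D < totalScore.
  static boolean passesOriginalFilter(Map<String, Double> fieldScores) {
    return 0.0D < totalScore(fieldScores);
  }

  public static void main(String[] args) {
    Map<String, Double> scores = new LinkedHashMap<>();
    scores.put("title", 0.8D);  // phrase found in this (hypothetical) field
    scores.put("body", -1.0D);  // phrase absent from this (hypothetical) field
    // The -1.0 outweighs the 0.8, so the phrase is dropped.
    System.out.println(passesOriginalFilter(scores)); // prints "false"
  }
}
```

Under that assumption, a phrase that scores well in one field is still discarded as soon as it is missing from another.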
I separated the filter into two distinct cases:
# Filter out single-word phrases (*phrases.singleWordPhrases* is set to false)
# Include single-word phrases (*phrases.singleWordPhrases* is set to true)

This can be seen in the following change to the component's logic:
{code:java}
    if (!rb.req.getParams().getBool(PHRASE_MATCH_SINGLE_WORD, false)) {
      // filter single word phrases
      phraseStream = contextData.allPhrases.stream()
          .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> fieldScore > 0.0D));
    } else {
      // include single word phrases, which return a constant score of 0.0
      phraseStream = contextData.allPhrases.stream()
          .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> fieldScore >= 0.0D));
    }{code}
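
The per-field check can be tried outside Solr with a small self-contained sketch. `PhraseSketch`, `keep`, and the field names are hypothetical stand-ins for illustration; only the two predicates (`> 0.0D` and `>= 0.0D`) are taken from the change above:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PerFieldFilterSketch {
  // Hypothetical stand-in for the component's phrase object.
  static final class PhraseSketch {
    final String text;
    final Map<String, Double> fieldScores = new LinkedHashMap<>();

    PhraseSketch(String text) { this.text = text; }

    PhraseSketch score(String field, double value) {
      fieldScores.put(field, value);
      return this;
    }
  }

  // Keeps a phrase if at least one field score passes the threshold:
  // strictly positive by default, or >= 0.0 when single-word phrases are allowed.
  static List<String> keep(List<PhraseSketch> phrases, boolean includeSingleWord) {
    return phrases.stream()
        .filter(p -> p.fieldScores.values().stream()
            .anyMatch(s -> includeSingleWord ? s >= 0.0D : s > 0.0D))
        .map(p -> p.text)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<PhraseSketch> phrases = Arrays.asList(
        new PhraseSketch("brown fox").score("title", 0.8D).score("body", -1.0D),
        new PhraseSketch("fox").score("title", 0.0D).score("body", -1.0D),
        new PhraseSketch("purple cow").score("title", -1.0D).score("body", -1.0D));
    System.out.println(keep(phrases, false)); // prints "[brown fox]"
    System.out.println(keep(phrases, true));  // prints "[brown fox, fox]"
  }
}
```

A phrase present in only one field now survives, while a phrase absent from every field is still dropped in both modes.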




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
