[https://issues.apache.org/jira/browse/SOLR-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

mosh updated SOLR-12958:
------------------------
    Description: 
It has come to my attention that the phrase identifier introduced in SOLR-9418 
does not return phrases that are found in only one of the fields specified by 
phrases.fields.
This has proved troublesome for our use case.
The offending line seems to be:
{code:java}
final List<Phrase> validScoringPhrasesSorted = contextData.allPhrases.stream()
  .filter(p -> 0.0D < p.getTotalScore())
  .sorted(Comparator.comparing((p -> p.getTotalScore()), Collections.reverseOrder()))
  .collect(Collectors.toList());{code}
Since fields where the phrase is not present return a score of -1.0, while fields that contain the phrase return a score in the range 0.0 <= score <= 1.0, the total score turns out zero or negative, so the phrase fails the 0.0D < p.getTotalScore() check and gets filtered out.
I separated the filter into 2 distinct cases:
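To make the arithmetic concrete, here is a minimal, self-contained sketch. It assumes, hypothetically, that the total score is the mean of the per-field scores; the real Phrase.getTotalScore() from SOLR-9418 may combine them differently, but any sum or mean of one score in [0.0, 1.0] and one -1.0 is at most 0.0, so the original 0.0D < total check rejects the phrase either way. The class TotalScoreDemo and the field names title and body are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TotalScoreDemo {

    // Hypothetical stand-in for Phrase.getTotalScore(): the mean of all per-field scores.
    // The actual Solr implementation may combine field scores differently.
    static double totalScore(Map<String, Double> fieldScores) {
        double sum = 0.0D;
        for (double score : fieldScores.values()) {
            sum += score;
        }
        return sum / fieldScores.size();
    }

    // Mirrors the original filter predicate: 0.0D < p.getTotalScore()
    static boolean passesOriginalFilter(Map<String, Double> fieldScores) {
        return 0.0D < totalScore(fieldScores);
    }

    public static void main(String[] args) {
        Map<String, Double> fieldScores = new LinkedHashMap<>();
        fieldScores.put("title", 0.5D);  // phrase found in this field
        fieldScores.put("body", -1.0D);  // phrase absent from this field

        System.out.println(totalScore(fieldScores));
        System.out.println(passesOriginalFilter(fieldScores));
    }
}
```

Running main prints -0.25 and then false: the phrase is present in one of the two fields, yet it is dropped.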
 # Filter out single word phrases (*phrases.singleWordPhrases* is set to false)
 # Include single word phrases (*phrases.singleWordPhrases* is set to true)

This can be seen in the following change to the component's logic:
{code:java}
if (!rb.req.getParams().getBool(PHRASE_MATCH_SINGLE_WORD, false)) {
  // filter out single word phrases, which return a constant score of 0.0
  phraseStream = contextData.allPhrases.stream()
      .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> fieldScore > 0.0D));
} else {
  // include single word phrases, which return a constant score of 0.0
  phraseStream = contextData.allPhrases.stream()
      .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> fieldScore >= 0.0D));
}{code}
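The same two-case filter can be exercised outside Solr with the following sketch. The Phrase class here is a hypothetical, simplified stand-in that only keeps a fieldScores map (its mean-based getTotalScore() is likewise an assumption), and the boolean matchSingleWord stands in for reading PHRASE_MATCH_SINGLE_WORD from the request. Keeping the original descending sort on the total score is also an assumption, since the change above only shows the filtering step.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class PhraseFilterDemo {

    // Hypothetical, simplified stand-in for the component's Phrase class.
    static class Phrase {
        final String text;
        final Map<String, Double> fieldScores = new LinkedHashMap<>();

        Phrase(String text) {
            this.text = text;
        }

        Phrase score(String field, double score) {
            fieldScores.put(field, score);
            return this;
        }

        // Hypothetical: mean of the per-field scores (the real method may differ).
        double getTotalScore() {
            return fieldScores.values().stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(-1.0D);
        }
    }

    // Keeps a phrase if at least one field scores it; matchSingleWord plays the
    // role of the phrases.singleWordPhrases request parameter.
    static List<Phrase> validPhrases(List<Phrase> allPhrases, boolean matchSingleWord) {
        final Stream<Phrase> phraseStream;
        if (!matchSingleWord) {
            // Drop single word phrases: their constant score of 0.0 fails "> 0.0".
            phraseStream = allPhrases.stream()
                .filter(p -> p.fieldScores.values().stream().anyMatch(s -> s > 0.0D));
        } else {
            // Keep single word phrases too: 0.0 passes ">= 0.0".
            phraseStream = allPhrases.stream()
                .filter(p -> p.fieldScores.values().stream().anyMatch(s -> s >= 0.0D));
        }
        // Assumption: results are still ordered by descending total score, as before.
        return phraseStream
            .sorted(Comparator.comparingDouble(Phrase::getTotalScore).reversed())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Phrase> all = new ArrayList<>();
        all.add(new Phrase("only in title").score("title", 0.5D).score("body", -1.0D));
        all.add(new Phrase("nowhere").score("title", -1.0D).score("body", -1.0D));
        all.add(new Phrase("single").score("title", 0.0D).score("body", 0.0D));

        System.out.println(validPhrases(all, false).stream().map(p -> p.text).collect(Collectors.toList()));
        System.out.println(validPhrases(all, true).stream().map(p -> p.text).collect(Collectors.toList()));
    }
}
```

With the sample data, main prints [only in title] when single word phrases are excluded and [single, only in title] when they are included; in both cases the phrase found in only one field now survives, while the phrase found in no field is still dropped.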


> Statistical Phrase Identifier should return phrases in single field
> -------------------------------------------------------------------
>
>                 Key: SOLR-12958
>                 URL: https://issues.apache.org/jira/browse/SOLR-12958
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: mosh
>            Priority: Major
>              Labels: phrase, phrasequery



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
