[ 
https://issues.apache.org/jira/browse/SOLR-12958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mosh updated SOLR-12958:
------------------------
    Affects Version/s: master (8.0)
                       7.5

> Statistical Phrase Identifier should return phrases in single field
> -------------------------------------------------------------------
>
>                 Key: SOLR-12958
>                 URL: https://issues.apache.org/jira/browse/SOLR-12958
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 7.5, master (8.0)
>            Reporter: mosh
>            Priority: Major
>              Labels: phrase, phrasequery
>         Attachments: SOLR-12958.patch
>
>
> It has come to my attention that the phrase identifier introduced in 
> SOLR-9418 does not return phrases that are found in only one of the fields 
> specified by phrases.fields.
>  This has proved troublesome for our use case.
>  The offending line seems to be
> {code:java}
> final List<Phrase> validScoringPhrasesSorted = contextData.allPhrases.stream()
>   .filter(p -> 0.0D < p.getTotalScore())
>   .sorted(Comparator.comparing((p -> p.getTotalScore()), Collections.reverseOrder()))
>   .collect(Collectors.toList());{code}
> Since fields where the phrase is not present return -1.0, and fields that 
> contain the phrase return a score in the range 0.0 <= score <= 1.0, the 
> total score turns out negative, and the phrase gets filtered out.
>  I separated the filter into 2 distinct cases:
>  # Filter out single word phrases (*phrases.singleWordPhrases* is set to 
> false)
>  # Include single word phrases (*phrases.singleWordPhrases* is set to true)
> This can be seen in the following change to the component's logic:
> {code:java}
> if (!rb.req.getParams().getBool(PHRASE_MATCH_SINGLE_WORD, false)) {
>   // filter single word phrases
>   phraseStream = contextData.allPhrases.stream()
>       .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> fieldScore > 0.0D));
> } else {
>   // include single word phrases, which return a constant score of 0.0
>   phraseStream = contextData.allPhrases.stream()
>       .filter(p -> p.fieldScores.values().stream().anyMatch(fieldScore -> fieldScore >= 0.0D));
> }{code}
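
To make the difference between the two filters concrete, here is a minimal standalone sketch. It does not use the Solr classes: `PhraseFilterSketch`, its method names, and the field names are hypothetical, and the total score is assumed to be a plain sum of the per-field scores, which may not match how the component actually combines them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the component's phrase handling; not the Solr implementation.
class PhraseFilterSketch {

  // ASSUMPTION: the total score is the plain sum of the per-field scores.
  static double totalScore(Map<String, Double> fieldScores) {
    return fieldScores.values().stream().mapToDouble(Double::doubleValue).sum();
  }

  // Filter based on the total score, as in the line quoted in the issue.
  static boolean keptByTotalScore(Map<String, Double> fieldScores) {
    return 0.0D < totalScore(fieldScores);
  }

  // Proposed filter: keep the phrase if any single field scores high enough.
  // With includeSingleWord, the constant single-word score of 0.0 also passes.
  static boolean keptByAnyField(Map<String, Double> fieldScores, boolean includeSingleWord) {
    return fieldScores.values().stream()
        .anyMatch(s -> includeSingleWord ? s >= 0.0D : s > 0.0D);
  }

  public static void main(String[] args) {
    // Phrase found in "title" only; "body" reports -1.0 because the phrase is absent.
    Map<String, Double> scores = new LinkedHashMap<>();
    scores.put("title", 0.4D);
    scores.put("body", -1.0D);

    System.out.println(keptByTotalScore(scores));      // false: the sum is negative
    System.out.println(keptByAnyField(scores, false)); // true: "title" scores above 0.0
  }
}
```

Under these assumptions, a phrase present in only one field is dropped by the total-score filter but kept by the per-field filter.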



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

