[ 
https://issues.apache.org/jira/browse/SOLR-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17519109#comment-17519109
 ] 

Chris M. Hostetter commented on SOLR-16144:
-------------------------------------------

I honestly don't know why the code rounds to 5 digits ... IIRC that came from 
the original contribution in SOLR-9480 – maybe [~solrtrey] remembers?

I also can't think of any major downside to "deferring" the rounding until 
externalizing – if all tests pass it seems fine to me.
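
To make "deferring" concrete, here is a minimal sketch of the idea, not the 
actual RelatednessAgg code. Every name in it (PopularityRoundingSketch, 
popularity, roundTo5Digits, passesThreshold) is hypothetical, and the ">=" 
threshold comparison is an assumption. The numbers are the ones from the issue 
description: a bucket of 9 docs against a background set of 2,000,000.

```java
// Hypothetical sketch; none of these names are taken from RelatednessAgg.
final class PopularityRoundingSketch {

  /** Unrounded popularity: intersection size normalized by background set size. */
  static double popularity(long intersectionSize, long backgroundSize) {
    return backgroundSize == 0 ? 0.0 : (double) intersectionSize / backgroundSize;
  }

  /** Rounds to 5 decimal digits; intended only for values returned to clients. */
  static double roundTo5Digits(double val) {
    return Math.round(val * 100000.0) / 100000.0;
  }

  /** Threshold check (assumed to be inclusive) on whatever value is passed in. */
  static boolean passesThreshold(double popularity, double minPopularity) {
    return popularity >= minPopularity;
  }

  public static void main(String[] args) {
    double raw = popularity(9, 2_000_000); // 4.5E-6
    double minPopularity = 0.000001;

    // Rounding first: the value collapses to 0.0 and the bucket is screened out.
    System.out.println(passesThreshold(roundTo5Digits(raw), minPopularity)); // false

    // Rounding deferred: compare the raw value, round only for the response.
    System.out.println(passesThreshold(raw, minPopularity)); // true
    System.out.println(roundTo5Digits(raw)); // 0.0 (what a client would see)
  }
}
```

With rounding applied first, the bucket fails any positive threshold; comparing 
the raw value and rounding only when writing the response avoids that, although 
the client would still see 0.0 for such a bucket.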

 

(Have you considered removing the rounding completely? I'm sure it would 
cause all sorts of test failures where we currently expect exact values, but 
other than that, does it cause any problems?)

 

> Don't internally round [foreground|background]_popularity values in 
> RelatednessAgg
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-16144
>                 URL: https://issues.apache.org/jira/browse/SOLR-16144
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Facet Module
>    Affects Versions: main (10.0)
>            Reporter: Michael Gibney
>            Priority: Trivial
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The "relatedness" facet function supports the concept of 
> {{foreground_popularity}} and {{background_popularity}} -- i.e., the 
> cardinality of the intersection of bucket domain with the foreground and 
> background sets (respectively), each normalized with respect to background 
> set cardinality.
> The purpose appears to be twofold:
> # To provide clients with context of computed relatedness values
> # To preemptively (optionally) screen out "noise" from low-frequency terms 
> via the {{min_popularity}} function parameter.
> For both purposes, popularity values are currently rounded to 5 digits.
> This issue proposes that although rounding to 5 digits makes sense for the 
> _first_ case (providing context to clients), this arbitrary truncation does 
> not make sense as currently implemented for internally evaluating threshold 
> pop values for bucket inclusion.
> Consider the case of a high-cardinality field with a relatively large 
> background set and a selective foreground set. For {{|background_set| = 
> 2,000,000}} and a foreground set of cardinality 9, even a bucket with a 
> domain that exactly matches the foreground set would be screened out, for 
> _any_ explicit setting of {{min_popularity}}.
> This behavior is due to where the rounding takes place (internally, upon 
> initial {{computeDerivedValues()}}). It is further problematic that 
> {{RelatednessAgg}} will currently accept {{min_popularity < 0.00001}}, which 
> would be guaranteed to exclude _all_ buckets.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
