[ 
https://issues.apache.org/jira/browse/SOLR-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-11733:
----------------------------
    Description: 
{{json.facet}} refinement is currently "pessimistic" by default.  Specifically: 
"Long Tail" terms that may not be in the "top n" on every shard, but are in the 
"top n + overrequest" for at least 1 shard aren't getting refined and included 
in the aggregated response in some cases.

This is different from the "optimistic" approach taken in the existing 
{{facet.field}} and {{facet.pivot}} refinement, which refines all known terms 
whose counts *might* be high enough to put them in the topN, based on what's 
known about the lowest count returned by each shard in phase #1.
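
To make the "optimistic" rule above concrete, here is a minimal Python sketch. 
It is not Solr's actual implementation: the function name, the argument names, 
and the data layout (one term-to-count dict per shard) are assumptions made for 
illustration. It assumes facets sorted by count descending, and treats the 
lowest count a shard returned in phase #1 as an upper bound on what that shard 
could contribute for any term it did not return.

```python
from collections import defaultdict


def optimistic_refinement_candidates(shard_results, top_n, shard_limit):
    """Pick terms to refine under a simplified "optimistic" rule.

    shard_results: one dict per shard mapping term -> count from phase #1.
    top_n:         number of terms wanted in the final response (>= 1).
    shard_limit:   terms requested per shard ("top n + overrequest").
    All three names are hypothetical, not Solr parameters.
    """
    # Sum the counts we already know about for each term.
    known = defaultdict(int)
    for counts in shard_results:
        for term, count in counts.items():
            known[term] += count

    def hidden_max(counts):
        # A shard that filled its list may hold unseen terms with counts up
        # to its lowest returned count; a short list hides nothing.
        if counts and len(counts) >= shard_limit:
            return min(counts.values())
        return 0

    # The known count of the current n-th best term is the bar to clear.
    ranked = sorted(known.values(), reverse=True)
    threshold = ranked[top_n - 1] if len(ranked) >= top_n else 0

    candidates = set()
    for term, total in known.items():
        # Best case contribution from shards that did not return this term.
        slack = sum(hidden_max(c) for c in shard_results if term not in c)
        if slack > 0 and total + slack >= threshold:
            candidates.add(term)
    return candidates


shards = [
    {"a": 50, "b": 30, "c": 10},
    {"a": 40, "d": 25, "e": 2},
]
print(sorted(optimistic_refinement_candidates(shards, top_n=2, shard_limit=3)))
# ['b', 'd']
```

In this sample, "d" has a lower known count (25) than "b" (30), but the first 
shard could still hold up to 10 more for it, so an optimistic approach refines 
both instead of assuming "d" cannot reach the top 2.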

A mitigating option for people with particular concerns about long tail terms 
is to set a "high" value for the {{overrefine}} parameter -- forcing Solr to 
refine more terms from phase #1 -- but this is somewhat of a "brute force" 
workaround, since it doesn't take into account any known info about the 
results of each shard from phase #1.
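
As a rough illustration of that workaround, the Python sketch below builds 
request parameters for a {{json.facet}} terms facet with {{refine}} enabled and 
an explicit {{overrefine}} value. The helper name, the facet label 
"top_terms", the field name "cat", and the value 100 are placeholders chosen 
for this example, not recommendations.

```python
import json


def terms_facet_params(field, limit, overrefine):
    """Build query params for a refined terms facet (hypothetical helper)."""
    facet = {
        "top_terms": {                 # arbitrary label for this facet
            "type": "terms",
            "field": field,
            "limit": limit,
            "refine": True,            # enable the second (refinement) phase
            "overrefine": overrefine,  # consider this many extra terms for refinement
        }
    }
    # Solr accepts the facet definition as a JSON string in json.facet.
    return {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}


params = terms_facet_params("cat", limit=10, overrefine=100)
print(params["json.facet"])
```

A larger {{overrefine}} makes more phase #1 terms eligible for refinement 
regardless of what each shard reported, which is why it costs more than a rule 
that uses the per-shard counts.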

This issue tracks possible improvements to make the faceting code's refinement 
more sophisticated.
 
----
(NOTE: this Jira was originally filed as a bug report noting that 
{{json.facet}} refinement didn't seem to be working properly compared to 
facet.field refinement, and early comments are written in this mindset)

  was:

Something wonky is happening with {{json.facet}} refinement.

"Long Tail" terms that may not be in the "top n" on every shard, but are in the 
"top n + overrequest" for at least 1 shard aren't getting refined and included 
in the aggregated response in some cases.

I don't understand the code enough to explain this, but I have some steps to 
reproduce that I'll post in a comment shortly



     Issue Type: Improvement  (was: Bug)
        Summary: add an option to make json.facet refinement more "optimistic" 
like facet.field/facet.pivot so that long tail terms have a chance to bubble up  
(was: json.facet refinement fails to bubble up some long tail (overrequested) 
terms?)

Edited summary & description based on discussion in comments so far – added an 
explicit note about the {{overrefine}} option as a potential 
workaround/mitigation approach for people particularly concerned about long 
tail terms

> add an option to make json.facet refinement more "optimistic" like 
> facet.field/facet.pivot so that long tail terms have a chance to bubble up
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-11733
>                 URL: https://issues.apache.org/jira/browse/SOLR-11733
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Facet Module
>            Reporter: Hoss Man
>            Priority: Major


