[
https://issues.apache.org/jira/browse/SOLR-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Hoss Man updated SOLR-11733:
----------------------------
Description:
{{json.facet}} refinement is currently "pessimistic" by default. Specifically:
"Long Tail" terms that may not be in the "top n" on every shard, but are in the
"top n + overrequest" for at least 1 shard aren't getting refined and included
in the aggregated response in some cases.
This is different than the "optimistic" approach taken in the existing
{{facet.field}} and {{facet.pivot}} refinement, which refines all known terms
whose counts *might* be high enough to put them in the topN based on what's
known about the lowest count returned by each shard in phase #1.
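To make the "optimistic" idea concrete, here is a minimal Python sketch of how refinement candidates could be chosen. It is an illustration under stated assumptions, not Solr's actual implementation: the function name, the input shape, and the assumption that every shard's phase #1 list was truncated are all hypothetical.

```python
from collections import defaultdict


def optimistic_refinement_candidates(shard_results, limit):
    """Pick terms worth refining, given phase #1 results.

    shard_results: {shard_name: {term: count}} holding each shard's
    "top n + overrequest" list (hypothetical input shape).
    Assumption: every shard's list was truncated, so a term a shard did
    not return may still have up to that shard's lowest returned count.
    Returns {term: [shards that still need to be asked about it]}.
    """
    # Sum the counts we already know for each term.
    known = defaultdict(int)
    for counts in shard_results.values():
        for term, count in counts.items():
            known[term] += count

    # Known count of the term currently sitting in the last topN slot.
    ranked = sorted(known.values(), reverse=True)
    threshold = ranked[limit - 1] if len(ranked) >= limit else 0

    # Lowest count each shard returned: the most an unreturned term can have.
    floor = {s: min(c.values(), default=0) for s, c in shard_results.items()}

    candidates = {}
    for term, total in known.items():
        missing = [s for s, c in shard_results.items() if term not in c]
        if not missing:
            continue  # every shard reported it, so the count is already exact
        best_case = total + sum(floor[s] for s in missing)
        if best_case >= threshold:
            candidates[term] = missing
    return candidates


phase1 = {
    "shard1": {"a": 100, "x": 48, "b": 45},
    "shard2": {"a": 90, "b": 44, "c": 43},
}
print(optimistic_refinement_candidates(phase1, limit=2))
```

In this toy data, "x" is known to have 48 and could have up to 43 more on shard2, which would beat the 89 known for "b", so it gets refined. "c" can reach at most 88 and is skipped.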
A mitigating option that people with particular concerns about long tail terms
can consider is to set a "high" value for the {{overrefine}} parameter --
forcing Solr to refine more terms from phase #1 -- but this is somewhat of a
"brute force" workaround, since it doesn't take into account any known info
about the results of each shard from phase #1.
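As a rough illustration of that workaround, the sketch below builds a JSON request body for a terms facet with {{refine}} enabled and a large {{overrefine}}. The helper name, the facet label "top_terms", the field "cat_s", and the value 100 are placeholders chosen for the example, not recommendations.

```python
import json


def terms_facet_request(field, limit, overrefine):
    """Build a JSON request body for a refined terms facet.

    The label "top_terms" and the match-all query are illustrative
    assumptions; only the facet parameters matter here.
    """
    return {
        "query": "*:*",
        "facet": {
            "top_terms": {
                "type": "terms",
                "field": field,
                "limit": limit,
                "refine": True,
                # Consider this many extra phase #1 terms for refinement.
                "overrefine": overrefine,
            }
        },
    }


body = terms_facet_request("cat_s", limit=10, overrefine=100)
print(json.dumps(body, indent=2))
```

A larger {{overrefine}} means more terms are re-checked against every shard, so the extra accuracy for long tail terms comes at the cost of a heavier refinement phase.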
This issue tracks possible improvements that could be made to the faceting code
to be more sophisticated.
----
(NOTE: this Jira was originally filed as a bug report noting that
{{json.facet}} refinement didn't seem to be working properly compared to
facet.field refinement, and early comments are written in this mindset)
was:
Something wonky is happening with {{json.facet}} refinement.
"Long Tail" terms that may not be in the "top n" on every shard, but are in the
"top n + overrequest" for at least 1 shard aren't getting refined and included
in the aggregated response in some cases.
I don't understand the code enough to explain this, but I have some steps to
reproduce that I'll post in a comment shortly
Issue Type: Improvement (was: Bug)
Summary: add an option to make json.facet refinement more "optimistic"
like facet.field/facet.pivot so that long tail terms have a chance to bubble up
(was: json.facet refinement fails to bubble up some long tail (overrequested)
terms?)
Edited summary & description based on discussion in comments so far – added an
explicit note about the {{overrefine}} option as a potential
workaround/mitigation approach for people particularly concerned about long
tail terms
> add an option to make json.facet refinement more "optimistic" like
> facet.field/facet.pivot so that long tail terms have a chance to bubble up
> ------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-11733
> URL: https://issues.apache.org/jira/browse/SOLR-11733
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Facet Module
> Reporter: Hoss Man
> Priority: Major