[ https://issues.apache.org/jira/browse/SOLR-15836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Gibney reassigned SOLR-15836:
-------------------------------------

    Assignee: Michael Gibney

> Address counterintuitive behavior of JSON "terms" subfacet refinement
> ---------------------------------------------------------------------
>
>                 Key: SOLR-15836
>                 URL: https://issues.apache.org/jira/browse/SOLR-15836
>             Project: Solr
>          Issue Type: Improvement
>          Components: Facet Module
>    Affects Versions: main (9.0), 8.11
>            Reporter: Michael Gibney
>            Assignee: Michael Gibney
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In distributed faceting, uneven distribution of terms across shards can cause 
> terms to be spuriously included in or excluded from the results (this 
> discussion focuses on JSON Facet "terms" faceting).
> This is inevitable, but it can be mitigated via the {{overrequest}} and 
> {{overrefine}} parameters, which respectively cast a "wider net" for "phase#1" 
> (determining the set of "terms of interest") and for "phase#2" (cross-checking 
> "terms of interest" against shards that did not initially report them).
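The phase#1 "wider net" can be illustrated with a toy model. This is a minimal sketch, not Solr's implementation. It assumes each shard is a plain map from term to local count. The class and method names ({{OverrequestSketch}}, {{phase1}}) are hypothetical, and breaking ties alphabetically is an arbitrary choice. In the example data, term "c" has the largest total count but is no shard's top term, so it only becomes visible once {{overrequest}} is large enough.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical toy model of phase#1 of distributed "terms" faceting (not Solr code).
// Each shard is a map from term to its local count.
class OverrequestSketch {

    // The n terms with the highest counts, ties broken alphabetically.
    static List<String> top(Map<String, Integer> counts, int n) {
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    // Phase#1: every shard reports its top (limit + overrequest) terms; the
    // coordinator sums only the counts it was told about and keeps the top `limit`.
    static List<String> phase1(List<Map<String, Integer>> shards, int limit, int overrequest) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> shard : shards) {
            for (String term : top(shard, limit + overrequest)) {
                merged.merge(term, shard.get(term), Integer::sum);
            }
        }
        return top(merged, limit);
    }

    public static void main(String[] args) {
        // "c" is no shard's #1 term, yet its total (5 + 5 = 10) is the largest.
        List<Map<String, Integer>> shards = List.of(
            Map.of("a", 6, "c", 5),
            Map.of("b", 6, "c", 5));
        System.out.println(phase1(shards, 1, 0)); // [a]
        System.out.println(phase1(shards, 1, 1)); // [c]
    }
}
```

With no overrequest, each shard reports only its own #1 term, so the coordinator never hears about "c". A single extra term per shard is enough to surface it in this example.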
> It is possible to devise artificial situations that exceed what 
> {{overrefine}} is capable of mitigating, resulting in counterintuitive 
> behavior. Despite such edge cases, however, it is generally relatively 
> straightforward to reason about how the {{simple}} JSON Facet refinement 
> method works for "flat" (i.e., non-hierarchical) terms facets.
> This issue discusses some ways in which subfacets (hierarchical or nested 
> facets) can more readily behave counterintuitively in practical usage, and 
> possible ways to address/mitigate such behavior.
> ---------------------
> AFAICT, the {{simple}} refinement method (the default, and currently the only 
> one) has two defining requirements:
> # there is at most _one_ refinement request issued to each shard, and
> # any buckets returned are guaranteed to have accurate counts (or perhaps, 
> more generally, stats?) reflecting contributions from all shards. (This makes 
> [no guarantees|https://issues.apache.org/jira/browse/SOLR-11159?focusedCommentId=16103386#comment-16103386] 
> about buckets _not_ returned that would in principle be eligible to be 
> returned.)
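The two requirements can be made concrete with a simplified model of a flat terms facet. This sketch is hypothetical and is not Solr's merge logic, which differs in detail. It assumes the following:
* shards are plain term-to-count maps;
* the "terms of interest" are the top (limit + overrefine) terms by their phase#1 partial counts;
* each shard is asked in phase#2 only for candidates it did not already report.

The names {{SimpleRefinementSketch}}, {{facet}} and {{Result}} are illustrative only.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical, simplified model of "simple" refinement for a flat terms facet
// (not Solr's code). Each shard is a map of term -> local count.
class SimpleRefinementSketch {

    static final class Result {
        final Map<String, Integer> buckets;   // returned buckets, best first
        final int refinementRequests;         // phase#2 requests actually sent
        Result(Map<String, Integer> buckets, int refinementRequests) {
            this.buckets = buckets;
            this.refinementRequests = refinementRequests;
        }
    }

    // The n terms with the highest counts, ties broken alphabetically.
    static List<String> top(Map<String, Integer> counts, int n) {
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }

    static Result facet(List<Map<String, Integer>> shards, int limit, int overrequest, int overrefine) {
        // Phase#1: each shard reports its top (limit + overrequest) terms with counts.
        List<Set<String>> reported = new ArrayList<>();
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> shard : shards) {
            Set<String> terms = new HashSet<>(top(shard, limit + overrequest));
            reported.add(terms);
            for (String t : terms) merged.merge(t, shard.get(t), Integer::sum);
        }
        // "Terms of interest": the top (limit + overrefine) terms by partial count.
        List<String> candidates = top(merged, limit + overrefine);

        // Phase#2: at most ONE request per shard (requirement #1), asking for
        // every candidate that shard did not report in phase#1.
        int requests = 0;
        for (int i = 0; i < shards.size(); i++) {
            Map<String, Integer> shard = shards.get(i);
            List<String> missing = new ArrayList<>();
            for (String t : candidates) if (!reported.get(i).contains(t)) missing.add(t);
            if (missing.isEmpty()) continue;
            requests++;
            for (String t : missing) merged.merge(t, shard.getOrDefault(t, 0), Integer::sum);
        }

        // Only candidates are eligible, so every returned count now includes all
        // shards' contributions (requirement #2). Nothing is promised about terms
        // that never became candidates.
        Map<String, Integer> refined = new HashMap<>();
        for (String t : candidates) refined.put(t, merged.get(t));
        Map<String, Integer> buckets = new LinkedHashMap<>();
        for (String t : top(refined, limit)) buckets.put(t, refined.get(t));
        return new Result(buckets, requests);
    }
}
```

Consider two shards holding a=6, c=5, b=1 and b=6, c=5, a=1 respectively. Here {{facet(shards, 1, 0, 1)}} returns a=7 after two refinement requests. The count for "a" is exact, yet "c" (true total 10) is never returned because it never became a term of interest. With {{overrequest}} raised to 1, the same call returns c=10 after a single refinement request.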
>  
> The simplest counterintuitive case is when refinement of higher-level facets 
> uncovers additional subfacet buckets on shards that had no opportunity to 
> influence the results (or refinement) of the child facet. I'm pretty sure this 
> is the situation described in 
> [this comment|https://github.com/apache/solr/blob/0287458f836e3b7ea4b2401538b29f3d2e9b6cf4/solr/core/src/test/org/apache/solr/search/facet/TestJsonFacetRefinement.java#L992-L994] 
> (by [~hossman]?):
> {code:java}
>     //   - or at the very least, if the purpose of "_l" is to give other buckets a chance to "bubble up"
>     //     in phase#2, then shouldn't a "_l" refinement requests still include the buckets choosen in
>     //     phase#1, and request that the shard fill them in in addition to returning its own top buckets?
> {code}
> The proposal in the above-linked comment would work iff the "own top buckets" 
> returned in phase#2 did not introduce any new/unseen values. Note, however, 
> that returning "own top buckets" would only be significant in exactly the 
> case where it _does_ introduce new/unseen values. If new values _were_ 
> returned in phase#2, the only way to respect requirement #2 would be to 
> violate requirement #1 (i.e., by issuing _another_ refinement request to 
> determine whether any other shards have anything to contribute to the 
> previously unseen value).
> This counterintuitive behavior can't exactly be called a "bug", because IIUC 
> the intuitive behavior is fundamentally incompatible with the current 
> default/only {{simple}} refinement method.
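The conflict between the "own top buckets" proposal and the two requirements can be shown with simple bookkeeping for the child facet inside one parent bucket. This is a hypothetical sketch, not Solr code, and the names ({{OwnTopBucketsSketch}}, {{unknownContributions}}) are illustrative. It records only which child terms each shard has contributed a count for, in phase#1 and in its single phase#2 response. It then reports the terms for which some shard's contribution is still unknown. A shard omitting a term is treated as "unknown" rather than zero, since a top-N response says nothing about terms outside it.

```java
import java.util.*;

// Hypothetical bookkeeping model (not Solr code) for the child facet inside ONE
// parent bucket: which child terms each shard has contributed a count for.
class OwnTopBucketsSketch {

    // Returns child term -> shards whose contribution is still unknown after
    // phase#2. An empty result means every child bucket has an accurate count.
    static Map<String, Set<Integer>> unknownContributions(
            List<Set<String>> phase1Terms, List<Set<String>> phase2Terms) {
        int numShards = phase1Terms.size();
        Set<String> allTerms = new TreeSet<>();
        phase1Terms.forEach(allTerms::addAll);
        phase2Terms.forEach(allTerms::addAll);

        Map<String, Set<Integer>> unknown = new TreeMap<>();
        for (String term : allTerms) {
            Set<Integer> missing = new TreeSet<>();
            for (int s = 0; s < numShards; s++) {
                if (!phase1Terms.get(s).contains(term) && !phase2Terms.get(s).contains(term)) {
                    missing.add(s);
                }
            }
            if (!missing.isEmpty()) unknown.put(term, missing);
        }
        return unknown;
    }

    public static void main(String[] args) {
        // Shard 0 reported the parent bucket (with child "x") in phase#1.
        // Shard 1 did not, so in phase#2 it is asked to fill in "x" and, per the
        // proposal, also returns its own top child "y".
        List<Set<String>> phase1 = List.of(Set.of("x"), Set.of());
        List<Set<String>> phase2 = List.of(Set.of(), Set.of("x", "y"));
        // "y" is new: learning shard 0's count for it would take another
        // refinement request to shard 0, violating requirement #1.
        System.out.println(unknownContributions(phase1, phase2)); // {y=[0]}
    }
}
```

Any non-empty entry, such as y=[0] here, marks a bucket that could only be returned with an accurate count by sending that shard a second refinement request.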



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
