[ 
https://issues.apache.org/jira/browse/SOLR-15836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17454349#comment-17454349
 ] 

Michael Gibney commented on SOLR-15836:
---------------------------------------

[PR #448|https://github.com/apache/solr/pull/448] initially simply adds a test 
that demonstrates this counterintuitive behavior as clearly as possible.

Again, I would not characterize this as a bug in the {{simple}} refinement 
method. My current sense is that this behavior would be best addressed by 
adding another (optional, non-default) refinement method -- perhaps a method 
capable of doing more than one iterative refinement pass, when needed?

I am particularly interested in something like selective "breadth-first" 
hierarchical facet evaluation, because of the potential to better control 
fanout (and the influence of overrequest on fanout) in hierarchical facets. But 
one could also take the approach outlined in the comment (quoted in the issue 
description): basically adapting the {{simple}} method to be capable of 
iterative refinement passes -- I think such an approach should converge 
reasonably quickly. 
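
To make the iterative idea concrete, here is a rough toy sketch (not Solr code, and not what the PR implements). It models a coordinator merging the child terms of a single parent bucket. Everything in it is an assumption made for illustration: the class and method names, the per-shard count maps, the {{reportedParent}} flags, and the rule that a shard seeing the parent bucket for the first time volunteers its own top children. With {{maxPasses = 1}}, a volunteered child is dropped because its count is incomplete; a second pass lets the other shards fill it in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a coordinator merging one parent bucket's child terms. Hypothetical; not Solr code. */
class IterativeRefinementSketch {

    /** A shard's own top-n child terms, by descending count, ties broken by term. */
    static List<String> topChildren(Map<String, Long> counts, int n) {
        List<String> terms = new ArrayList<>(counts.keySet());
        terms.sort(Comparator.comparing((String t) -> counts.get(t)).reversed()
                .thenComparing(Comparator.<String>naturalOrder()));
        return terms.subList(0, Math.min(n, terms.size()));
    }

    /** Stores one shard's count for a term, creating the per-shard array (all -1) on first sight. */
    static void record(Map<String, long[]> known, int shardCount, String term, int shard, long count) {
        long[] perShard = known.computeIfAbsent(term, t -> {
            long[] a = new long[shardCount];
            Arrays.fill(a, -1L);
            return a;
        });
        perShard[shard] = count;
    }

    /**
     * shards: per-shard map of child term -> count under the parent bucket.
     * reportedParent[i]: whether shard i returned the parent bucket in phase#1 (an assumption of this toy).
     * Returns merged counts only for children to which every shard has contributed.
     */
    static Map<String, Long> refine(List<Map<String, Long>> shards, boolean[] reportedParent,
                                    int limit, int maxPasses) {
        // child term -> per-shard count; -1 means "this shard has not reported the term yet"
        Map<String, long[]> known = new LinkedHashMap<>();
        // phase#1: only shards that returned the parent bucket report their top children
        for (int i = 0; i < shards.size(); i++) {
            if (!reportedParent[i]) continue;
            for (String term : topChildren(shards.get(i), limit)) {
                record(known, shards.size(), term, i, shards.get(i).get(term));
            }
        }
        for (int pass = 1; pass <= maxPasses; pass++) {
            List<String> candidates = new ArrayList<>(known.keySet());
            boolean discovered = false;
            for (int i = 0; i < shards.size(); i++) {
                Map<String, Long> shard = shards.get(i);
                // fill in every candidate this shard has not yet reported
                for (String term : candidates) {
                    if (known.get(term)[i] < 0) known.get(term)[i] = shard.getOrDefault(term, 0L);
                }
                // a shard seeing the parent for the first time also volunteers its own top children
                if (pass == 1 && !reportedParent[i]) {
                    for (String term : topChildren(shard, limit)) {
                        if (!known.containsKey(term)) discovered = true;
                        record(known, shards.size(), term, i, shard.get(term));
                    }
                }
            }
            if (!discovered) break; // no unseen terms surfaced, so nothing is left to cross-check
        }
        // only return children whose counts include a contribution from every shard
        Map<String, Long> merged = new TreeMap<>();
        for (Map.Entry<String, long[]> e : known.entrySet()) {
            if (Arrays.stream(e.getValue()).allMatch(c -> c >= 0)) {
                merged.put(e.getKey(), Arrays.stream(e.getValue()).sum());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Shard 0 returned the parent in phase#1; shard 1 only sees it during refinement.
        List<Map<String, Long>> shards = List.of(Map.of("x", 5L, "y", 2L), Map.of("y", 10L, "x", 1L));
        boolean[] reportedParent = {true, false};
        System.out.println(refine(shards, reportedParent, 1, 1)); // {x=6}
        System.out.println(refine(shards, reportedParent, 1, 2)); // {x=6, y=12}
    }
}
```

In this single-level toy the second pass is always enough, since new terms can only be volunteered during the first pass. Whether a real multi-level implementation would converge as quickly is exactly the open question above.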

> Address counterintuitive behavior of JSON "terms" subfacet refinement
> ---------------------------------------------------------------------
>
>                 Key: SOLR-15836
>                 URL: https://issues.apache.org/jira/browse/SOLR-15836
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Facet Module
>    Affects Versions: main (9.0), 8.11
>            Reporter: Michael Gibney
>            Priority: Major
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In distributed faceting, uneven distribution of terms across different shards 
> can artificially include or exclude terms (this discussion will focus on JSON 
> Facet "terms" faceting).
> This is inevitable, and can be mitigated via {{overrequest}} and 
> {{overrefine}} parameters -- respectively casting a "wider net" for "phase#1" 
> (determining the set of "terms of interest") and "phase#2" (cross-checking 
> "terms of interest" against shards that did not initially report them).
> It is possible to devise artificial situations that push the limit of what 
> {{overrefine}} is capable of mitigating, resulting in counterintuitive 
> behavior. But despite such edge cases, in general it is relatively 
> straightforward to reason about how the {{simple}} JSON Facet refinement 
> method works for "flat" (i.e., non-hierarchical) terms facets.
> This issue discusses some ways in which subfacets (hierarchical or nested 
> facets) can more readily behave counterintuitively in practical usage, and 
> possible ways to address/mitigate such behavior.
> ---------------------
> AFAICT, the {{simple}} (default, currently the only) refinement method has 
> two defining requirements:
> # there is at most _one_ refinement request issued to each shard, and
> # any buckets returned are guaranteed to have accurate counts (or perhaps 
> more generally, stats?) reflecting contributions from all shards. (This makes 
> [no guarantees|https://issues.apache.org/jira/browse/SOLR-11159?focusedCommentId=16103386#comment-16103386]
> about buckets _not_ returned that would in principle be eligible to be 
> returned.)
>  
> The simplest counterintuitive case is when refinement of higher-level facets 
> uncovers more subfacets on shards that have no opportunity to influence 
> results/refinement of the child facet. I'm pretty sure it's this situation 
> that's described in [this 
> comment|https://github.com/apache/solr/blob/0287458f836e3b7ea4b2401538b29f3d2e9b6cf4/solr/core/src/test/org/apache/solr/search/facet/TestJsonFacetRefinement.java#L992-L994]
>  (by [~hossman]?):
> {code:java}
>     //   - or at the very least, if the purpose of "_l" is to give other buckets a chance to "bubble up"
>     //     in phase#2, then shouldn't a "_l" refinement requests still include the buckets choosen in
>     //     phase#1, and request that the shard fill them in in addition to returning its own top buckets?
> {code}
> The proposal in the above linked comment would work iff the "own top buckets" 
> returned in phase#2 did not introduce any new/unseen values (and note, the 
> only case in which returning "own top buckets" would be significant _would_ 
> be the case in which it would introduce new/unseen values). If new values 
> _were_ returned in phase#2, the only way to ensure that requirement #2 is 
> respected would be to violate requirement #1 (i.e., by issuing _another_ 
> refinement request to determine whether any other shards have anything to 
> contribute to the previously unseen value).
> This counterintuitive behavior can't exactly be called a "bug", because IIUC 
> the intuitive behavior is fundamentally incompatible with the current 
> default/only {{simple}} refinement method.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)