[ https://issues.apache.org/jira/browse/SOLR-13807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052328#comment-17052328 ]

Michael Gibney commented on SOLR-13807:
---------------------------------------

Regarding TermFacetCacheRegenerator, my understanding of CacheHelper.getKey() 
is that the returned keys should work the same way at the segment level as 
they do at the top level; notably, the types of modifications you mention 
(deletes, in-place DV updates, etc.) should result in the creation of a new 
cache key. Is that not true?
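To make that assumption concrete, here is a minimal sketch (hypothetical class 
and method names, not the actual regenerator code) of keying per-segment 
counts on the reader-level {{CacheHelper}} key, so that a segment affected by 
deletes or in-place DV updates presents a new key and any stale entry is 
simply never consulted again:
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;

// Sketch only: per-segment facet counts keyed on IndexReader.CacheKey.
public class SegmentCountCacheSketch {

  private final Map<IndexReader.CacheKey, int[]> perSegmentCounts = new ConcurrentHashMap<>();

  public int[] countsFor(LeafReaderContext leafCtx) {
    IndexReader.CacheHelper helper = leafCtx.reader().getReaderCacheHelper();
    if (helper == null) {
      // reader not cacheable; always recompute
      return computeCounts(leafCtx);
    }
    // deletes/DV updates yield a new CacheKey for the leaf, so stale entries are never hit
    return perSegmentCounts.computeIfAbsent(helper.getKey(), k -> computeCounts(leafCtx));
  }

  private int[] computeCounts(LeafReaderContext leafCtx) {
    // placeholder for the real per-segment ordinal counting
    return new int[0];
  }
}
{code}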

{{countCacheDf}} is defined with respect to the main domain DocSet.size(), and 
only affects whether the {{termFacetCache}} is consulted for a given 
domain-request combination. It should _not_ affect the cached values 
themselves, if that's your concern. As for the temporarily tabled concerns 
about concurrent mutation: this was something I considered, and (I think) 
addressed 
[here|https://github.com/apache/lucene-solr/pull/751/files#diff-1b16fc96c8dde547ddde619e54a45c26R1158-R1161]:
{code:java}
      if (segmentCache == null) {
        // no cache presence; initialize.
        cacheState = CacheState.NOT_CACHED;
        newSegmentCache = new HashMap<>(fcontext.searcher.getIndexReader().leaves().size() + 1);
      } else if (segmentCache.containsKey(topLevelKey)) {
        topLevelEntry = segmentCache.get(topLevelKey);
        CachedCountSlotAcc acc = new CachedCountSlotAcc(fcontext, topLevelEntry.topLevelCounts);
        return new SweepCountAccStruct(qKey, docs, CacheState.CACHED, null, isBase, acc,
            new ReadOnlyCountSlotAccWrapper(fcontext, acc), acc);
      } else {
        // defensive copy, since cache entries are shared across threads
        cacheState = CacheState.PARTIALLY_CACHED;
        newSegmentCache = new HashMap<>(fcontext.searcher.getIndexReader().leaves().size() + 1);
        newSegmentCache.putAll(segmentCache);
      }
{code}
In that last {{else}} block, each domain-request combination that finds a 
partial cache entry (with some segments populated) creates and populates an 
entirely new, request-private top-level cache entry, initially sharing the 
immutable segment-level entries from the extant top-level entry. On completion 
of processing, this new top-level entry is placed atomically into the 
termFacetCache. I believe this is robust; and if so, the worst case is that 
concurrent requests each do the work of creating equivalent top-level cache 
entries, the last of which remains in the cache ... which should be no worse 
than the status quo, where each request always does all the work of 
recalculating facet counts.
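To spell out that pattern (again a sketch with hypothetical names like 
{{FacetCountCache}}, not the classes in the PR): shared entries are never 
mutated in place; a request seeds a private map from whatever is already 
cached, fills in only the missing segments, and publishes the whole map back 
with a single put.
{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: copy-on-write publication of per-segment count entries.
class FacetCountCache {

  private final Map<Object, Map<Object, int[]>> cache = new ConcurrentHashMap<>();

  Map<Object, int[]> getCounts(Object topLevelKey, Iterable<Object> segmentKeys) {
    Map<Object, int[]> shared = cache.get(topLevelKey);
    // request-private copy; the shared int[] values themselves are never mutated
    Map<Object, int[]> priv = (shared == null) ? new HashMap<>() : new HashMap<>(shared);
    for (Object segKey : segmentKeys) {
      // fill in only the segments that were not already cached
      priv.computeIfAbsent(segKey, this::computeSegmentCounts);
    }
    // single publish; concurrent requests may race, but the last equivalent copy wins
    cache.put(topLevelKey, priv);
    return priv;
  }

  private int[] computeSegmentCounts(Object segKey) {
    // placeholder for real per-segment counting
    return new int[0];
  }
}
{code}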

> Caching for term facet counts
> -----------------------------
>
>                 Key: SOLR-13807
>                 URL: https://issues.apache.org/jira/browse/SOLR-13807
>             Project: Solr
>          Issue Type: New Feature
>          Components: Facet Module
>    Affects Versions: master (9.0), 8.2
>            Reporter: Michael Gibney
>            Priority: Minor
>         Attachments: SOLR-13807__SOLR-13132_test_stub.patch
>
>
> Solr does not have a facet count cache; so for _every_ request, term facets 
> are recalculated for _every_ (facet) field, by iterating over _every_ field 
> value for _every_ doc in the result domain, and incrementing the associated 
> count.
> As a result, subsequent requests end up redoing a lot of the same work, 
> including all associated object allocation, GC, etc. This situation could 
> benefit from integrated caching.
> Because of the domain-based, serial/iterative nature of term facet 
> calculation, latency is proportional to the size of the result domain. 
> Consequently, one common/clear manifestation of this issue is high latency 
> for faceting over an unrestricted domain (e.g., {{\*:\*}}), as might be 
> observed on a top-level landing page that exposes facets. This type of 
> "static" case is often mitigated by external (to Solr) caching, either with a 
> caching layer between Solr and a front-end application, or within a front-end 
> application, or even with a caching layer between the end user and a 
> front-end application.
> But in addition to the overhead of handling this caching elsewhere in the 
> stack (or, for a new user, even being aware of this as a potential issue to 
> mitigate), any external caching mitigation is really only appropriate for 
> relatively static cases like the "landing page" example described above. A 
> Solr-internal facet count cache (analogous to the {{filterCache}}) would 
> provide the following additional benefits:
>  # ease of use/out-of-the-box configuration to address a common performance 
> concern
>  # compact (specifically caching count arrays, without the extra baggage that 
> accompanies a naive external caching approach)
>  # NRT-friendly (could be implemented to be segment-aware)
>  # modular, capable of reusing the same cached values in conjunction with 
> variant requests over the same result domain (this would support common use 
> cases like paging, but also potentially more interesting direct uses of 
> facets). 
>  # could be used for distributed refinement (i.e., if facet counts over a 
> given domain are cached, a refinement request could simply look up the 
> ordinal value for each enumerated term and directly grab the count out of the 
> count array that was cached during the first phase of facet calculation)
>  # composable (e.g., in aggregate functions that calculate values based on 
> facet counts across different domains, like SKG/relatedness – see SOLR-13132)


