[jira] [Commented] (SOLR-13807) Caching for term facet counts

Chris M. Hostetter (Jira) Fri, 13 Mar 2020 17:27:34 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-13807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17059139#comment-17059139
 ]


Chris M. Hostetter commented on SOLR-13807:
-------------------------------------------

Hey Michael, I've been offline all week, but hopefully I'll be able to start 
digging in and reviewing some of this more at some point next week.

As far as some of your process questions: I don't know if/how force-pushing 
affects things, but from a reviewing / "get more eyeballs on things" 
perspective I really think it would be cleaner to have 2 discrete PRs that we 
can iterate on, and to link each PR from its corresponding Jira, so we can keep 
the comments/discussion on each distinct, particularly since one or the other 
might attract more/less attention from folks who are more/less passionate about 
the new functionality and/or concerned about the internal change.

FWIW: I personally don't care about PRs (or any GitHub "value add" 
functionality for that matter) at all – to me they are just patch files I can 
find at special URLs by adding ".patch" to the end.  I tell you this not to 
discourage you from using PRs (my anachronistic view on development shouldn't 
stop you from using the tools you're comfortable with, and other people in the 
community are – or at least claim to be – more likely to help review PRs than 
patches) but just to clarify that you certainly don't need to worry about 
trying to re-use the existing PR ... feel free to close it and open new ones 
for each of the distinct issues – the GitHub-to-Jira bridge should pick them up.

 

> Caching for term facet counts
> -----------------------------
>
>                 Key: SOLR-13807
>                 URL: https://issues.apache.org/jira/browse/SOLR-13807
>             Project: Solr
>          Issue Type: New Feature
>          Components: Facet Module
>    Affects Versions: master (9.0), 8.2
>            Reporter: Michael Gibney
>            Priority: Minor
>         Attachments: SOLR-13807__SOLR-13132_test_stub.patch
>
>
> Solr does not have a facet count cache; so for _every_ request, term facets 
> are recalculated for _every_ (facet) field, by iterating over _every_ field 
> value for _every_ doc in the result domain, and incrementing the associated 
> count.
> As a result, subsequent requests end up redoing a lot of the same work, 
> including all associated object allocation, GC, etc. This situation could 
> benefit from integrated caching.
> Because of the domain-based, serial/iterative nature of term facet 
> calculation, latency is proportional to the size of the result domain. 
> Consequently, one common/clear manifestation of this issue is high latency 
> for faceting over an unrestricted domain (e.g., {{\*:\*}}), as might be 
> observed on a top-level landing page that exposes facets. This type of 
> "static" case is often mitigated by external (to Solr) caching, either with a 
> caching layer between Solr and a front-end application, or within a front-end 
> application, or even with a caching layer between the end user and a 
> front-end application.
> But in addition to the overhead of handling this caching elsewhere in the 
> stack (or, for a new user, even being aware of this as a potential issue to 
> mitigate), any external caching mitigation is really only appropriate for 
> relatively static cases like the "landing page" example described above. A 
> Solr-internal facet count cache (analogous to the {{filterCache}}) would 
> provide the following additional benefits:
>  # ease of use/out-of-the-box configuration to address a common performance 
> concern
>  # compact (specifically caching count arrays, without the extra baggage that 
> accompanies a naive external caching approach)
>  # NRT-friendly (could be implemented to be segment-aware)
>  # modular, capable of reusing the same cached values in conjunction with 
> variant requests over the same result domain (this would support common use 
> cases like paging, but also potentially more interesting direct uses of 
> facets). 
>  # could be used for distributed refinement (i.e., if facet counts over a 
> given domain are cached, a refinement request could simply look up the 
> ordinal value for each enumerated term and directly grab the count out of the 
> count array that was cached during the first phase of facet calculation)
>  # composable (e.g., in aggregate functions that calculate values based on 
> facet counts across different domains, like SKG/relatedness – see SOLR-13132)
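
For readers skimming the quoted description, the counting loop and the proposed 
count-array cache can be sketched roughly as follows. This is only an 
illustration under stated assumptions, not Solr code and not the approach taken 
in the patch: the class name, the string cache key, and the int[][] 
representation of per-document term ordinals are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; none of these names are Solr APIs. */
class FacetCountCacheSketch {
    // Assumed cache layout: "field|domain" key -> count array indexed by term ordinal.
    private final Map<String, int[]> cache = new HashMap<>();

    // Number of times counts were actually computed (i.e. cache misses).
    int computations = 0;

    /**
     * Uncached counting as the description puts it: visit every field value
     * of every doc in the result domain and increment the associated count.
     * docTermOrds[doc] holds the term ordinals of one field for that doc.
     */
    static int[] countTerms(int[][] docTermOrds, int[] domain, int numTerms) {
        int[] counts = new int[numTerms];
        for (int doc : domain) {
            for (int ord : docTermOrds[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    /** Returns the cached count array for (field, domain), computing it on a miss. */
    int[] getCounts(String field, String domainKey,
                    int[][] docTermOrds, int[] domain, int numTerms) {
        String key = field + "|" + domainKey;
        return cache.computeIfAbsent(key, k -> {
            computations++;
            return countTerms(docTermOrds, domain, numTerms);
        });
    }

    public static void main(String[] args) {
        // Four docs; the field has three distinct terms (ordinals 0..2).
        int[][] docTermOrds = {{0, 1}, {1}, {1, 2}, {}};
        int[] allDocs = {0, 1, 2, 3};

        FacetCountCacheSketch sketch = new FacetCountCacheSketch();
        int[] first = sketch.getCounts("category", "*:*", docTermOrds, allDocs, 3);
        int[] second = sketch.getCounts("category", "*:*", docTermOrds, allDocs, 3);

        System.out.println(Arrays.toString(first)); // prints [1, 3, 1]
        System.out.println(first == second);        // prints true (served from cache)
        System.out.println(sketch.computations);    // prints 1

        // A refinement-style lookup is then just an array read by term ordinal.
        System.out.println(second[1]);              // prints 3
    }
}
```

The second request over the same domain skips the per-document loop entirely, 
which is the saving the description is after. A real implementation would also 
have to deal with eviction, invalidation on index changes, and per-segment 
ordinals, none of which this sketch attempts.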



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
