[
https://issues.apache.org/jira/browse/SOLR-16546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17636241#comment-17636241
]
Ben Manes commented on SOLR-16546:
----------------------------------
The algorithm achieves a hit rate that is at or near the best across a wide
variety of workloads and competitors. That includes workloads that change over
time, as it will adapt to the observed pattern. It takes into account recency
and frequency, including the popularity history of recently evicted entries
(done by an aged histogram). It should keep your most valued entries and
correct itself if it makes too many mispredictions. Of course you are welcome
to capture an access trace (log of key hashes) if you want to see an analysis
from the simulator.
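To make the "aged histogram" idea concrete, here is a minimal Python sketch of a frequency counter whose counts are periodically halved, together with an admission check that compares a candidate against the entry that would be evicted. This is an illustration only, not Caffeine's implementation: the names `AgedHistogram`, `sample_size`, and `admit`, as well as the exact admission rule, are assumptions made for the example.

```python
from collections import Counter


class AgedHistogram:
    """Counts key accesses and halves every count after `sample_size` events."""

    def __init__(self, sample_size):
        self.sample_size = sample_size  # hypothetical aging period, in events
        self.counts = Counter()
        self.events = 0

    def record(self, key):
        """Record one access and age the histogram when the period elapses."""
        self.counts[key] += 1
        self.events += 1
        if self.events >= self.sample_size:
            self._age()

    def _age(self):
        # Halve every count so that old popularity fades; drop keys at zero.
        self.counts = Counter(
            {key: count // 2 for key, count in self.counts.items() if count // 2 > 0}
        )
        self.events = 0

    def frequency(self, key):
        """Return the (aged) access count; unseen or forgotten keys return 0."""
        return self.counts.get(key, 0)


def admit(histogram, candidate, victim):
    """Assumed rule: admit the candidate only if it is more popular than the victim."""
    return histogram.frequency(candidate) > histogram.frequency(victim)


hist = AgedHistogram(sample_size=8)
for key in ["a", "a", "a", "b", "a", "c", "a"]:
    hist.record(key)

print(admit(hist, "b", "a"))  # "b" was seen once, "a" five times
hist.record("b")              # eighth event triggers aging: counts are halved
print(hist.frequency("a"), hist.frequency("b"), hist.frequency("c"))
```

Because the history is kept separately from the cached entries, a key that was recently evicted still carries its count, which is what lets a policy of this kind reconsider an earlier misprediction.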
The main limitation of this policy is that it does not account for the latency
cost, e.g. to give a bias towards retaining slow queries over more frequent
fast ones. There is little research on this topic and those papers use private
traces, most often to block competitive research. I have a good idea for an
approach that in theory might work very well and be inexpensive, but I do not
have data to analyze it with, so I cannot justify implementing it blindly.
Thankfully hit rates
are still a good approximate metric for tuning towards user-perceived response
times, so Caffeine should be pretty solid unless otherwise proven.
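As a rough illustration of how hit rate and latency cost can diverge, the sketch below compares two retention choices for a cache that can hold only one of two queries. The query names, request counts, and latencies are invented for the example and are not measurements from Solr or Caffeine; the model also ignores the initial cold miss.

```python
def summarize(workload, cached):
    """Return (hit_rate, total_miss_latency_ms) if every key in `cached` always hits."""
    total = sum(count for count, _ in workload.values())
    hits = sum(count for key, (count, _) in workload.items() if key in cached)
    miss_ms = sum(
        count * ms for key, (count, ms) in workload.items() if key not in cached
    )
    return hits / total, miss_ms


# Hypothetical workload: key -> (number of requests, latency in ms on a miss)
workload = {
    "fast_frequent": (90, 5),
    "slow_rare": (10, 400),
}

keep_fast = summarize(workload, {"fast_frequent"})
keep_slow = summarize(workload, {"slow_rare"})

print("keep fast_frequent:", keep_fast)  # hit rate 0.9, 4000 ms spent on misses
print("keep slow_rare:   ", keep_slow)   # hit rate 0.1, 450 ms spent on misses
```

In this contrived case the choice with the higher hit rate spends more total time on misses, which is the situation a latency-aware policy would try to detect. In typical workloads the two metrics tend to move together, which is why hit rate remains a reasonable tuning proxy.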
> Faceting puts an entry for each q into the filterCache
> -------------------------------------------------------
>
> Key: SOLR-16546
> URL: https://issues.apache.org/jira/browse/SOLR-16546
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: faceting
> Affects Versions: 9.0
> Reporter: Andy Lester
> Priority: Minor
>
> I noticed that I was getting far more entries in the filterCache than I was
> expecting. All my app's FQs are driven by the app itself. There are only a
> couple dozen FQs possible in our queries, but I'd be getting ~10K cache
> ejections every hour. That didn't make any sense.
> So I investigated and discovered that making a query using facets adds an
> entry to the filterCache. Here's my demonstration.
> The script show-results is this:
> {{curl -s "$URL/twit/admin/cache" | jq -S .queries
> curl -s "$URL/admin/metrics" | jq '.metrics."solr.core.twit"."CACHE.searcher.filterCache".inserts'
> }}
> The /admin/cache handler is Shawn Heisey's cache dumper he's working on in
> ticket SOLR-15859.
> {{# Freshly started Solr. No cache entries.
> $ ./show-results
> {}
> 0
> # Query on "alpha" with facets on.
> $ curl -s "$URL/twit/select?q=title:alpha&rows=0&facet=on&facet.field=grouping"
> # Now there is a filter cache entry.
> $ ./show-results
> {
> "title:alpha": 0
> }
> 1
> # Query on "beta" with facets on. "beta" shows up in the cache.
> $ curl -s "$URL/twit/select?q=title:beta&rows=0&facet=on&facet.field=grouping"
> $ ./show-results
> {
> "title:alpha": 0,
> "title:beta": 0
> }
> 2
> # Now query on "gamma" with facets OFF.
> $ curl -s "$URL/twit/select?q=title:gamma&rows=0&facet=off&facet.field=grouping"
> # The "gamma" does not show up in the filter cache.
> $ ./show-results
> {
> "title:alpha": 0,
> "title:beta": 0
> }
> 2
> # Now do same query on "gamma" with facets ON.
> $ curl -s "$URL/twit/select?q=title:gamma&rows=0&facet=on&facet.field=grouping"
> # The "gamma" shows up.
> $ ./show-results
> {
> "title:alpha": 0,
> "title:beta": 0,
> "title:gamma": 0
> }
> 3
> }}
> Is this correct behavior? Do I need to adjust my filterCache to allow for
> this?