[ https://issues.apache.org/jira/browse/SOLR-15859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638028#comment-17638028 ]
Shawn Heisey commented on SOLR-15859:
-------------------------------------
I restarted Solr, then did these:
{code:java}
elyograg@bilbo:~$ curl "https://solr.elyograg.org/solr/dovecot_shard1_replica_n1/select?q=%2A%3A%2A&fq=body%3Atest&rows=0"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<bool name="zkConnected">true</bool>
<int name="status">0</int>
<int name="QTime">359</int>
<lst name="params">
<str name="q">*:*</str>
<str name="fq">body:test</str>
<str name="rows">0</str>
<str name="echoParams">all</str>
<str name="rid">-0</str>
</lst>
</lst>
<result name="response" numFound="27590" start="0" numFoundExact="true">
</result>
</response>
{code}
{code:java}
elyograg@bilbo:~$ curl "https://solr.elyograg.org/solr/dovecot_shard1_replica_n1/select?q=%2A%3A%2A&fq=body%3Ahelp&rows=0"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<bool name="zkConnected">true</bool>
<int name="status">0</int>
<int name="QTime">59</int>
<lst name="params">
<str name="q">*:*</str>
<str name="fq">body:help</str>
<str name="rows">0</str>
<str name="echoParams">all</str>
<str name="rid">-1</str>
</lst>
</lst>
<result name="response" numFound="78314" start="0" numFoundExact="true">
</result>
</response>
{code}
I did those queries a few times each. And now I am not seeing it misbehave.
{code:java}
elyograg@bilbo:~$ curl "https://solr.elyograg.org/solr/dovecot_shard1_replica_n1/admin/cache"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">11</int>
</lst>
<lst name="queries">
<lst name="body:test">
<long name="hits">4</long>
<long name="rows">27590</long>
</lst>
<lst name="body:help">
<long name="hits">3</long>
<long name="rows">78314</long>
</lst>
</lst>
</response>
{code}
I did see one of the entries disappear once, but it seems mostly OK.
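For comparing dumps like the one above across runs, here is a minimal Python sketch that parses the XML and reports which keys disappeared between two snapshots. It assumes only the response shape shown in this comment. The function names and the second sample dump are hypothetical and are not part of the patch.

```python
import xml.etree.ElementTree as ET

# First dump: shaped like the /admin/cache output shown in this comment.
BEFORE_DUMP = """<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int></lst>
<lst name="queries">
<lst name="body:test"><long name="hits">4</long><long name="rows">27590</long></lst>
<lst name="body:help"><long name="hits">3</long><long name="rows">78314</long></lst>
</lst>
</response>"""

# Second dump: hypothetical, made up to illustrate a vanished entry.
AFTER_DUMP = """<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int></lst>
<lst name="queries">
<lst name="body:test"><long name="hits">5</long><long name="rows">27590</long></lst>
</lst>
</response>"""


def parse_cache_dump(xml_text):
    """Return {cache_key: {"hits": int, "rows": int}} for one dump."""
    root = ET.fromstring(xml_text.strip())
    queries = root.find("./lst[@name='queries']")
    entries = {}
    if queries is None:
        return entries
    for entry in queries.findall("lst"):
        # Each entry holds <long name="hits"> and <long name="rows"> children.
        entries[entry.get("name")] = {
            child.get("name"): int(child.text) for child in entry.findall("long")
        }
    return entries


def missing_keys(before, after):
    """Keys present in the earlier dump but absent from the later one."""
    return sorted(set(before) - set(after))


if __name__ == "__main__":
    before = parse_cache_dump(BEFORE_DUMP)
    after = parse_cache_dump(AFTER_DUMP)
    print("disappeared:", missing_keys(before, after))
```

Saving each dump with a timestamp and correlating the disappearances with commit times in the Solr log would show whether a new searcher is what drops the entry.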
If the one time I saw an entry disappear happened because of a commit, then I
am still confused. Commits to this index are sporadic and random -- it's an
index of all the mail in my dovecot install. So when a new message arrives, or
if mail is moved between folders, or a message gets deleted, then Solr will see
one or more updates followed by a commit.
But I would expect the autowarm I have configured to preserve the existing
cache keys when a commit opens a new searcher, especially because this index
pretty much never sees a query unless I do one manually, so almost all of the
time the caches are empty.
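As a rough illustration of the expectation described above, the sketch below models autowarming as "take the most recently used keys from the old cache and recompute them for the new searcher". It is a toy model in Python, not Solr's implementation. The `LruCache` class, the `autowarm` function, and the `regenerate` callback are all hypothetical names.

```python
from collections import OrderedDict


class LruCache:
    """Tiny LRU cache: the oldest entry is first, the most recent is last."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._entries = OrderedDict()

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        # Evict from the oldest end until we are back within the size limit.
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)
        return self._entries[key]

    def keys_most_recent_first(self):
        return list(reversed(self._entries))


def autowarm(old_cache, autowarm_count, regenerate):
    """Build a cache for a new searcher from the old cache's hottest keys.

    regenerate(key) stands in for re-running the filter against the new
    index view; the old values are never copied, only the keys are reused.
    """
    new_cache = LruCache(old_cache.max_size)
    keys = old_cache.keys_most_recent_first()[:autowarm_count]
    # Insert least recent first so the recency order carries over.
    for key in reversed(keys):
        new_cache.put(key, regenerate(key))
    return new_cache


if __name__ == "__main__":
    old = LruCache(max_size=512)
    old.put("body:test", 27590)
    old.put("body:help", 78314)
    # Pretend one new matching message arrived before the commit.
    new = autowarm(old, autowarm_count=16, regenerate=lambda key: old.get(key) + 1)
    print(new.keys_most_recent_first())
```

Under this model, a key would survive a commit whenever it falls within the autowarm count, so an entry vanishing after a commit would point at something other than the warming step.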
The URL I used above will resolve on the Internet, but the request will not
actually reach Solr unless I add your public IP to my haproxy config.
Most (and maybe all) of the cache hits actually seem to come from me running
/admin/cache multiple times, which probably means that the queries are being
satisfied in some way that does not actually involve the filterCache. I find
that confusing, but it is not Caffeine's problem. :) This hit count inflation is
why I was hoping I would be able to obtain an entry from the cache without
incrementing its hit counter. That probably goes against the entire point of a
good cache, which is to always have an accurate top-level hit counter.
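The distinction between a counted lookup and an uncounted one can be sketched in a few lines of Python. This is a toy cache, not Caffeine or Solr code. `CountingCache`, `peek`, and `dump` are hypothetical names chosen for illustration, and whether the real cache exposes an equivalent uncounted read is not something this sketch claims.

```python
class CountingCache:
    """Toy cache that tracks hits, with a separate read path that does not."""

    def __init__(self):
        self._values = {}
        self.hits = 0
        self.misses = 0
        self.entry_hits = {}

    def put(self, key, value):
        self._values[key] = value
        self.entry_hits.setdefault(key, 0)

    def get(self, key):
        """Normal lookup: updates the top-level and per-entry counters."""
        if key in self._values:
            self.hits += 1
            self.entry_hits[key] += 1
            return self._values[key]
        self.misses += 1
        return None

    def peek(self, key):
        """Inspection lookup: returns the value without touching any counter."""
        return self._values.get(key)

    def dump(self):
        """What a dump handler could return if it only ever used peek()."""
        return {
            key: {"hits": self.entry_hits[key], "rows": self.peek(key)}
            for key in self._values
        }


if __name__ == "__main__":
    cache = CountingCache()
    cache.put("body:test", 27590)
    cache.get("body:test")
    for _ in range(3):
        cache.dump()  # repeated dumps leave the counters alone
    print(cache.hits, cache.dump())
```

A dump handler built on the counted path would show exactly the inflation described above, since every inspection would register as a hit.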
> Add handler to dump filter cache
> --------------------------------
>
> Key: SOLR-15859
> URL: https://issues.apache.org/jira/browse/SOLR-15859
> Project: Solr
> Issue Type: Improvement
> Reporter: Andy Lester
> Assignee: Shawn Heisey
> Priority: Major
> Labels: FQ, cache, filtercache, metrics
> Attachments: cacheinfo-1.patch, cacheinfo.patch, fix_92_startup.patch
>
>
> It would be very helpful to be able to inspect the contents of the
> filterCache.
> I'd like to be able to query something like
> {{/admin/caches?type=filter&nentries=1000&sort=numHits+DESC}}
> nentries would be allowed to be -1 to get everything.
> It would be nice to see these data items for each entry. I don't know which
> are available, but I'm thinking blue sky here:
> * cache key, exactly as stored
> * Timestamp when the entry was inserted
> * Whether the insertion of the entry evicted another entry, and if so which
> one
> * Timestamp of when this entry was last hit
> * Number of hits on this entry forever
> * Number of hits on this entry over some time period
> * Number of documents matched by the filter
> * Number of bytes of memory used by the filter
> These are the sorts of questions I'd like to be able to answer:
> * "I just did a query that I expect will have added a cache entry. Did it?"
> * "Are my queries hitting existing cache entries?"
> * "How big should I set my filterCache size? Should I limit it by number of
> entries or RAM usage?"
> * "Which of my FQs are getting used the most? These are the ones I want in
> my firstSearcher queries." (I currently determine this by processing my old
> solr logs)
> * "Which filters give me the most bang for the buck in terms of RAM usage?"
> * "I have filter X and filter Y, but would it be beneficial if I made a
> filter X AND Y?"
> * "Which FQs are used more at certain times of the day? (Assuming I take
> regular snapshots throughout the day)"
> I imagine a response might look like:
> {{{}}
> {{ "responseHeader": {}}
> {{ "status": 0,}}
> {{ "QTime": 961}}
> {{ },}}
> {{ "response": {}}
> {{ "numFound": 12104,}}
> {{ "filterCacheKeys": {}}
> {{ [}}
> {{ "language:eng": {}}
> {{ "inserted": "2021-12-04T07:34:16Z",}}
> {{ "lastHit": "2021-12-04T18:17:43Z",}}
> {{ "numHits": 15065,}}
> {{ "numHitsInPastHour": 2319,}}
> {{ "evictedKey": "agelevel:4 shippable:Y",}}
> {{ "numRecordsMatchedByFilter": 24328753,}}
> {{ "bytesUsed": 3041094}}
> {{ }}}
> {{ ],}}
> {{ [}}
> {{ "is_set:N": {}}
> {{ ...}}
> {{ }}}
> {{ ],}}
> {{ [}}
> {{ "language:spa": {}}
> {{ ...}}
> {{ }}}
> {{ ]}}
> {{ }}}
> {{}}}