[jira] [Commented] (SOLR-15859) Add handler to dump filter cache

Shawn Heisey (Jira) Wed, 23 Nov 2022 17:46:05 -0800


    [ 
https://issues.apache.org/jira/browse/SOLR-15859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17638028#comment-17638028
 ]


Shawn Heisey commented on SOLR-15859:
-------------------------------------

I restarted Solr, then did these:

 
{code:java}
elyograg@bilbo:~$ curl "https://solr.elyograg.org/solr/dovecot_shard1_replica_n1/select?q=%2A%3A%2A&fq=body%3Atest&rows=0"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <bool name="zkConnected">true</bool>
  <int name="status">0</int>
  <int name="QTime">359</int>
  <lst name="params">
    <str name="q">*:*</str>
    <str name="fq">body:test</str>
    <str name="rows">0</str>
    <str name="echoParams">all</str>
    <str name="rid">-0</str>
  </lst>
</lst>
<result name="response" numFound="27590" start="0" numFoundExact="true">
</result>
</response>
{code}
 

 
{code:java}
elyograg@bilbo:~$ curl "https://solr.elyograg.org/solr/dovecot_shard1_replica_n1/select?q=%2A%3A%2A&fq=body%3Ahelp&rows=0"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <bool name="zkConnected">true</bool>
  <int name="status">0</int>
  <int name="QTime">59</int>
  <lst name="params">
    <str name="q">*:*</str>
    <str name="fq">body:help</str>
    <str name="rows">0</str>
    <str name="echoParams">all</str>
    <str name="rid">-1</str>
  </lst>
</lst>
<result name="response" numFound="78314" start="0" numFoundExact="true">
</result>
</response>
{code}
 

I did those queries a few times each.  And now I am not seeing it misbehave.
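
For anyone repeating the test from a script, the request URLs above can be rebuilt with the following rough sketch. The helper name filter_query_url is made up for illustration, and the base URL is just the one from my commands, which only reaches Solr from allowed addresses.

```python
from urllib.parse import urlencode

# Base URL copied from the curl commands above (assumption: reachable only
# from addresses allowed by the proxy in front of Solr).
BASE = "https://solr.elyograg.org/solr/dovecot_shard1_replica_n1/select"


def filter_query_url(fq: str, base: str = BASE) -> str:
    """Build a match-all query with a single filter query and rows=0."""
    params = {"q": "*:*", "fq": fq, "rows": 0}
    # urlencode percent-encodes '*' and ':' the same way the URLs above do.
    return f"{base}?{urlencode(params)}"


print(filter_query_url("body:test"))
print(filter_query_url("body:help"))
```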

 
{code:java}
elyograg@bilbo:~$ curl "https://solr.elyograg.org/solr/dovecot_shard1_replica_n1/admin/cache"
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">11</int>
</lst>
<lst name="queries">
  <lst name="body:test">
    <long name="hits">4</long>
    <long name="rows">27590</long>
  </lst>
  <lst name="body:help">
    <long name="hits">3</long>
    <long name="rows">78314</long>
  </lst>
</lst>
</response>
{code}
 

I did see one of the entries disappear once, but it seems mostly OK.
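
To catch an entry disappearing between two dumps, something like the sketch below could compare snapshots. It assumes the response keeps the XML layout shown above, which comes from the patch under test and may change. The function names parse_cache_dump and missing_keys are invented for this example.

```python
import xml.etree.ElementTree as ET

# Sample copied from the /admin/cache output above.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">11</int>
</lst>
<lst name="queries">
  <lst name="body:test">
    <long name="hits">4</long>
    <long name="rows">27590</long>
  </lst>
  <lst name="body:help">
    <long name="hits">3</long>
    <long name="rows">78314</long>
  </lst>
</lst>
</response>"""


def parse_cache_dump(xml_text: str) -> dict:
    """Return {cache key: {"hits": n, "rows": n}} from one dump."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = {}
    for entry in root.findall("./lst[@name='queries']/lst"):
        entries[entry.get("name")] = {
            stat.get("name"): int(stat.text) for stat in entry.findall("long")
        }
    return entries


def missing_keys(before: dict, after: dict) -> list:
    """Keys present in the earlier snapshot but gone from the later one."""
    return sorted(set(before) - set(after))


snapshot = parse_cache_dump(SAMPLE)
print(snapshot)
```

Calling missing_keys on two snapshots taken either side of a commit would show whether the commit is what dropped the entry.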

If the one time I saw an entry disappear happened because of a commit, then I 
am still confused.  Commits to this index are sporadic and random -- it's an 
index of all the mail in my dovecot install.  So when a new message arrives, or 
if mail is moved between folders, or a message gets deleted, then Solr will see 
one or more updates followed by a commit.

But I would expect the autowarm I have configured to preserve the existing 
cache keys when a commit opens a new searcher, especially because this index 
pretty much never sees a query unless I run one manually, so the caches are 
empty almost all of the time.
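
The behaviour I am expecting can be modelled roughly as below. This is a toy model and not Solr's code: it assumes the old cache is ordered with the most recently used keys last, that up to autowarm_count of those keys are carried over, and that each value is regenerated against the new searcher. All names here are made up.

```python
from typing import Callable


def autowarm(old_cache: dict, autowarm_count: int,
             regenerate: Callable[[str], int]) -> dict:
    """Seed a new cache from the most recent keys of the old one.

    Assumption: old_cache iterates oldest-first, so the keys to keep are
    the last autowarm_count keys.
    """
    if autowarm_count <= 0:
        return {}
    keys = list(old_cache)[-autowarm_count:]
    # Values are recomputed, since the index may have changed with the commit.
    return {key: regenerate(key) for key in keys}


old = {"body:test": 27590, "body:help": 78314}
# Pretend the commit added one matching document to each filter.
new = autowarm(old, autowarm_count=2, regenerate=lambda key: old[key] + 1)
print(new)
```

Under this model, any autowarm count at least as large as the number of cached entries keeps every key, which is why a disappearing entry surprises me.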

The URL I used above will resolve on the Internet, but the request will not 
actually reach Solr unless I add your public IP to my haproxy config.

Most (and maybe all) of the cache hits actually seem to come from me running 
/admin/cache multiple times, which probably means that the query is being 
satisfied in some way that does not actually involve the filterCache. I find 
that confusing, but it is not Caffeine's problem. :)  This hit count inflation 
is why I was hoping I would be able to obtain an entry from the cache without 
incrementing its hit counter.  That probably goes against the entire point of 
a good cache, which is to always have an accurate top-level hit counter.
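
What I was hoping for looks roughly like the toy cache below, where a separate read path skips the counters. The class CountingCache and its peek method are hypothetical; this is neither Caffeine's nor Solr's API, only an illustration of the idea.

```python
class CountingCache:
    """Toy cache that tracks hits per entry and overall."""

    def __init__(self):
        self._data = {}
        self.hits = 0
        self.misses = 0
        self.entry_hits = {}

    def put(self, key, value):
        self._data[key] = value
        self.entry_hits.setdefault(key, 0)

    def get(self, key):
        """Normal lookup: updates the top-level and per-entry counters."""
        if key in self._data:
            self.hits += 1
            self.entry_hits[key] += 1
            return self._data[key]
        self.misses += 1
        return None

    def peek(self, key):
        """Inspection lookup: returns the value without touching counters."""
        return self._data.get(key)


cache = CountingCache()
cache.put("body:test", 27590)
cache.get("body:test")        # a real query: counted
for _ in range(3):
    cache.peek("body:test")   # a dump handler inspecting entries: not counted
print(cache.hits, cache.entry_hits)
```

With a read path like peek, running the dump handler repeatedly would leave the counts at whatever the real queries produced.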

> Add handler to dump filter cache
> --------------------------------
>
>                 Key: SOLR-15859
>                 URL: https://issues.apache.org/jira/browse/SOLR-15859
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Andy Lester
>            Assignee: Shawn Heisey
>            Priority: Major
>              Labels: FQ, cache, filtercache, metrics
>         Attachments: cacheinfo-1.patch, cacheinfo.patch, fix_92_startup.patch
>
>
> It would be very helpful to be able to inspect the contents of the 
> filterCache.
> I'd like to be able to query something like
> {{/admin/caches?type=filter&nentries=1000&sort=numHits+DESC}}
> nentries would be allowed to be -1 to get everything.
> It would be nice to see these data items for each entry. I don't know which 
> are available, but I'm thinking blue sky here:
>  * cache key, exactly as stored
>  * Timestamp when the entry was inserted
>  * Whether the insertion of the entry evicted another entry, and if so which 
> one
>  * Timestamp of when this entry was last hit
>  * Number of hits on this entry forever
>  * Number of hits on this entry over some time period
>  * Number of documents matched by the filter
>  * Number of bytes of memory used by the filter
> These are the sorts of questions I'd like to be able to answer:
>  * "I just did a query that I expect will have added a cache entry. Did it?"
>  * "Are my queries hitting existing cache entries?"
>  * "How big should I set my filterCache size? Should I limit it by number of 
> entries or RAM usage?"
>  * "Which of my FQs are getting used the most? These are the ones I want in 
> my firstSearcher queries." (I currently determine this by processing my old 
> solr logs)
>  * "Which filters give me the most bang for the buck in terms of RAM usage?"
>  * "I have filter X and filter Y, but would it be beneficial if I made a 
> filter X AND Y?"
>  * "Which FQs are used more at certain times of the day? (Assuming I take 
> regular snapshots throughout the day)"
> I imagine a response might look like:
> {
>   "responseHeader": {
>     "status": 0,
>     "QTime": 961
>   },
>   "response": {
>     "numFound": 12104,
>     "filterCacheKeys": {
>       [
>         "language:eng": {
>           "inserted": "2021-12-04T07:34:16Z",
>           "lastHit": "2021-12-04T18:17:43Z",
>           "numHits": 15065,
>           "numHitsInPastHour": 2319,
>           "evictedKey": "agelevel:4 shippable:Y",
>           "numRecordsMatchedByFilter": 24328753,
>           "bytesUsed": 3041094
>         }
>       ],
>       [
>         "is_set:N": {
>           ...
>         }
>       ],
>       [
>         "language:spa": {
>           ...
>         }
>       ]
>     }
>   }
> }
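
The nentries and sort parameters proposed in the description could behave roughly as in the sketch below. The function top_entries is hypothetical, and the numbers for "is_set:N" and "language:spa" are invented placeholders, since the description elides them. Only the "language:eng" values come from the example response.

```python
def top_entries(entries: dict, nentries: int = -1,
                sort_key: str = "numHits") -> list:
    """Return (key, stats) pairs sorted descending by sort_key.

    nentries=-1 returns everything, as the description proposes.
    """
    ranked = sorted(entries.items(),
                    key=lambda item: item[1][sort_key], reverse=True)
    return ranked if nentries < 0 else ranked[:nentries]


entries = {
    "language:eng": {"numHits": 15065, "bytesUsed": 3041094},  # from the example
    "is_set:N": {"numHits": 900, "bytesUsed": 1000},           # invented values
    "language:spa": {"numHits": 4200, "bytesUsed": 500000},    # invented values
}

for key, stats in top_entries(entries, nentries=2):
    print(key, stats["numHits"])
```

Sorting on a different field, for example bytesUsed, only needs a different sort_key.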



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
