[ https://issues.apache.org/jira/browse/SOLR-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261197#comment-13261197 ]

Hoss Man commented on SOLR-3393:
--------------------------------

bq. I will attempt to make a new O(1) cache called FastLFUCache

{{#OhDearGodPleaseNotAnotherClassWithFastInTheName}}

Please, please, please let's end the madness of subjective adjectives in class 
names ... if it's an LFU cache wrapped around a "hawtdb", why don't we just call 
it "HawtDbLFUCache"?

bq. I've been working on this. I've come to realize that I don't completely 
understand how CacheRegenerator works. I suspect that it is geared around LRU 
caches and that the new cache won't have any of the frequency information from 
the old one, it will just put the entries into the cache as if they were new. 
Can anyone confirm this?

The idea behind the CacheRegenerator API is to be as simple as possible and 
agnostic to:
* the Cache Impl (ie: LRUCache vs LFUCache vs HawtDbLFUCache) 
* the cache usage (ie: Query->DocSets vs Query->DocList vs 
String->MyCustomClass)
* the means of generating values from keys (ie: how do you know which 
MyCustomClass should be cached for which String)

... so you can have a custom (named) cache instance declared in your 
solrconfig.xml with your own MySpecialCacheRegenerator that knows about your 
use case and might do something special with the keys/values (like: short-circuit 
part of the generation if it can see the data hasn't changed, or read from 
authoritative data files outside of Solr, etc...) and then use *any* Cache impl 
class that your heart desires, and things will still work right.
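
For illustration only, here's a minimal sketch of what such a usage-specific 
regenerator could look like (assuming the current CacheRegenerator signature; the 
class name, the String->MyCustomClass usage, and the dataUnchangedFor / 
rebuildValueFor helpers are all hypothetical):

{code:java}
import java.io.IOException;

import org.apache.solr.search.CacheRegenerator;
import org.apache.solr.search.SolrCache;
import org.apache.solr.search.SolrIndexSearcher;

// Hypothetical regenerator for a custom String -> MyCustomClass user cache
// declared in solrconfig.xml -- it only knows about *this* use case, not
// about which SolrCache impl is sitting underneath it.
public class MySpecialCacheRegenerator implements CacheRegenerator {

  @Override
  public boolean regenerateItem(SolrIndexSearcher newSearcher,
                                SolrCache newCache,
                                SolrCache oldCache,
                                Object oldKey,
                                Object oldVal) throws IOException {
    String key = (String) oldKey;

    // "something special with the keys/values": if the data behind this key
    // hasn't changed, short-circuit regeneration and carry the old value over.
    if (dataUnchangedFor(key)) {
      newCache.put(key, oldVal);
    } else {
      newCache.put(key, rebuildValueFor(key, newSearcher));
    }

    // true == keep iterating over the rest of the old cache's entries
    return true;
  }

  // Hypothetical helpers -- stand-ins for checking authoritative data files
  // outside of Solr, index metadata, etc.
  private boolean dataUnchangedFor(String key) {
    return false; // placeholder: always rebuild in this sketch
  }

  private Object rebuildValueFor(String key, SolrIndexSearcher searcher)
      throws IOException {
    return new Object(); // placeholder for the real MyCustomClass value
  }
}
{code}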

bq. After the new cache is regenerated, should I go through the new cache, grab 
the frequency information from the old cache with each key, and fix the new 
cache up?

You certainly could -- when {{(new HawtDbLFUCache(...)).warm(...)}} is called, 
it needs to delegate to the regenerator for pulling values from the "old" 
cache, but that doesn't mean it can't also directly ask the "old" cache 
instance for stats about each of the old keys as it loops over them -- 
remember: the "new" cache is the one inspecting the "old" cache and deciding 
what things to ask the regenerator to generate.
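
A rough sketch of that pattern (the HawtDbLFUCache name is just the suggestion 
from above, and the mostUsedEntries / hitCountFor methods are made-up stand-ins 
for whatever stats API the concrete impl would actually expose):

{code:java}
import java.io.IOException;
import java.util.Map;

import org.apache.solr.search.CacheRegenerator;
import org.apache.solr.search.SolrCache;
import org.apache.solr.search.SolrIndexSearcher;

// Sketch only: shows how the "new" cache's warm() can delegate value
// generation to the regenerator while still reading per-key stats
// directly off the "old" instance as it loops over its entries.
public abstract class HawtDbLFUCache implements SolrCache {

  protected CacheRegenerator regenerator;
  protected int autowarmCount;

  @Override
  public void warm(SolrIndexSearcher searcher, SolrCache old) {
    if (regenerator == null) return;

    HawtDbLFUCache other = (HawtDbLFUCache) old;

    // The "new" cache is in charge: it inspects the "old" cache, decides
    // which entries are worth keeping, and asks the regenerator to produce
    // a value for each key against the new searcher.
    for (Map.Entry<Object, Object> e : other.mostUsedEntries(autowarmCount)) {
      try {
        // nothing stops us also grabbing the old frequency info here, e.g.:
        // long oldHits = other.hitCountFor(e.getKey());
        boolean keepGoing =
            regenerator.regenerateItem(searcher, this, old, e.getKey(), e.getValue());
        if (!keepGoing) break;
      } catch (IOException ioe) {
        // auto-warming is best effort; skip entries that fail to regenerate
      }
    }
  }

  // Hypothetical hooks onto the LFU bookkeeping of the concrete impl.
  protected abstract Iterable<Map.Entry<Object, Object>> mostUsedEntries(int n);
  protected abstract long hitCountFor(Object key);
}
{code}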

But I question whether you really want any sort of stats from the "old" cache 
copied over to the "new" cache.  It is, after all, a completely new cache -- 
with new usage.  Should the stats really be preserved forever?  Regardless of 
how popular an object was in the "old" cache instance, should we automatically 
assume it's equally popular in the "new" cache instance?
                
> Implement an optimized LFUCache
> -------------------------------
>
>                 Key: SOLR-3393
>                 URL: https://issues.apache.org/jira/browse/SOLR-3393
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: 3.6, 4.0
>            Reporter: Shawn Heisey
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3393.patch, SOLR-3393.patch
>
>
> SOLR-2906 gave us an inefficient LFU cache modeled on 
> FastLRUCache/ConcurrentLRUCache.  It could use some serious improvement.  The 
> following project includes an Apache 2.0 licensed O(1) implementation.  The 
> second link is the paper (PDF warning) it was based on:
> https://github.com/chirino/hawtdb
> http://dhruvbird.com/lfu.pdf
> Using this project and paper, I will attempt to make a new O(1) cache called 
> FastLFUCache that is modeled on LRUCache.java.  This will (for now) leave the 
> existing LFUCache/ConcurrentLFUCache implementation in place.

