[ 
https://issues.apache.org/jira/browse/SOLR-3393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450667#comment-13450667
 ] 

Shawn Heisey commented on SOLR-3393:
------------------------------------

bq. This was because of the call to findLowestFrequency in doEviction. If one 
element has frequency=0 and all other ones have frequency=maxFreq, 
findLowestFrequency must inspect every slot. But this should be avoidable: 
since doEviction is only called inside put, there is no need to compute 
lowestFreq inside doEviction, put will always set lowestFreq=0 in the end.

If indeed I only used doEviction inside put, then lowestFreq would always be 0. 
 Thinking generally, in case doEviction did get called anywhere else, perhaps 
that value should be tracked rather than computed every time.  There's probably 
a way to quickly figure out the new value if the data structure connected to 
that frequency gets reduced to empty.

If we eliminate the need to cycle through every unused frequency one by one, we 
should be able to keep maxFreq at the max cache size minus one, allowing for 
the greatest granularity under the current implementation.

Like Yonik, I think having decay is important, but the last implementation was 
way too aggressive.  My current thought: Only do it at warm time, and only do 
it after a configurable time period has passed from the previous decay or 
initial cache creation.  Offer an additional option where it would happen after 
X commits.  Default the decay to true and the time period to one hour.  Apply 
the decay by subtracting one rather than doing a bitshift.  This should keep 
things fairly predictable in the short term while also keeping the cache clean 
over the long term.

Unit tests for the decay would probably use values <= 5000 milliseconds for the 
time period.

                
> Implement an optimized LFUCache
> -------------------------------
>
>                 Key: SOLR-3393
>                 URL: https://issues.apache.org/jira/browse/SOLR-3393
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: 3.6, 4.0-ALPHA
>            Reporter: Shawn Heisey
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: SOLR-3393.patch, SOLR-3393.patch, SOLR-3393.patch, 
> SOLR-3393.patch, SOLR-3393.patch, SOLR-3393.patch, SOLR-3393.patch
>
>
> SOLR-2906 gave us an inefficient LFU cache modeled on 
> FastLRUCache/ConcurrentLRUCache.  It could use some serious improvement.  The 
> following project includes an Apache 2.0 licensed O(1) implementation.  The 
> second link is the paper (PDF warning) it was based on:
> https://github.com/chirino/hawtdb
> http://dhruvbird.com/lfu.pdf
> Using this project and paper, I will attempt to make a new O(1) cache called 
> FastLFUCache that is modeled on LRUCache.java.  This will (for now) leave the 
> existing LFUCache/ConcurrentLFUCache implementation in place.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to