My original cache management scheme used a weighted LRU where a cache
hit would set the high-order bit of an aging value in the BDB. Periodic
aging cycles would right-shift the aging values, ending up with aging
values that, to a large degree, represented the historical hit rate. It
worked like a charm for a small number of buffers, but as memory got
larger and buffer pools much bigger, the cost of maintaining the list
became prohibitive. Borland, I believe, ripped it out.
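For concreteness, the mechanics looked roughly like this (a minimal
sketch, not the original code; the Buffer struct, field names, and the
linear victim scan are all my own stand-ins):

    #include <cstdint>
    #include <vector>

    struct Buffer {
        uint8_t age = 0;        // aging value: decayed history of hits
        // ... page contents, hash links, etc.
    };

    // On a cache hit: set the high-order bit of the aging value.
    inline void noteHit(Buffer& buf) {
        buf.age |= 0x80;
    }

    // Periodic aging cycle: right-shift every aging value so each hit
    // bit decays toward the low end and eventually falls off.  What
    // remains approximates the buffer's historical hit rate.
    void agingCycle(std::vector<Buffer>& pool) {
        for (Buffer& buf : pool)
            buf.age >>= 1;
    }

    // Eviction: the buffer with the smallest aging value has the
    // weakest hit history and is the best victim.
    Buffer* pickVictim(std::vector<Buffer>& pool) {
        Buffer* victim = nullptr;
        for (Buffer& buf : pool)
            if (!victim || buf.age < victim->age)
                victim = &buf;
        return victim;
    }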
A scheme I have used successfully in subsequent systems is a cycle
manager that wakes up from time to time, bumps the cycle number, does
any necessary work, and goes back to sleep. In this scheme, an object
reference copies the cycle number into the object without worrying about
contention or interlocking. When the cycle manager wakes up, it does a
linear scan through the managed objects, moving objects referenced in
the previous cycle to the front of the list with a single lock on the
LRU list.
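In outline (a sketch with hypothetical Object and CycleManager types;
the point is that noteReference needs no lock or interlocked
instruction, since a stale or torn stamp merely delays promotion by one
cycle):

    #include <atomic>
    #include <iterator>
    #include <list>
    #include <mutex>

    struct Object {
        unsigned lastCycle = 0;    // cycle of the most recent reference
        // ... payload ...
    };

    class CycleManager {
        std::atomic<unsigned> cycle { 1 };
        std::list<Object*> lru;    // front = most recently promoted
        std::mutex lruLock;

    public:
        // Called on every object reference: an unsynchronized stamp.
        void noteReference(Object* obj) {
            obj->lastCycle = cycle.load(std::memory_order_relaxed);
        }

        // Called when the cycle manager wakes up: bump the cycle
        // number, then promote everything referenced in the previous
        // cycle under a single acquisition of the LRU lock.
        void runCycle() {
            unsigned previous = cycle.fetch_add(1, std::memory_order_relaxed);
            std::lock_guard<std::mutex> guard(lruLock);
            for (auto it = lru.begin(); it != lru.end(); ) {
                auto next = std::next(it);
                if ((*it)->lastCycle == previous)      // referenced last cycle
                    lru.splice(lru.begin(), lru, it);  // move to front
                it = next;
            }
        }
    };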
The two techniques could easily be combined, giving priority to objects
with references in several cycles over objects referenced in only a
single cycle. It would take a slightly fancier data structure to merge
weighted references into the middle of the list rather than just at the
list head, but nothing that a little cleverness couldn't address.
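One shape the fancier data structure could take (my guess, not a worked
design) is one list per weight band, with the aging byte from the first
fragment supplying the weight:

    #include <array>
    #include <bit>       // std::popcount, C++20
    #include <cstdint>
    #include <list>

    struct Object {
        uint8_t age = 0;  // one bit per recent cycle with a reference
    };

    constexpr int BANDS = 8;

    // bands[0] holds the coolest objects, bands[BANDS - 1] the
    // hottest; eviction drains bands[0] first.
    std::array<std::list<Object*>, BANDS> bands;

    void place(Object* obj) {
        // The number of set bits is the number of recent cycles in
        // which the object was referenced -- its weight.
        int weight = std::popcount(static_cast<unsigned>(obj->age));
        if (weight >= BANDS)
            weight = BANDS - 1;
        bands[weight].push_front(obj);
    }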
A cycle manager, incidentally, could also allow a BDB cache hash table
to be searched without a lock, really speeding things up.
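The trick is essentially epoch-based reclamation (my sketch below, not
Firebird code): readers walk the hash chains with no lock; writers
unlink nodes but park them on a per-cycle retirement list, and the
cycle manager frees a batch only once every reader that could still
hold a pointer into it has had a full cycle to drain out.

    #include <atomic>
    #include <vector>

    struct Node {
        std::atomic<Node*> next { nullptr };
        unsigned key = 0;
    };

    std::atomic<unsigned> currentCycle { 1 };
    std::vector<Node*> retired[2];   // retirement lists, two cycles deep

    // Lock-free lookup: safe because an unlinked node remains readable
    // until the cycle manager decides no reader can still reach it.
    Node* find(std::atomic<Node*>& bucket, unsigned key) {
        for (Node* n = bucket.load(std::memory_order_acquire); n;
             n = n->next.load(std::memory_order_acquire))
            if (n->key == key)
                return n;
        return nullptr;
    }

    // Writer side: unlink the node from its chain (under the writer's
    // own synchronization), then park it instead of freeing it.
    void retire(Node* n) {
        retired[currentCycle.load(std::memory_order_relaxed) & 1].push_back(n);
    }

    // Cycle manager: bump the cycle, then free the batch parked during
    // the cycle before last.  (A real implementation would also confirm
    // that no reader is still inside that old cycle.)
    void cycleTick() {
        unsigned old = currentCycle.fetch_add(1, std::memory_order_acq_rel);
        auto& stale = retired[(old + 1) & 1];
        for (Node* n : stale)
            delete n;
        stale.clear();
    }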
On 5/6/2014 5:21 PM, Leyne, Sean wrote:
All,
I was reading details of a recent update to the Apache Hadoop engine,
and one of the changes dealt with their cache algorithm.
Hadoop had a cache which used a simple LRU algorithm. As we know, most
LRU algorithms have the problem that "hot" pages can be flushed when a
query which requires reading millions of pages is submitted -- the
Firebird equivalent of a full table scan. The reads of all those old
pages push the "hot" pages out of the cache.
Their solution was to define 2 cache levels:
* Level 1: for pages which have been accessed only a single time --
  this uses a simple index for locating the page in the list. Once
  a page has been accessed more than once, it is promoted to the
  second cache.
* Level 2: for pages which have been accessed/referenced more than
  once -- this uses an LRU.
Within the configuration, the ratio of memory allocated between the
caches can be set (default = 25%/75%, IIRC).
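For reference, the mechanics would look something like this (my sketch
with hypothetical types, not Hadoop's actual code; page numbers stand
in for real page buffers):

    #include <cstddef>
    #include <list>
    #include <unordered_map>
    #include <utility>

    class TwoLevelCache {
        size_t cap1, cap2;                   // e.g. the 25%/75% split
        std::list<unsigned> level1, level2;  // front = most recently used
        // page number -> (level, position in that level's list)
        std::unordered_map<unsigned,
            std::pair<int, std::list<unsigned>::iterator>> index;

    public:
        TwoLevelCache(size_t c1, size_t c2) : cap1(c1), cap2(c2) {}

        void touch(unsigned page) {
            auto it = index.find(page);
            if (it == index.end()) {
                // First access: level 1.  A million-page scan churns
                // only this list, never the hot pages in level 2.
                if (level1.size() >= cap1)
                    evictBack(level1);
                level1.push_front(page);
                index[page] = { 1, level1.begin() };
            } else if (it->second.first == 1) {
                // Second access: promote to level 2.
                level1.erase(it->second.second);
                if (level2.size() >= cap2)
                    evictBack(level2);
                level2.push_front(page);
                it->second = { 2, level2.begin() };
            } else {
                // Already hot: plain LRU move-to-front within level 2.
                level2.splice(level2.begin(), level2, it->second.second);
            }
        }

    private:
        void evictBack(std::list<unsigned>& lvl) {
            index.erase(lvl.back());
            lvl.pop_back();
        }
    };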
I realize that 2 caches will make single cache requests slower, but the
approach has the benefit of being more likely to keep the "hot" pages
in memory for longer, thus improving overall performance.
I also know that there has been historical discussion of changing the
engine to recognize table scans/NATURAL reads and backups and to modify
the cache operation accordingly.
But I wonder if this would be an approach that Firebird should
consider, since it seems to address the known issues while not
requiring significant modifications.
Sean