My original cache management scheme used a weighted LRU where a cache hit would set the high-order bit of an aging value in the BDB. Periodic aging cycles would right-shift the aging values, leaving values that, to a large degree, represented the historical hit rate. It worked like a charm for a small number of buffers, but as memory got larger and buffer pools much bigger, the cost of maintaining the list became prohibitive. Borland, I believe, ripped it out.
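
In rough outline it looks something like this -- a minimal sketch, with "Bdb" and the helper names standing in for the real declarations:

#include <cstdint>
#include <vector>

// Hypothetical buffer descriptor; "Bdb" and the field names are
// illustrative, not the original engine's declarations.
struct Bdb {
    uint8_t age = 0;    // high-order bits record the most recent hits
};

// On a cache hit, set the high-order bit of the aging value.
inline void noteHit(Bdb& bdb) {
    bdb.age |= 0x80;
}

// Periodic aging cycle: right-shift every aging value, so each byte
// decays into an exponentially weighted history of hits -- a buffer
// hit in recent cycles keeps a larger value than one hit long ago.
void ageCycle(std::vector<Bdb>& pool) {
    for (Bdb& bdb : pool)
        bdb.age >>= 1;
}

// Victim selection prefers the smallest aging value (coldest buffer).
Bdb* pickVictim(std::vector<Bdb>& pool) {
    Bdb* victim = nullptr;
    for (Bdb& bdb : pool)
        if (victim == nullptr || bdb.age < victim->age)
            victim = &bdb;
    return victim;
}

This is essentially the classic "aging" approximation of LRU: the byte decays exponentially, so recent hits dominate old ones.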

A scheme I have used successfully in subsequent systems is a cycle manager that wakes up from time to time, bumps the cycle number, does any necessary work, and goes back to sleep. In this scheme, an object reference copies the cycle number into the object without worrying about contention or interlocking. When the cycle manager wakes up, it does a linear scan through the managed objects, moving objects referenced in the previous cycle to the front of the list under a single lock on the LRU list.
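
A minimal sketch of that cycle manager, with hypothetical names -- the point is the unsynchronized stamp on reference and the single lock per scan:

#include <atomic>
#include <cstdint>
#include <iterator>
#include <list>
#include <mutex>

// Hypothetical managed object; the names are illustrative.
struct Managed {
    std::atomic<uint64_t> lastCycle{0};   // stamped on reference, no lock
};

class CycleManager {
    std::atomic<uint64_t> cycle{1};
    std::list<Managed*> lru;              // front = most recently promoted
    std::mutex lruMutex;

public:
    // Called on every object reference: a plain store, no interlocking.
    // A stale or racing stamp costs at most one cycle of accuracy.
    void reference(Managed& obj) {
        obj.lastCycle.store(cycle.load(std::memory_order_relaxed),
                            std::memory_order_relaxed);
    }

    // Periodic wakeup: bump the cycle number, then make one linear
    // scan, moving everything referenced in the previous cycle to the
    // front under a single acquisition of the LRU lock.
    void runCycle() {
        const uint64_t prev = cycle.fetch_add(1, std::memory_order_relaxed);
        std::lock_guard<std::mutex> guard(lruMutex);
        for (auto it = lru.begin(); it != lru.end(); ) {
            auto next = std::next(it);
            if ((*it)->lastCycle.load(std::memory_order_relaxed) >= prev)
                lru.splice(lru.begin(), lru, it);  // move to list front
            it = next;
        }
    }
};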

The two techniques could easily be combined, giving priority to objects referenced in several cycles over objects referenced in only a single cycle. It would take a slightly fancier data structure to merge weighted references into the middle of the list rather than just at the list head, but nothing that a little cleverness couldn't address.
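
Purely for illustration, the fancier promotion might look like this during the cycle scan -- splice a referenced entry ahead of the first entry with a smaller aging value, instead of always at the head:

#include <cstdint>
#include <list>

struct Entry {
    uint8_t age;    // the shifted aging byte from the first scheme
};

// Hypothetical merge of the two schemes: during the cycle scan, a
// referenced entry is spliced in front of the first entry whose aging
// value is smaller, so objects hit in several cycles cluster ahead of
// objects hit only once.
void promote(std::list<Entry*>& lru, std::list<Entry*>::iterator it)
{
    auto dest = lru.begin();
    while (dest != it && (*dest)->age >= (*it)->age)
        ++dest;
    if (dest != it)
        lru.splice(dest, lru, it);
}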

A cycle manager, incidentally, could also allow a BDB cache hash table to be searched without a lock, really speeding things up.
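
The enabling trick there is deferred reclamation, at least as I'd sketch it (hypothetical names; the insertion and unlinking paths are omitted): readers walk the hash chains with no latch, and the cycle manager frees an unlinked node only after a full cycle has elapsed, on the assumption that any traversal finishes well within one cycle:

#include <algorithm>
#include <atomic>
#include <cstdint>
#include <vector>

// Hash chain node; all names here are hypothetical, not engine code.
struct Node {
    uint32_t pageNo;
    std::atomic<Node*> next{nullptr};
};

struct Retired { Node* node; uint64_t cycle; };

std::atomic<uint64_t> currentCycle{1};
std::vector<Retired>  retired;       // touched only by the cycle manager

// Lockless reader: a plain chain walk, no latch on the hash table.
// Safe only because unlinked nodes outlive any in-flight traversal.
Node* lookup(const std::atomic<Node*>& bucket, uint32_t pageNo) {
    for (Node* n = bucket.load(std::memory_order_acquire);
         n != nullptr;
         n = n->next.load(std::memory_order_acquire))
        if (n->pageNo == pageNo)
            return n;
    return nullptr;
}

// Cycle-manager wakeup: bump the cycle number, then free nodes
// retired two or more cycles ago. This assumes every reader finishes
// its traversal well within one cycle period -- the key hedge.
void cycleReclaim() {
    const uint64_t now = currentCycle.fetch_add(1) + 1;
    auto stale = [now](const Retired& r) { return now - r.cycle >= 2; };
    for (const Retired& r : retired)
        if (stale(r))
            delete r.node;
    retired.erase(std::remove_if(retired.begin(), retired.end(), stale),
                  retired.end());
}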


On 5/6/2014 5:21 PM, Leyne, Sean wrote:
All,
I was reading details of a recent update to the Apache Hadoop engine, and one of the changes dealt with their cache algorithm. Hadoop had a cache which used a simple LRU algorithm. As we know, most LRU algorithms have the problem that "hot" pages can be flushed when a query which requires reading millions of pages is submitted -- the Firebird equivalent of a full table scan -- since the reads of old pages push the "hot" pages out of the cache.
Their solution was to define 2 cache levels:

  * Level 1: for pages which have been accessed only a single time --
    this level uses a simple index for locating the page in the list.
    Once a page has been accessed more than once, it is promoted to
    the second cache.

  * Level 2: for pages which have been accessed/referenced more than
    once -- this level uses an LRU.

The configuration sets the ratio of memory allocated between the two caches (default = 25%/75%, IIRC).

I realize that two caches will make individual cache requests slower, but the approach has the benefit of keeping the "hot" pages in memory longer, thus improving overall performance. I also know that there has been historical discussion of changing the engine to recognize table scans/NATURAL and backup and modify the cache operation accordingly. But I wonder if this is an approach that Firebird should consider, since it seems to address the known issues without requiring significant modifications.
Sean
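
A hedged sketch of the two-level scheme Sean describes -- essentially a segmented-LRU/2Q-style policy; the class and method names are mine, not Hadoop's or Firebird's, and bare page numbers stand in for buffers:

#include <cstdint>
#include <list>
#include <unordered_map>
#include <utility>

class TwoLevelCache {
    using List = std::list<uint32_t>;   // page numbers, MRU at the front
    List l1, l2;
    std::unordered_map<uint32_t, std::pair<int, List::iterator>> index;
    size_t cap1, cap2;

    void evict(List& list) {
        index.erase(list.back());
        list.pop_back();
    }

public:
    // Default split mirrors the 25%/75% ratio mentioned above.
    explicit TwoLevelCache(size_t total)
        : cap1(total / 4), cap2(total - total / 4) {}

    // Returns true on a hit; maintains both levels.
    bool access(uint32_t page) {
        auto it = index.find(page);
        if (it == index.end()) {           // miss: page enters level 1
            if (!l1.empty() && l1.size() >= cap1)
                evict(l1);
            l1.push_front(page);
            index[page] = {1, l1.begin()};
            return false;
        }
        auto [level, pos] = it->second;
        if (level == 1) {                  // second touch: promote to level 2
            l1.erase(pos);
            if (!l2.empty() && l2.size() >= cap2)
                evict(l2);
            l2.push_front(page);
            it->second = {2, l2.begin()};
        } else {                           // level-2 hit: move to MRU position
            l2.splice(l2.begin(), l2, pos);
        }
        return true;
    }
};

A real implementation would hold frames/BDBs rather than bare page numbers and would need latching, but the promotion logic is the point: a one-off million-page scan can only ever churn level 1.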

