I also agree that page cache enhancement is interesting, but it should
probably be tackled as a separate project.  Keeping this goal in mind
while making changes for backup is still a good thing.  An interface
that allows backup to use/reuse a single buffer in the page cache seems
reasonable.  Specializing it would seem to allow some optimizations:
free page searching could be avoided for this operation, which at a very
low level is going to be pushing/pulling pages as fast as possible.

I have seen the following ideas work well in a weight-based page cache.
They try to limit the overhead of weights by using multiple lru queues,
but still keep some of the benefit of a weight-based scheme:
1) have a much smaller range than 0-100, something like 5 values, where
   each value is its own lru queue.  This reduces the overhead of
   searching and sorting based on weight.
2) as Dan suggests, something like:
   no weight: free list
   0: backup page, linear scan heap pages, read ahead
   1: probe accessed heap page
   2: leaf page
   3: non-leaf page
   4: root
3) to account for re-reference, pages move up in value when
   re-referenced.  Revaluing happens only when the page is accessed, so
   the page is already latched, which limits the additional overhead
   needed to reweigh the page.
   Various methods can be used for moving down in value:
     o whole queues at a time
     o individual pages in lru order, based on some sort of clock, like
       the current clock
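
To make the scheme above concrete, here is a minimal sketch in Java.  It
is not Derby code: the class name LeveledPageCache, its methods, and the
level constants are all made up for illustration, and it ignores
latching, dirty pages, and the free list.  It only shows one lru queue
per level, moving up one level on re-reference, eviction from the lowest
non-empty queue, and the "whole queues at a time" way of moving down.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one LRU queue per weight level (0..4). */
final class LeveledPageCache {
    static final int SCAN = 0;      // backup page, linear scan, read ahead
    static final int PROBE = 1;     // probe accessed heap page
    static final int LEAF = 2;      // leaf page
    static final int NON_LEAF = 3;  // non-leaf page
    static final int ROOT = 4;      // root
    private static final int LEVELS = 5;

    private final int capacity;
    // Each LinkedHashSet keeps insertion order, so its first element is the LRU page.
    private final List<LinkedHashSet<Long>> queues = new ArrayList<>();
    private final Map<Long, Integer> levelOf = new HashMap<>();

    LeveledPageCache(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        for (int i = 0; i < LEVELS; i++) {
            queues.add(new LinkedHashSet<>());
        }
    }

    /** A new page enters at initialLevel; a cached page moves up one level. */
    void access(long pageId, int initialLevel) {
        if (initialLevel < 0 || initialLevel >= LEVELS) {
            throw new IllegalArgumentException("bad level: " + initialLevel);
        }
        Integer current = levelOf.get(pageId);
        int target;
        if (current == null) {
            if (levelOf.size() >= capacity) {
                evictOne();
            }
            target = initialLevel;
        } else {
            // Re-reference: remove from the old queue and promote, capped at the top level.
            queues.get(current).remove(pageId);
            target = Math.min(current + 1, LEVELS - 1);
        }
        queues.get(target).add(pageId);  // appended at the most-recently-used end
        levelOf.put(pageId, target);
    }

    /** Evicts the least recently used page of the lowest non-empty level. */
    private void evictOne() {
        for (LinkedHashSet<Long> queue : queues) {
            Iterator<Long> it = queue.iterator();
            if (it.hasNext()) {
                Long victim = it.next();
                it.remove();
                levelOf.remove(victim);
                return;
            }
        }
    }

    /** Moves every queue down one level; level 1 is merged into level 0. */
    void ageAllQueues() {
        for (int level = 1; level < LEVELS; level++) {
            LinkedHashSet<Long> from = queues.get(level);
            LinkedHashSet<Long> to = queues.get(level - 1);
            for (Long id : from) {
                to.add(id);
                levelOf.put(id, level - 1);
            }
            from.clear();
        }
    }

    boolean contains(long pageId) {
        return levelOf.containsKey(pageId);
    }

    /** Returns the page's current level, or -1 if it is not cached. */
    int level(long pageId) {
        Integer level = levelOf.get(pageId);
        return level == null ? -1 : level;
    }
}
```

With a capacity of 3, adding a root page, a probe page and then two scan
pages evicts the first scan page, not the root or probe page, which is
the behaviour wanted for backup and linear scans.  Finding a victim only
looks at the head of at most five queues, so there is no searching or
sorting by weight.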




Øystein Grøvlen wrote:
"DJD" == Daniel John Debrunner <[EMAIL PROTECTED]> writes:


    DJD> I think modifications to the cache would be useful for b), so
    DJD> that entries in the cache (through generic apis, not specific
    DJD> to store) could mark how "useful/valuable" they are. Just a
    DJD> simple scheme, lower numbers less valuable, higher numbers
    DJD> more valuable, and if it makes it easier to fix a range,
    DJD> e.g. 0-100, then that would be ok. Then the store could add
    DJD> pages to the cache with this weighting, e.g. (to get the
    DJD> general idea)

    DJD>      pages for backup - weight 0
    DJD>      overflow column pages - weight 10
    DJD>      regular pages - weight 20
    DJD>      leaf index pages - weight 30
    DJD>      root index pages - weight 80

    DJD> This weight would then be factored into the decision to throw pages out
    DJD> or not.

I agree that we need some mechanism to prevent operations from filling
the cache with pages that are not likely to be accessed again in the
near future.  However, I am afraid that a very detailed "cost-based"
scheme may create a significant overhead compared to a simple LRU
scheme.

One may operate with separate LRU queues for different weights, but I
guess the number of possible weights will have to be restricted in
that case.

I am also not convinced that it is the type of page that is the most
important criterion for caching.  What matters is access frequency.
The page type may give a hint, but leaf pages of one index may be more
frequently accessed than root pages of other indexes.

The type of access is also a relevant criterion.  Pages accessed
sequentially are often less likely to be accessed again in the near
future than pages accessed by direct lookup.  A separate LRU queue for
sequentially accessed pages may prevent backup and other sequential
scans (e.g., select * from t) from throwing out directly accessed
pages (e.g., index pages and data pages accessed through indexes).

    DJD> This project could be independent of the online backup and could have
    DJD> benefits elsewhere.

I agree.


