I also agree that page cache enhancement is interesting, but it probably
should be tackled as a separate project. Keeping this goal in mind
while making changes for backup is still a good thing. An interface
that allows backup to use/reuse a single buffer in the page cache seems
reasonable. Specializing it would seem to allow some optimizations:
free page searching could be avoided for this operation, which at
a very low level is going to be pushing/pulling pages as fast as possible.
I have seen the following ideas work well in a weight-based page cache.
They try to limit the overhead of weights by using multiple LRU queues,
but still keep some of the benefit of a weight-based scheme:
1) have a much smaller range than 0-100, something like 5 values, where
each value is its own LRU queue. This reduces the overhead of searching
and sorting based on weight.
2) as Dan suggests, something like:
no weight: free list
0: backup page, linear scan heap pages, read ahead
1: probe accessed heap page
2: leaf page
3: non-leaf page
4: root
3) to account for re-reference, pages move up in value when
re-referenced. Revaluing happens only when the page is accessed, so
the page is already latched, which limits the additional overhead
needed to reweigh the page.
Various methods can be used for moving down in value:
o whole queues at a time
o individual pages in LRU order, based on some sort of clock, like
the current clock
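
To make points 1) to 3) concrete, here is a minimal Java sketch of one
LRU queue per weight value. Everything in it is hypothetical (the class
name MultiQueueLru, its methods, and the use of plain page ids); it is
not the existing cache code. It only illustrates the idea: victims come
from the lowest non-empty queue, a re-reference moves a page up one
level, and "whole queues at a time" is the demotion method shown.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical sketch: one LRU queue per weight, 0 (least valuable) to 4. */
class MultiQueueLru {
    static final int LEVELS = 5;

    // queues.get(w) holds the page ids of weight w, oldest first.
    private final List<LinkedHashSet<Long>> queues = new ArrayList<>();

    MultiQueueLru() {
        for (int i = 0; i < LEVELS; i++) {
            queues.add(new LinkedHashSet<>());
        }
    }

    /** Add a page at its initial weight (e.g. 0 for backup, 4 for root). */
    void add(long pageId, int weight) {
        for (LinkedHashSet<Long> q : queues) {
            q.remove(pageId);               // a page lives in one queue only
        }
        int w = Math.max(0, Math.min(weight, LEVELS - 1));
        queues.get(w).add(pageId);
    }

    /** Re-reference: move the page up one level, to the young end. */
    void touch(long pageId) {
        for (int w = 0; w < LEVELS; w++) {
            if (queues.get(w).remove(pageId)) {
                queues.get(Math.min(w + 1, LEVELS - 1)).add(pageId);
                return;
            }
        }
    }

    /** Move down a whole queue at a time: level w drops to level w - 1. */
    void demoteLevel(int w) {
        if (w <= 0 || w >= LEVELS) {
            return;
        }
        queues.get(w - 1).addAll(queues.get(w));
        queues.get(w).clear();
    }

    /** Current weight of a page, or -1 if it is not cached. */
    int weightOf(long pageId) {
        for (int w = 0; w < LEVELS; w++) {
            if (queues.get(w).contains(pageId)) {
                return w;
            }
        }
        return -1;
    }

    /** Victim: oldest page in the lowest non-empty queue, or -1 if empty. */
    long evict() {
        for (LinkedHashSet<Long> q : queues) {
            Iterator<Long> it = q.iterator();
            if (it.hasNext()) {
                long victim = it.next();
                it.remove();
                return victim;
            }
        }
        return -1L;
    }
}
```

No searching or sorting by weight is needed here: finding a victim
inspects at most 5 queue heads. The touch() above scans the levels to
find the page, which a real cache would avoid by storing the current
level in the page's cache entry.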
Øystein Grøvlen wrote:
"DJD" == Daniel John Debrunner <[EMAIL PROTECTED]> writes:
DJD> I think modifications to the cache would be useful for b), so
DJD> that entries in the cache (through generic apis, not specific
DJD> to store) could mark how "useful/valuable" they are. Just a
DJD> simple scheme, lower numbers less valuable, higher numbers
DJD> more valuable, and if it makes it easier to fix a range,
DJD> e.g. 0-100, then that would be ok. Then the store could add
DJD> pages to the cache with this weighting, e.g. (to get the
DJD> general idea)
DJD> pages for backup - weight 0
DJD> overflow column pages - weight 10
DJD> regular pages - weight 20
DJD> leaf index pages - weight 30
DJD> root index pages - weight 80
DJD> This weight would then be factored into the decision to throw pages out
DJD> or not.
I agree that we need some mechanism to prevent operations from filling
the cache with pages that are not likely to be accessed again in the
near future. However, I am afraid that a very detailed "cost-based"
scheme may create a significant overhead compared to a simple LRU
scheme.
One may operate with separate LRU queues for different weights, but I
guess the number of possible weights will have to be restricted in
that case.
I am also not convinced that the type of page is the most
important criterion for caching. What matters is access frequency.
The page type may give a hint, but leaf pages of one index may be more
frequently accessed than root pages of other indexes.
The type of access is also a relevant criterion. Pages accessed
sequentially are often less likely to be accessed again in the near
future than pages accessed by direct lookup. A separate LRU queue for
sequentially accessed pages may prevent backup and other sequential
scans (e.g., select * from t) from throwing out directly accessed
pages (e.g., index pages and data pages accessed through indexes).
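
As a rough illustration of the separate-queue idea, here is a minimal
Java sketch. All names in it (ScanResistantCache, Access, access) are
hypothetical and do not refer to existing code. It assumes that victims
are always taken from the sequential queue first, and that a page which
has been reached by direct lookup stays in the direct queue even if a
scan touches it later; both are assumptions, not part of the proposal.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;

/** Hypothetical sketch: a separate LRU queue for sequentially read pages. */
class ScanResistantCache {
    enum Access { SEQUENTIAL, DIRECT }

    private final int capacity;
    // Both queues keep page ids oldest first.
    private final LinkedHashSet<Long> sequential = new LinkedHashSet<>();
    private final LinkedHashSet<Long> direct = new LinkedHashSet<>();

    ScanResistantCache(int capacity) {
        this.capacity = capacity;
    }

    /** Record an access; returns the evicted page id, or -1 if none. */
    long access(long pageId, Access kind) {
        boolean wasDirect = direct.remove(pageId);
        sequential.remove(pageId);
        // Assumption: once directly accessed, a page stays "direct".
        if (kind == Access.DIRECT || wasDirect) {
            direct.add(pageId);
        } else {
            sequential.add(pageId);
        }
        if (sequential.size() + direct.size() > capacity) {
            return evictOldest();
        }
        return -1L;
    }

    boolean contains(long pageId) {
        return direct.contains(pageId) || sequential.contains(pageId);
    }

    /** Victims come from the sequential queue while it has pages. */
    private long evictOldest() {
        LinkedHashSet<Long> q = sequential.isEmpty() ? direct : sequential;
        Iterator<Long> it = q.iterator();
        long victim = it.next();
        it.remove();
        return victim;
    }
}
```

With a capacity of 3 and two directly accessed pages in the cache, a
long scan in this sketch only ever recycles the one remaining slot, so
the directly accessed pages survive it.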
DJD> This project could be independent of the online backup and could have
DJD> benefits elsewhere.
I agree.