Raymond Raymond wrote:
From: Mike Matrigali <[EMAIL PROTECTED]>
Reply-To: "Derby Development" <[email protected]>
To: Derby Development <[email protected]>
Subject: Re: Can anyone give me some suggestions?
Date: Tue, 23 Aug 2005 10:03:35 -0700

Yes, the key is that in normal operation the call to these routines
is always done by a background process which is a different thread
than normal client threads.  So database client threads proceed
while the checkpoint proceeds.  The buffer cache is not locked for
the duration of the checkpoint, only individual pages for the time
it takes to copy the page from the buffer cache into the OS cache.



Thanks a lot, I got what you said. But I still have some questions about that.

1. Suppose a buffer page, call it B1, is updated before a checkpoint is taken, and the log record for that update is L1. During the following checkpoint, under the WAL policy the log buffer will first be forced to disk, so L1 will be flushed. Then the checkpoint process will try to write the buffer cache to disk. My question is: since the checkpoint thread and the database client threads run asynchronously, it is possible that another update to B1 happens before B1 is latched and written out, so B1 will be written out while the log record of that second update has not yet been written. How does Derby ensure the WAL policy in this case?

Derby maintains in memory the last log record that changed the page. Before
writing the page it always asks the log to flush up to that record;
usually this request is a no-op as the log has already been flushed up to
that point.  See CachedPage.java!writePage():
    // force WAL - and check to see if database is corrupt or is frozen.
    // last log Instant may be null if the page is being forced
    // to disk on a createPage (which violates the WAL protocol actually).
    // See FileContainer.newPage
    LogInstant flushLogTo = getLastLogInstant();
    dataFactory.flush(flushLogTo);
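
To make the ordering concrete, here is a minimal sketch of the idea under purely hypothetical names (PageBuffer, LogWriter and PageWriter are made up for illustration and are not Derby's classes): every update stamps the page with the log position of its record, and the writer flushes the log up to that position before the page image goes to disk, so a later update that beats the checkpoint to the page simply raises the position that must be flushed first.

    import java.io.IOException;

    // Hypothetical sketch of WAL enforcement on page write; the names are
    // illustrative only and do not match Derby's actual implementation.
    class PageBuffer {
        private long lastLogInstant;   // log position of the latest record that changed this page
        private boolean dirty;

        // called by a client thread while it holds the page latch
        synchronized void noteUpdate(long logInstant) {
            lastLogInstant = logInstant;
            dirty = true;
        }

        // called by the checkpoint (or cache cleaner) thread; returns true if a write was issued
        synchronized boolean writeIfDirty(LogWriter log, PageWriter out) throws IOException {
            if (!dirty) {
                return false;
            }
            // WAL: force the log up to the last record that touched this page
            // before the page image itself reaches disk.  Usually a no-op.
            log.flushTo(lastLogInstant);
            out.writePage(this);
            dirty = false;
            return true;
        }
    }

    interface LogWriter  { void flushTo(long logInstant) throws IOException; }
    interface PageWriter { void writePage(PageBuffer page) throws IOException; }

Because noteUpdate and writeIfDirty synchronize on the same page, a second update either lands before the write (and raises lastLogInstant, which the write then flushes) or after it (and re-dirties the page for a later write).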

2. During a checkpoint Derby currently searches the whole buffer cache for dirty pages and writes them out. Why not keep a dirty page list? Only IDs that identify the corresponding dirty pages would be stored in the list, so it would not take much space. The first time a buffer page is updated, its ID would be appended to the list, and after the dirty page has been written out it would be removed from the list. During a checkpoint we would just walk the list from head to tail and write the corresponding dirty pages out.

The current cache has no lists; lists tend to be a point of contention between users of the cache. The current cache design was mostly picked to be correct and as simple as possible. Walking the array once for the checkpoint does not seem like much overhead compared to maintaining a dirty page list. I guess if, in the usual case, a checkpoint found no dirty pages that would be an issue, but the current checkpoints are driven by the amount of log being written, which in
most cases indicates lots of dirty pages.
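
For what it's worth, a minimal sketch of the array walk described above, reusing the hypothetical PageBuffer/LogWriter/PageWriter names from the earlier sketch (again, not Derby's real classes): the checkpoint simply visits every slot once, and only the page currently being written is latched, so client threads keep running.

    import java.io.IOException;

    // Hypothetical checkpoint sweep over a fixed-size page cache array.
    // There is no shared dirty-page list, so client threads updating pages
    // never contend on a list structure; the only brief contention is the
    // per-page latch taken inside writeIfDirty().
    class PageCache {
        private final PageBuffer[] slots;   // fixed-size buffer cache
        private final LogWriter log;
        private final PageWriter out;

        PageCache(PageBuffer[] slots, LogWriter log, PageWriter out) {
            this.slots = slots;
            this.log = log;
            this.out = out;
        }

        // Walk every slot once; dirty pages are flushed, clean ones skipped.
        void checkpoint() throws IOException {
            for (PageBuffer page : slots) {
                if (page != null) {
                    page.writeIfDirty(log, out);
                }
            }
        }
    }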

The tradeoff here is that the performance of a checkpoint is not that important; in fact, making it go faster in any way slows down the "real" work of the system.  If I did any work on
checkpoints I would actually try to figure out how to make them slower, i.e. spread the I/O out over non-active times in the server, so that they would have little effect on user queries. It is not as simple as thread priority, since what is mostly affected is the I/O load of the system rather than CPU.
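
As a rough illustration of the "make it slower" idea (nothing like this is claimed to exist in Derby; the pacing policy below is just a placeholder), the sweep in the hypothetical PageCache sketch above could pause after each real write so the checkpoint's I/O is spread out instead of arriving in one burst:

    // Hypothetical paced variant of the checkpoint() sweep above: sleep
    // briefly after each page that actually had to be written, so the I/O
    // load is spread over time and competes less with user queries.  The
    // fixed delay stands in for whatever real pacing policy would be used.
    void pacedCheckpoint(long delayMillis) throws IOException, InterruptedException {
        for (PageBuffer page : slots) {
            if (page != null && page.writeIfDirty(log, out)) {
                Thread.sleep(delayMillis);   // only pause after real I/O
            }
        }
    }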



Raymond
