Re: Discussion of incremental checkpointing----Added some new content

Øystein Grøvlen Tue, 14 Feb 2006 06:41:48 -0800

Mike Matrigali wrote:

Øystein Grøvlen wrote:
Mike Matrigali wrote:
I think my main issue is that I don't see that it is important to
optimize writing the cached dirty data.  Especially since the order
that you are proposing writing the dirty data is exactly the wrong
order for the current cache performance goal of minimizing the total number of I/Os the
system is going to do (a page that is the oldest written exists in
a busy cache most likely because it has been written many times -
otherwise the standard background I/O thread would have written
it already).
I think your logic is flawed if you are talking about checkpointing
(and not the background writer). If you want to guarantee a certain
recovery time, you will need to write the oldest page. Otherwise, you
will not be able to advance the starting point for recovery. This
approach to checkpointing should reduce the number of I/Os since you
are not writing a busy page until it is absolutely necessary. The
current checkpointing writes a lot of pages which does not do anything
to make it possible to garbage-collect log. Those pages should be left
to the background writer, which can use its own criteria for which
pages are optimal to write.
I guess I was not clear, I agree with you:
    checkpoint - wants to write oldest page, I agree this is necessary
                 to move the redo low water mark.
    background - wants to write least used, probably not oldest page.
What pages are you talking about that the current checkpoint process
writes that are not necessary? Are they the ones that go from clean
to dirty after the checkpoint starts? It seems that in current
checkpoint all pages dirty at the start are necessary to move the redo
low water mark.

It is not necessary to write all pages to be able to move the redo low
water mark forward. It is necessary to write all pages to move the redo
low water mark all the way up to the new checkpoint log record. However,
that will probably give a much lower recovery time than what we are
aiming for. Hence, we can skip writing the newer pages and still be
within the requested recovery time.
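
To make the argument above concrete, here is a minimal Java sketch of
the selection step. It is an illustration only, not Derby code: the
DirtyPage class, the firstDirtyLsn field, and both method names are
hypothetical, and it assumes each dirty page remembers the log position
of its first change that has not yet reached disk.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class IncrementalCheckpointSketch {

    /** Hypothetical descriptor of a dirty page in the cache. */
    static final class DirtyPage {
        final long pageId;
        /** Assumed: log position of the oldest change not yet on disk. */
        final long firstDirtyLsn;

        DirtyPage(long pageId, long firstDirtyLsn) {
            this.pageId = pageId;
            this.firstDirtyLsn = firstDirtyLsn;
        }
    }

    /**
     * Returns only the pages that hold the redo low water mark below the
     * target, oldest first. Newer pages are left for the background writer.
     */
    static List<DirtyPage> pagesToWrite(List<DirtyPage> dirty, long targetRedoLwm) {
        List<DirtyPage> result = new ArrayList<>();
        for (DirtyPage p : dirty) {
            if (p.firstDirtyLsn < targetRedoLwm) {
                result.add(p);
            }
        }
        result.sort(Comparator.comparingLong(p -> p.firstDirtyLsn));
        return result;
    }

    /**
     * Redo low water mark once the selected pages are on disk: the oldest
     * firstDirtyLsn among the pages that were skipped, or the checkpoint
     * log record position if nothing was skipped. This ignores pages that
     * become dirty again while the checkpoint runs.
     */
    static long redoLowWaterMarkAfter(List<DirtyPage> dirty, long targetRedoLwm,
                                      long checkpointLsn) {
        long lwm = checkpointLsn;
        for (DirtyPage p : dirty) {
            if (p.firstDirtyLsn >= targetRedoLwm) {
                lwm = Math.min(lwm, p.firstDirtyLsn);
            }
        }
        return lwm;
    }
}
```

With a target well below the checkpoint log record, only the old pages
are written, and the low water mark still ends up at or beyond the
target.
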

...

I think we SHOULD sync for every I/O, but not the way we do today. By
opening the files with "rwd", we should be able to do this pretty
efficiently already today. (At least on some systems. I am not sure
about non-POSIX systems like Windows.) Syncing for every I/O gives us
much more control over the I/O, and we will not be vulnerable to
queuing effects that we do not control.
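
As an illustration of the "rwd" idea, here is a minimal Java sketch
using java.io.RandomAccessFile. It is not Derby's actual container
code: the class name, the helper methods, and the 4096-byte page size
are assumptions made for the example. In "rwd" mode each update to the
file's content is written synchronously to the underlying storage
device, so no separate sync call is issued after the write.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class SyncPageWriterSketch {

    /** Assumed page size, chosen for illustration only. */
    static final int PAGE_SIZE = 4096;

    /** Opens a file so that every content update is written synchronously. */
    static RandomAccessFile openSynchronous(File file) throws IOException {
        return new RandomAccessFile(file, "rwd");
    }

    /** Writes one full page at its offset; returns once write() completes. */
    static void writePage(RandomAccessFile container, long pageNumber, byte[] page)
            throws IOException {
        if (page.length != PAGE_SIZE) {
            throw new IllegalArgumentException("page must be " + PAGE_SIZE + " bytes");
        }
        container.seek(pageNumber * PAGE_SIZE);
        container.write(page);
    }

    /** Reads one full page back from its offset. */
    static byte[] readPage(RandomAccessFile container, long pageNumber)
            throws IOException {
        byte[] page = new byte[PAGE_SIZE];
        container.seek(pageNumber * PAGE_SIZE);
        container.readFully(page);
        return page;
    }
}
```

Because the caller blocks until each write has been handed to the
device, the writer sees the real cost of every I/O instead of a queue
that builds up inside the file system.
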
Do you think we should sync for every I/O in the non-checkpoint case
also? The case I am most interested in is where a user transaction
needs to wait for a page in the cache and the only way to give that
page is by writing another page in the cache out.  Currently this write
is async, are you proposing to change this to a sync write?

This scenario should be very rare. If it is not rare, async writing will
probably just lead you into trouble over time since you will allow user
threads to proceed at a rate that the file system will not be able to
sustain in the long run. Also, see my reply to Suresh where I discuss a
way this could be handled so it is still async with respect to user threads.



--
Øystein
