Mike Matrigali wrote:


Øystein Grøvlen wrote:


Mike Matrigali wrote:

I think my main issue is that I don't see that it is important to
optimize writing the cached dirty data.  Especially since the order
in which you propose to write the dirty data is exactly the wrong
order for the current cache performance goal of minimizing the total
number of I/Os the system is going to do (a page that is the oldest
written exists in a busy cache most likely because it has been written
many times - otherwise the standard background I/O thread would have
written it already).



I think your logic is flawed if you are talking about checkpointing (and not the background writer). If you want to guarantee a certain recovery time, you will need to write the oldest page. Otherwise, you will not be able to advance the starting point for recovery. This approach to checkpointing should reduce the number of I/Os since you are not writing a busy page until it is absolutely necessary. The current checkpointing writes a lot of pages that do nothing to make it possible to garbage-collect the log. Those pages should be left to the background writer, which can use its own criteria for which pages are optimal to write.


I guess I was not clear; I agree with you:
    checkpoint - wants to write the oldest page. I agree this is necessary to move the redo low water mark.
    background - wants to write the least used page, probably not the oldest page.

What pages are you talking about that the current checkpoint process writes that are not necessary? Are they the ones that go from clean to dirty after the checkpoint starts? It seems that in the current checkpoint all pages dirty at the start are necessary to move the redo
low water mark.

It is not necessary to write all pages to be able to move the redo low water mark forward. It is necessary to write all pages only to move the redo low water mark all the way up to the new checkpoint log record. However, that will probably give a much lower recovery time than what we are aiming for. Hence, we can skip writing the newer pages and still be within the requested recovery time.
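
To make the argument above concrete, here is a minimal sketch of a checkpoint that writes only the oldest dirty pages. It is an illustration under stated assumptions, not the existing checkpoint code: the class and method names, the idea of tracking a "first dirty" log position per page, and the use of a log-distance bound (maxRedoLog) as a stand-in for the requested recovery time are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch; none of these names come from the actual code base.
class CheckpointSketch {

    /** A dirty page plus the log position of its first change not yet on disk. */
    static final class DirtyPage {
        final int pageId;
        final long firstDirtyLsn;

        DirtyPage(int pageId, long firstDirtyLsn) {
            this.pageId = pageId;
            this.firstDirtyLsn = firstDirtyLsn;
        }
    }

    /**
     * Selects pages to write, oldest first, stopping as soon as the oldest
     * remaining dirty page is within maxRedoLog of the end of the log.
     * Newer dirty pages are left for the background writer.
     */
    static List<DirtyPage> pagesToWrite(List<DirtyPage> dirty, long currentLsn, long maxRedoLog) {
        PriorityQueue<DirtyPage> byAge =
            new PriorityQueue<>(Comparator.comparingLong((DirtyPage p) -> p.firstDirtyLsn));
        byAge.addAll(dirty);

        List<DirtyPage> toWrite = new ArrayList<>();
        while (!byAge.isEmpty() && currentLsn - byAge.peek().firstDirtyLsn > maxRedoLog) {
            toWrite.add(byAge.poll());
        }
        return toWrite;
    }

    /** Redo must start at the oldest unwritten change, or at the log end if nothing is dirty. */
    static long redoLowWaterMark(List<DirtyPage> stillDirty, long currentLsn) {
        long mark = currentLsn;
        for (DirtyPage p : stillDirty) {
            mark = Math.min(mark, p.firstDirtyLsn);
        }
        return mark;
    }
}
```

With the log end at position 1000 and a bound of 500, pages first dirtied at 100 and 400 would be written, while a page first dirtied at 900 would be skipped and the redo low water mark would advance to 900.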


...


I think we SHOULD sync for every I/O, but not the way we do today. By opening the files with "rwd", we should be able to do this pretty efficiently already today. (At least on some systems. I am not sure about non-POSIX systems like Windows.) Syncing for every I/O gives us much more control over the I/O, and we will not be vulnerable to queuing effects that we do not control.
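
As an illustration of the "rwd" suggestion, here is a minimal sketch using java.io.RandomAccessFile, whose "rwd" mode requests that every update to the file's content be written synchronously to the underlying storage device. The class name, method name, and 4096-byte page size are assumptions made for the example, and the file is opened per call only to keep the sketch short; a real page cache would presumably keep the file open.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical sketch; names and page size are assumptions for illustration.
class SyncWriteSketch {

    static final int PAGE_SIZE = 4096;

    /**
     * Writes one page at its offset. Because the file is opened in "rwd"
     * mode, write() is expected to return only after the content has been
     * handed to the storage device, so no separate sync call is issued.
     */
    static void writePage(File file, long pageNumber, byte[] page) throws IOException {
        if (page.length != PAGE_SIZE) {
            throw new IllegalArgumentException("page must be " + PAGE_SIZE + " bytes");
        }
        try (RandomAccessFile raf = new RandomAccessFile(file, "rwd")) {
            raf.seek(pageNumber * PAGE_SIZE);
            raf.write(page);
        }
    }
}
```

The Java documentation only guarantees the synchronous behaviour when the file resides on a local storage device, so how well this works on a given platform would need to be measured.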


Do you think we should sync for every I/O in the non-checkpoint case also?  The case I am most interested in is where a user transaction
needs to wait for a page in the cache and the only way to provide that
page is by writing another page in the cache out.  Currently this write
is async; are you proposing to change this to a sync write?


This scenario should be very rare. If it is not rare, async writing will probably just lead you into trouble over time, since you will allow user threads to proceed at a rate that the file system will not be able to sustain in the long run. Also, see my reply to Suresh, where I discuss a way this could be handled so that it is still async with respect to user threads.


--
Øystein
