Mike,
I am sorry, I did not make it very clear what I am trying to do.
I will explain it now. I think the last thing you mentioned is exactly what
I want to solve:
MM how long it takes to do a checkpoint vs. recovery time
MM requirements of the applications?
I am working on automatic checkpointing, which lets the Derby engine
control the checkpointing process itself depending on the runtime
situation. The goal is to strike a good balance between the runtime
resource consumption of the checkpointing process (especially disk I/O)
and the recovery time. I want to do checkpointing as often as possible
while interfering as little as possible with the real work of Derby.
(Most of the system resources I mention here are disk I/O, since for
the checkpointing issue, disk I/O is the bottleneck among all the
system resources.)
Let's look at the current Derby checkpointing mechanism first. If we
set the checkpointing interval too short, Derby will checkpoint very
often, which consumes a lot of disk I/O and delays the responses to
client requests. Conversely, if we set the checkpointing interval too
long, Derby will keep a lot of dirty data in the cache, and when a crash
happens the recovery time will be very long. I am trying to strike a good
balance between the two.
Now let me show the benefits of my proposal. My basic idea is to do
checkpoint work as much as possible when disk I/O is not busy. I think an
incremental checkpointing mechanism, combined with consideration of the
runtime situation (what we discussed last time -- "Discussion of how to map
the recovery time into X MB of log", i.e. statistics of system performance
information such as the time of a data read or the time of a log write),
can solve the problem.
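To make the "busy or not busy" decision concrete, here is a rough sketch of
the kind of statistic I have in mind. The class and method names
(IoLoadMonitor, recordLatency, isDiskBusy) and the threshold are placeholders
I made up, not anything that exists in Derby today; the idea is simply to
keep a moving average of recent data-read / log-write latencies and compare
it against an acceptable value:

public class IoLoadMonitor {
    private static final int WINDOW = 32;           // number of recent samples to average
    private final long[] samplesMicros = new long[WINDOW];
    private int next = 0;
    private int count = 0;
    private final long busyThresholdMicros;         // acceptable latency, set by the user

    public IoLoadMonitor(long busyThresholdMicros) {
        this.busyThresholdMicros = busyThresholdMicros;
    }

    // Called after each data page read or log write with its measured latency.
    public synchronized void recordLatency(long micros) {
        samplesMicros[next] = micros;
        next = (next + 1) % WINDOW;
        if (count < WINDOW) {
            count++;
        }
    }

    // The disk is considered "busy" when the moving average exceeds the threshold.
    public synchronized boolean isDiskBusy() {
        if (count == 0) {
            return false;
        }
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += samplesMicros[i];
        }
        return (sum / count) > busyThresholdMicros;
    }
}

The checkpoint thread would call recordLatency() after each measured read or
write and ask isDiskBusy() before doing the next piece of work.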
We can think of the incremental checkpointing mechanism as dividing the
current checkpoint process into several pieces. We let the user set an
acceptable recovery time. Then, when the system is not busy, we do a piece
of the checkpoint; when the system becomes busy, we suspend the checkpoint
process for a while, and so on. But if the log grows longer than the
acceptable length, we do checkpoint work even when the system is busy, in
order to meet the acceptable recovery time set by the user. We make each
piece of the checkpoint an intact checkpoint by updating the log control
file at the end of the piece.
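As a rough illustration of the control flow (again, every name below is a
placeholder of mine, not an existing Derby API), the driving loop could look
something like this, reusing the IoLoadMonitor sketch above:

public abstract class IncrementalCheckpointer {
    private final IoLoadMonitor ioMonitor;   // the latency sketch above
    private final long pauseMillis;          // how long to back off when the disk is busy

    protected IncrementalCheckpointer(IoLoadMonitor ioMonitor, long pauseMillis) {
        this.ioMonitor = ioMonitor;
        this.pauseMillis = pauseMillis;
    }

    // Placeholders for whatever the buffer manager and log factory really provide.
    protected abstract boolean moreDirtyPagesToWrite();
    protected abstract long logLengthSinceRedoMark();
    protected abstract long acceptableLogLength();     // derived from the user's recovery-time setting
    protected abstract void writeNextBatchOfDirtyPages();
    protected abstract void updateLogControlFile();    // makes the finished piece an intact checkpoint

    public void run() throws InterruptedException {
        while (moreDirtyPagesToWrite()) {
            boolean logTooLong = logLengthSinceRedoMark() > acceptableLogLength();
            if (ioMonitor.isDiskBusy() && !logTooLong) {
                // Still within the user's recovery-time budget and the disk is
                // busy, so back off and let the real work through.
                Thread.sleep(pauseMillis);
                continue;
            }
            // Write one "piece": a batch of the oldest dirty pages, then move the
            // REDO mark forward so the finished piece already shortens recovery.
            writeNextBatchOfDirtyPages();
            updateLogControlFile();
        }
    }
}

The key point is the two conditions: back off while the disk is busy and the
log is still within the user's recovery-time budget, but keep writing pieces
(and keep moving the REDO mark forward via the log control file) once the
budget is exceeded.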
I think what you described in your comments is what the incremental
checkpointing will do:
MM I think what you want is at step 2 to somehow write multiple
MM checkpoint log records rather than wait to the end. Let's assume
MM previous REDO
MM MARK was LOGEND_PREV. So while writing the pages you would write a new
MM type of checkpoint record that would move the REDO mark up to somewhere
MM between LOGEND_PREV and current end of log file. I think I agree that
MM if you wrote a checkpoint record for every I/O from your ordered list
MM then you would guarantee minimum redo recovery time, where the current
MM system writes one log record at the end of all I/O's, which at the end of
MM writing all pages would match your approach (a little hard to compare
MM as your approach is continuous but if you just compare the dirty page
MM list at LOGEND I think this is true).
Establishing a dirty page list is just my suggestion. Its purpose is to
sort the dirty pages in ascending order of the time when they were first
updated. If we can find another way to do that without using extra
memory, we don't have to establish such a list.
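For concreteness, here is a minimal sketch of the list I have in mind.
DirtyPageList, pageBecameDirty and pageWritten are names I made up for
illustration; the real structure would of course have to live under the
buffer manager's existing latching and synchronization rules:

import java.util.LinkedHashMap;
import java.util.Map;

// Dirty pages kept in first-dirtied order. LinkedHashMap preserves insertion
// order, so the head entry is always the page that has been dirty the longest.
public class DirtyPageList<PageKey> {
    private final LinkedHashMap<PageKey, Long> firstDirtiedAt = new LinkedHashMap<>();

    // Called when a page is updated; only the first update inserts it.
    public synchronized void pageBecameDirty(PageKey key) {
        firstDirtiedAt.putIfAbsent(key, System.currentTimeMillis());
    }

    // Called once the page has been written to disk (by the checkpoint or by
    // the background I/O thread); it is then released from the list.
    public synchronized void pageWritten(PageKey key) {
        firstDirtiedAt.remove(key);
    }

    // The oldest dirty page, i.e. the next candidate for the incremental checkpoint.
    public synchronized PageKey oldestDirtyPage() {
        for (Map.Entry<PageKey, Long> entry : firstDirtiedAt.entrySet()) {
            return entry.getKey();
        }
        return null;
    }
}

Since pages are only inserted on their first update and removed when written,
the head of the map is always the page that has been dirty the longest, which
is exactly the order the incremental checkpoint wants.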
Does anyone have a suggestion on this? Everyone is welcome to give their
opinion on it.
Raymond
From: Mike Matrigali <[EMAIL PROTECTED]>
I am hoping maybe someone else will get into this discussion; I
am not seeing the benefit vs. runtime cost of the incremental checkpoint
approach.
It could be I am blinded by the systems that I have worked on. I
just have this gut reaction to adding another queue to the I/O
system; going forward I would rather see Derby more parallel, and
a single queue seems to be the wrong direction.
Again what problem are you trying to solve? My guesses:
1) avoid checkpoint I/O saturation (I think this needs to be done but
this can be done in current checkpoint scheme).
2) reduce the number of redo recovery log records (how important is this,
at the cost of runtime list manipulation)
3) some other problem?
I think you are trying to solve 2 - which I admit I don't see as much
of a problem. Currently at a high level (ignoring details) we do:
1) start checkpoint, note current end of log file (LOGEND)
2) we should slowly write all dirty pages in cache (I agree we
need a fix in this area to the current scheme)
3) when done write checkpoint log record indicating REDO mark at LOGEND,
now log may be at LOGEND + N
I think what you want is at step 2 to somehow write multiple checkpoint
log records rather than wait to the end. Let's assume previous REDO
MARK was LOGEND_PREV. So while writing the pages you would write a new
type of checkpoint record that would move the REDO mark up to somewhere
between LOGEND_PREV and current end of log file. I think I agree that
if you wrote a checkpoint record for every I/O from your ordered list
then you would guarantee minimum redo recovery time, where the current
system writes one log record at the end of all I/O's, which at the end of
writing all pages would match your approach (a little hard to compare
as your approach is continuous but if you just compare the dirty page
list at LOGEND I think this is true).
So again, let's first talk about what you want to solve rather than
how to solve it. Maybe you have some assumptions about what type of
system you are trying to address, like: size of cache, percentage of
dirty pages, how long it takes to do a checkpoint vs. recovery time
requirements of the applications?
Raymond Raymond wrote:
>
>> From: Mike Matrigali <[EMAIL PROTECTED]>
>>
>> I think this is the right path, though would need more details:
>> o does boot mean first time boot for each db?
>> o how to determine "this machine"
>> o and the total time to run such a test.
>>
>> There are some very quick and useful tests that would be fine to
>> add to the default system and do one time per database. Measuring
>> how long to do a commit and how long to do a single database read from
>> disk would be fine. Seems like
>> just these 2 numbers could be used to come up with a very good
>> default estimate of log recovery time per log record. Then, as you
>> propose, the actual estimate can be improved by measuring real
>> recovery time in the future.
>>
>> I am not convinced of the need for the bigger test, but if the default
>> is not to run it automatically and it is your "itch" to have such
>> a configuration option then I would not oppose. I do see great value
>> in coming up with a very good default estimate of recovery time
>> based on the outstanding number of log records. And
>> I even envision
>> a framework in the future where Derby would schedule other
>> non-essential
>> background tasks that have been discussed in the
>>
>> On a different track I am still unclear on the checkpoint dirty page
>> lru list. Rather than talk about implementation details, I would
>> like to understand the problem you are trying to solve. For instance
>> I well understand the goal to configure checkpoints such that they
>> map to a user-understandable concept of the tradeoff of current runtime
>> performance vs. how long am I willing to wait the next time I boot
>> the database after a crash.
>>
>> What is the other problem you are looking at?
>>
>
>
> Mike:
>
> What I am looking at next is redesigning the checkpointing process.
> The current checkpointing mechanism writes out all the dirty pages
> during the checkpoint. That causes a burst of disk I/O. Lots of
> problems were mentioned by some people, such as DERBY-799 reported by Oystein.
> I have a proposal for incremental checkpointing. I have mentioned it
> before; I would like to explain it in more detail.
>
> We should find some way to sort the dirty pages in ascending order of
> the time when they were first updated. The incremental checkpointing
> process will continually write out the dirty pages from the earliest
> updated dirty page to the latest updated dirty page. The writing rate
> is related to the system situation.
> There are two situations in which we will update the log control file:
> 1) When a data read or a log write starts to have a longer response time
> than an acceptable value, we update the log control file and sleep for a
> while.
> 2) After writing out a certain number of dirty pages.
>
> The benefits of it are:
> 1) Since we write out dirty pages from the earliest updated page to the
> latest updated page, the checkpoint instance will keep advancing. Since
> the incremental checkpoint is performed continuously, the checkpoint
> instance will be much closer to the tail of the log than with the
> conventional checkpointing.
> 2) The checkpointing process can be paused if the disk I/O becomes
> really busy, and the finished part is an intact checkpoint instance.
>
> Do you still remember I suggested establishing a dirty page
> list in which dirty pages are sorted in ascending order of the time when
> they were first updated? I would like to discuss it again.
>
> Actually the list is not designed to speed up the checkpoint
> process. It
> is for the incremental checkpointing described above. To make the
> checkpoint
> instance keep advancing, we should guarantee that the earlier updated
> pages have
> been written out. That's why I suggested establishing such a list.
>
> In the last disucssion, you also mentioned a problem:
>
> MM The downside with the
> MM current
> MM algorithm is that a page that is made dirty after the checkpoint
> MM starts
> MM will be written, and if it gets written again before the next
> MM checkpoint
> MM we have done 1 too many I/O's. I think I/O optimization may
> MM benefit
> MM more by working on optimizing the background I/O thread than
> MM working
> MM on the checkpoint.
>
>
> If the background I/O thread can refer to this list, I think it can help
> solve the problem you mentioned. I am not very familiar with the
> background
> I/O thread. If I am wrong, please point it out.
>
> In the list, the dirty pages are sorted in ascending order of the
> time when
> they were first updated, which means the oldest dirty page is at the
> head of
> the list and the latest updated dirty page is at the end of the list.
> The operations on the list are:
> - When a page is updated and it is not in the list, we will append
> it to
> the end of the list.
> - When a dirty page in the list is written out to disk, it will be
> released
> from the list.
>
> Let's look at your problem:
> if a page is made dirty after the checkpoint starts,
>
> 1) If the page was made dirty before this update, it should be
> in the list already. We don't need to add it again.
> When the checkpoint process writes this dirty page out to disk, it
> will
> be released from the list, and if the background I/O thread refers to
> the list, it will know there is no need to write this page out again.
> 2) If the page was updated for the first time, it will be appended to the end
> of the list. If the background I/O thread refers to the list, it knows
> there is no need to write this page out so soon, since it has just been
> updated.
>
>
> Is it reasonable?
>
>
> Raymond
>