Mike,
I am sorry, I did not make it very clear what I am trying to do.
I will explain it now. I think the last thing you mentioned is exactly what
I want to solve:
MM how long it takes to do a checkpoint vs. recovery time
MM requirements of the applications?
I am working on automatic checkpointing, which lets the Derby engine
control the checkpointing process itself depending on the runtime
situation. The goal is to strike a good balance between the runtime
resource consumption of the checkpointing process (especially disk I/O)
and the recovery time. I want to do checkpointing as often as possible
while interfering as little as possible with the real work of Derby.
(Most of the system resources I mention here are disk I/O, since for
the checkpointing issue, disk I/O is the bottleneck among all the
system resources.)
Let's look at the current Derby checkpointing mechanism first. If we
set the checkpointing interval too short, Derby will checkpoint very
often, which consumes a lot of disk I/O and delays the responses to
client requests. Conversely, if we set the checkpointing interval too
long, Derby will keep a lot of dirty data in the cache, and when a crash
happens the recovery time will be very long. I am trying to strike a good
balance between the two.
Now let me show the benefits of my proposal. My basic idea is to do
checkpoint work as much as possible when disk I/O is not busy. I think an
incremental checkpointing mechanism, combined with consideration of the
runtime situation (what we discussed last time -- "Discussion of how to map
the recovery time into X MB of log", i.e. statistics of system performance
information such as the time of a data read or the time of a log write),
can solve the problem.
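To make the "busy or not busy" decision concrete, here is a rough sketch of
the kind of statistic I have in mind. The class and method names
(IoLoadMonitor, recordLatency, isDiskBusy) and the threshold are placeholders
I made up, not anything that exists in Derby today; the idea is simply to
keep a moving average of recent data-read / log-write latencies and compare
it against an acceptable value:

public class IoLoadMonitor {
    private static final int WINDOW = 32;           // number of recent samples to average
    private final long[] samplesMicros = new long[WINDOW];
    private int next = 0;
    private int count = 0;
    private final long busyThresholdMicros;         // acceptable latency, set by the user

    public IoLoadMonitor(long busyThresholdMicros) {
        this.busyThresholdMicros = busyThresholdMicros;
    }

    // Called after each data page read or log write with its measured latency.
    public synchronized void recordLatency(long micros) {
        samplesMicros[next] = micros;
        next = (next + 1) % WINDOW;
        if (count < WINDOW) {
            count++;
        }
    }

    // The disk is considered "busy" when the moving average exceeds the threshold.
    public synchronized boolean isDiskBusy() {
        if (count == 0) {
            return false;
        }
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += samplesMicros[i];
        }
        return (sum / count) > busyThresholdMicros;
    }
}

The checkpoint thread would call recordLatency() after each measured read or
write and ask isDiskBusy() before doing the next piece of work.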
We can think of the incremental checkpointing mechanism as dividing the
current checkpoint process into several pieces. We let the user set an
acceptable recovery time. Then, when the system is not busy, we do a piece
of the checkpoint; when the system becomes busy, we suspend the checkpoint
process for a while, and so on. But if the log grows longer than the
acceptable length, we do checkpoint work even when the system is busy, in
order to meet the acceptable recovery time set by the user. We make each
piece of the checkpoint an intact checkpoint by updating the log control
file at the end of the piece.
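As a rough illustration of the control flow (again, every name below is a
placeholder of mine, not an existing Derby API), the driving loop could look
something like this, reusing the IoLoadMonitor sketch above:

public abstract class IncrementalCheckpointer {
    private final IoLoadMonitor ioMonitor;   // the latency sketch above
    private final long pauseMillis;          // how long to back off when the disk is busy

    protected IncrementalCheckpointer(IoLoadMonitor ioMonitor, long pauseMillis) {
        this.ioMonitor = ioMonitor;
        this.pauseMillis = pauseMillis;
    }

    // Placeholders for whatever the buffer manager and log factory really provide.
    protected abstract boolean moreDirtyPagesToWrite();
    protected abstract long logLengthSinceRedoMark();
    protected abstract long acceptableLogLength();     // derived from the user's recovery-time setting
    protected abstract void writeNextBatchOfDirtyPages();
    protected abstract void updateLogControlFile();    // makes the finished piece an intact checkpoint

    public void run() throws InterruptedException {
        while (moreDirtyPagesToWrite()) {
            boolean logTooLong = logLengthSinceRedoMark() > acceptableLogLength();
            if (ioMonitor.isDiskBusy() && !logTooLong) {
                // Still within the user's recovery-time budget and the disk is
                // busy, so back off and let the real work through.
                Thread.sleep(pauseMillis);
                continue;
            }
            // Write one "piece": a batch of the oldest dirty pages, then move the
            // REDO mark forward so the finished piece already shortens recovery.
            writeNextBatchOfDirtyPages();
            updateLogControlFile();
        }
    }
}

The key point is the two conditions: back off while the disk is busy and the
log is still within the user's recovery-time budget, but keep writing pieces
(and keep moving the REDO mark forward via the log control file) once the
budget is exceeded.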
I think what you described in your comments is what the incremental
checkpointing will do:
MM I think what you want is at step 2 to somehow write multiple
MM checkpoint log records rather than wait to the end. Let's assume
MM previous REDO
MM MARK was LOGEND_PREV. So while writing the pages you would write a new
MM type of checkpoint record that would move the REDO mark up to somewhere
MM between LOGEND_PREV and current end of log file. I think I agree that
MM if you wrote a checkpoint record for every I/O from your ordered list
MM then you would guarantee minimum redo recovery time, where the current
MM system writes one log record at the end of all I/O's, which at the end of
MM writing all pages would match your approach (a little hard to compare
MM as your approach is continuous but if you just compare the dirty page
MM list at LOGEND I think this is true).
Establishing a dirty page list is just my suggestion. Its purpose is to
sort the dirty pages in ascending order of the time when they were first
updated. If we can find another way to do that without using extra
memory, we don't have to establish such a list.
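For concreteness, here is a minimal sketch of the list I have in mind.
DirtyPageList, pageBecameDirty and pageWritten are names I made up for
illustration; the real structure would of course have to live under the
buffer manager's existing latching and synchronization rules:

import java.util.LinkedHashMap;
import java.util.Map;

// Dirty pages kept in first-dirtied order. LinkedHashMap preserves insertion
// order, so the head entry is always the page that has been dirty the longest.
public class DirtyPageList<PageKey> {
    private final LinkedHashMap<PageKey, Long> firstDirtiedAt = new LinkedHashMap<>();

    // Called when a page is updated; only the first update inserts it.
    public synchronized void pageBecameDirty(PageKey key) {
        firstDirtiedAt.putIfAbsent(key, System.currentTimeMillis());
    }

    // Called once the page has been written to disk (by the checkpoint or by
    // the background I/O thread); it is then released from the list.
    public synchronized void pageWritten(PageKey key) {
        firstDirtiedAt.remove(key);
    }

    // The oldest dirty page, i.e. the next candidate for the incremental checkpoint.
    public synchronized PageKey oldestDirtyPage() {
        for (Map.Entry<PageKey, Long> entry : firstDirtiedAt.entrySet()) {
            return entry.getKey();
        }
        return null;
    }
}

Since pages are only inserted on their first update and removed when written,
the head of the map is always the page that has been dirty the longest, which
is exactly the order the incremental checkpoint wants.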
Does anyone have a suggestion on this? Everyone is welcome to give their
opinion on it.
Raymond
From: Mike Matrigali <[EMAIL PROTECTED]>
I am hoping maybe someone else will get into this discussion; I
am not seeing the benefit vs. runtime cost of the incremental checkpoint
approach.
It could be I am blinded by the systems that I have worked on. I
just have this gut reaction to adding another queue to the I/O
system; going forward I would rather see Derby more parallel, and
a single queue seems to be the wrong direction.
Again what problem are you trying to solve? My guesses:
1) avoid checkpoint I/O saturation (I think this needs to be done but
this can be done in current checkpoint scheme).
2) reduce the number of redo recovery log records (how important is this,
at the cost of runtime list manipulation)
3) some other problem?
I think you are trying to solve 2 - which I admit I don't see as much
of a problem. Currently at a high level (ignoring details) we do:
1) start checkpoint, note current end of log file (LOGEND)
2) we should slowly write all dirty pages in cache (I agree we
need a fix in this area to the current scheme)
3) when done write checkpoint log record indicating REDO mark at LOGEND,
now log may be at LOGEND + N
I think what you want is at step 2 to somehow write multiple checkpoint
log records rather than wait to the end. Let's assume previous REDO
MARK was LOGEND_PREV. So while writing the pages you would write a new
type of checkpoint record that would move the REDO mark up to somewhere
between LOGEND_PREV and current end of log file. I think I agree that
if you wrote a checkpoint record for every I/O from your ordered list
then you would guarantee minimum redo recovery time, where the current
system writes one log record at the end of all I/O's, which at the end of
writing all pages would match your approach (a little hard to compare
as your approach is continuous but if you just compare the dirty page
list at LOGEND I think this is true).
So again, let's first talk about what you want to solve rather than
how to solve it. Maybe you have some assumptions about what type of
system you are trying to address, like: size of cache, percentage of
dirty pages, how long it takes to do a checkpoint vs. recovery time
requirements of the applications?
Raymond Raymond wrote:
>
>> From: Mike Matrigali <[EMAIL PROTECTED]>
>>
>> I think this is the right path, though would need more details:
>> o does boot mean first time boot for each db?
>> o how to determine "this machine"
>> o and the total time to run such a test.
>>
>> There are some very quick and useful tests that would be fine to
>> add to the default system and do one time per database. Measuring
>> how long to do a commit and how long to do a single database read from
>> disk would be fine. Seems like
>> just these 2 numbers could be used to come up with a very good
>> default estimate of log recovery time per log record. Then, as you
>> propose, the actual estimate can be improved by measuring real
>> recovery time in the future.
>>
>> I am not convinced of the need for the bigger test, but if the default
>> is not to run it automatically and it is your "itch" to have such
>> a configuration option then I would not oppose. I do see great value
>> in coming up with a very good default estimate of recovery time
>> based on the outstanding number of log records. And
>> I even envision
>> a framework in the future where Derby would schedule other
>> non-essential
>> background tasks that have been discussed in the
>>
>> On a different track I am still unclear on the checkpoint dirty page
>> lru list. Rather than talk about implementation details, I would
>> like to understand the problem you are trying to solve. For instance
>> I well understand the goal to configure checkpoints such that they
>> map to a user-understandable concept of the tradeoff of current runtime
>> performance vs. how long am I willing to wait the next time I boot
>> the database after a crash.
>>
>> What is the other problem you are looking at?
>>
>
>
> Mike:
>
> What I am looking at next is redesigning the checkpointing process.
> The current checkpointing mechanism writes out all the dirty pages
> during the checkpoint. That causes a burst of disk I/O. Lots of
> problems were mentioned by some people, such as DERBY-799 reported by Oystein.
> I have a proposal for incremental checkpointing. I have mentioned it
> before; I would like to explain it in more detail.
>
> We should find some way to sort the dirty pages in ascending order of
> the time when they were first updated. The incremental checkpointing
> process will continually write out the dirty pages from the earliest
> updated dirty page to the latest updated dirty page. The writing rate
> is related to the system situation.
> There are two situations in which we will update the log control file:
> 1) When a data read or a log write starts to have a longer response time
> than an acceptable value, we update the log control file and sleep for a
> while.
> 2) After writing out a certain number of dirty pages.
>
> The benefits of it are:
> 1) Since we write out dirty pages from the earliest updated page to the
> latest updated page, the checkpoint instance will keep advancing. Since
> the incremental checkpoint is performed continuously, the checkpoint
> instance will be much closer to the tail of the log than with the
> conventional checkpointing.
> 2) The checkpointing process can be paused if the disk I/O becomes
> really busy, and the finished part is an intact checkpoint instance.
>
> Do you still remember I suggested establishing a dirty page
> list in which dirty pages are sorted in ascending order of the time when
> they were first updated? I would like to discuss it again.
>
> Actually the list is not designed to speed up the checkpoint
> process. It
> is for the incremental checkpointing described above. To make the
> checkpoint
> instance keep advancing, we should guarantee that the earlier updated
> pages have
> been written out. That's why I suggested establishing such a list.
>
> In the last disucssion, you also mentioned a problem:
>
> MM The downside with the
> MM current
> MM algorithm is that a page that is made dirty after the checkpoint
> MM starts
> MM will be written, and if it gets written again before the next
> MM checkpoint
> MM we have done 1 too many I/O's. I think I/O optimization may
> MM benefit
> MM more by working on optimizing the background I/O thread than
> MM working
> MM on the checkpoint.
>
>
> If the background I/O thread can refer to this list, I think it can help
> solve the problem you mentioned. I am not very familiar with the
> background
> I/O thread. If I am wrong, please point it out.
>
> In the list, the dirty pages are sorted in ascending order of the
> time when
> they were first updated, which means the oldest dirty page is at the
> head of
> the list and the latest updated dirty page is at the end of the list.
> The operations on the list are:
> - When a page is updated and it is not in the list, we will append
> it to
> the end of the list.
> - When a dirty page in the list is written out to disk, it will be
> released
> from the list.
>
> Let's look at your problem:
> if a page is made dirty after the checkpoint starts,
>
> 1) If the page was made dirty before this update, it should be
> in the list already. We don't need to add it again.
> When the checkpoint process writes this dirty page out to disk, it
> will
> be released from the list, and if the background I/O thread refers to
> the list, it will know there is no need to write this page out again.
> 2) If the page was updated for the first time, it will be appended to the end
> of the list. If the background I/O thread refers to the list, it knows
> there is no need to write this page out so soon, since it has just been
> updated.
>
>
> Is it reasonable?
>
>
> Raymond
>