Mike, thank you for your comments. They really help me a lot. I would
like to discuss the issue further.
RR 2. During initialization of Derby, we run some measurement that
RR determines the performance of the system and maps the
RR recovery time into some X megabytes of log.
MM What do you mean by initialization? Once per boot, once per db
MM creation, something else?
Initialization here means once per boot.
RR 3. establish a dirty page list in which dirty pages are sorted in
RR ascending order of the time when they were first updated. When one
RR dirty page is flushed out to disk, it will be released from the
RR list. (this step needs further discussion, whether we need to
RR establish such a list)
MM I am not convinced of a need for such a list, and do not want to see
MM such a list slow down non-checkpoint related activity. From other
MM reported Derby issues it is clear we actually want to "slow" the
MM checkpoint down rather than optimize it.
Actually, the list is not designed to speed up the checkpoint process; it
is for the incremental checkpoint described in step 5. With this list, we
can guarantee that the oldest dirty pages are written out. We can think of
the incremental checkpoint as dividing the current checkpoint process into
several pieces, and trying to make each piece an intact checkpoint in
itself. In other words, the incremental checkpoint process never stops; it
just uses the time when the system is not so busy to do a piece of the
checkpoint. If we search the entire cache space to write out dirty pages,
as we do now, we can't guarantee the oldest dirty pages are written out,
and the redoLWM may be much older than the last checkpoint mark.
MM The downside with the
MM current
MM algorithm is that a page that is made dirty after the checkpoint
MM starts
MM will be written, and if it gets written again before the next
MM checkpoint
MM we have done 1 too many I/O's. I think I/O optimization may benefit
MM more by working on optimizing the background I/O thread than working
MM on the checkpoint.
If the background I/O thread can refer to this list, I think it can help
solve the problem you mentioned. I am not very familiar with the background
I/O thread, so if I am wrong, please point it out.
In the list, the dirty pages are sorted in ascending order of the time when
they were first updated, which means the oldest dirty page is at the head of
the list and the most recently updated dirty page is at the end of the list.
The operations on the list are:
- When a page is updated and it is not in the list, we will append it to
the end of the list.
- When a dirty page in the list is written out to disk, it will be released
from the list.
Let's look at your problem:
if a page is made dirty after the checkpoint starts,
1) if the page was already dirty before this update, it is supposed to be
   in the list already, so we don't need to add it again.
   When the checkpoint process writes this dirty page out to disk, it will
   be released from the list, and if the background I/O thread refers to
   the list, it will know there is no need to write this page out again.
2) if the page was updated for the first time, it will be appended to the
   end of the list. If the background I/O thread refers to the list, it
   knows there is no need to write this page out so soon, since it has
   just been updated.
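To make the two operations concrete, here is a minimal Java sketch of such a list. The class and method names (DirtyPageList, pageDirtied, pageFlushed) are made up for illustration and are not actual Derby code:

```java
import java.util.LinkedHashSet;

// Sketch of the dirty page list: pages are kept in the order they were
// FIRST dirtied, so the head is always the oldest dirty page. A
// LinkedHashSet preserves first-insertion order and gives O(1)
// membership test, append, and removal.
public class DirtyPageList {
    private final LinkedHashSet<Long> pages = new LinkedHashSet<>();

    // Called when a page is updated: append only if not already dirty.
    // Re-dirtying a page that is already in the list does NOT reorder it.
    public synchronized void pageDirtied(long pageId) {
        pages.add(pageId); // no-op if pageId is already present
    }

    // Called when a dirty page is written out to disk.
    public synchronized void pageFlushed(long pageId) {
        pages.remove(pageId);
    }

    // The oldest dirty page, i.e. the next checkpoint candidate.
    public synchronized Long oldest() {
        return pages.isEmpty() ? null : pages.iterator().next();
    }

    public synchronized int size() {
        return pages.size();
    }
}
```

The background I/O thread could consult oldest() to decide which page is worth writing, which is the behavior described in cases 1) and 2) above.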
RR 4. A checkpoint is made and controlled in combined consideration
RR of
RR -the acceptable log length which we get in step 2
RR -the current IO performance
RR 5. We do incremental checkpoint. That means:
RR From the beginning of the dirty page list established in step 3
RR (the earliest updated dirty page), to the end of the list (the
RR latest updated dirty page), we do checkpoint. If data reads or
RR log writes (if the log is in the default location) start to have
RR longer response times than an appropriate value, we pause the
RR checkpoint process and update the log control file to let Derby
RR know where we are. When the data read or log write times return
RR to an acceptable value, we continue the checkpoint.
MM This sounds like you are looking to address DERBY-799. I thought
MM Oystein was going to work on this, but there is no owner so I am
MM not sure. You may at least want to consult with him on his
MM findings.
I wrote a mail to Oystein and hope he would give me some comments.
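For concreteness, the pause-and-resume loop of step 5 could look roughly like the sketch below. All names are illustrative, not Derby internals; ioResponseMillis stands in for whatever read/write latency probe we would actually use, and "flushing" a page here just removes it from the dirty page list:

```java
import java.util.Deque;
import java.util.function.LongSupplier;

// Hypothetical sketch of the incremental checkpoint loop in step 5:
// walk the dirty page list oldest-first, pausing whenever IO response
// times exceed the acceptable value chosen in step 4.
public class IncrementalCheckpoint {
    private final long acceptableMillis;        // threshold from step 4
    private final LongSupplier ioResponseMillis; // current IO latency probe
    int pagesFlushed = 0;
    int pauses = 0;

    public IncrementalCheckpoint(long acceptableMillis, LongSupplier probe) {
        this.acceptableMillis = acceptableMillis;
        this.ioResponseMillis = probe;
    }

    public void run(Deque<Long> dirtyPages) {
        while (!dirtyPages.isEmpty()) {
            if (ioResponseMillis.getAsLong() > acceptableMillis) {
                pauses++;   // a real version would sleep here and update
                continue;   // the log control file before resuming
            }
            dirtyPages.pollFirst(); // write the oldest dirty page
            pagesFlushed++;
        }
    }
}
```

Updating the log control file at each pause is what lets recovery know how far the incremental checkpoint has progressed.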
RR This is just an outline. I would like to discuss the details with
RR everyone later. If anyone has any suggestion, please let me know.
RR
RR Now, I am going to design the 2nd step first, to map the recovery
RR time into some X megabytes of log. A simple approach is that we
RR can design a test log file. For the log file, we can let Derby
RR create a temporary database, do a bunch of tests to get the
RR necessary disk IO information, and then delete the temporary
RR database. When Derby boots up, we let it do recovery from the
RR test log file. Does anyone have other suggestions on it?
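To make the quoted idea more concrete, here is a rough Java sketch of how the measurement could map a recovery-time target to an acceptable log length. All names are made up, and the assumption that redo time is dominated by a sequential scan of the log is mine, not something we have verified:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Path;

// Sketch of step 2: time a sequential scan of a scratch file to estimate
// redo throughput, then turn a target recovery time into X megabytes of
// log. The 8K buffer size is an arbitrary assumption.
public class RecoveryCalibration {

    // Time a sequential scan of a scratch file and return MB/s.
    static double sequentialReadMBps(Path file) throws IOException {
        byte[] buf = new byte[8192];
        long start = System.nanoTime();
        long total = 0;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            int n;
            while ((n = raf.read(buf)) > 0) total += n;
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return (total / 1e6) / seconds;
    }

    // Acceptable log length in MB for a target recovery time in seconds.
    static double acceptableLogMB(double targetSeconds, double scanMBps) {
        return targetSeconds * scanMBps;
    }
}
```

For example, if the scan measures 10 MB/s and we want recovery to finish in 30 seconds, the checkpoint should be triggered before the log grows past roughly 300 MB.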
MM I'll think about this, it is not straight forward. My guess would
MM be that recovery time is dominated by 2 factors:
MM 1) I/O from log disk
MM 2) I/O from data disk
MM
MM Item 1 is pretty easy to get a handle on. During redo it is pretty
MM much
MM a straight scan from beginning to end doing page based I/O. Undo is
MM harder as it jumps back and forth for each xact. I would probably
MM just
MM ignore it for estimates.
MM
MM Item 2 is totally dependent on the cache hit rate you are going to
MM expect, and the number of log records.
MM The majority of log records deal with a single page, it will read
MM the page into cache if it doesn't exist and then it will do a quick
MM operation on that page. Again undo is slightly more complicated as
MM it
MM could involve logical lookups in the index.
MM
MM Another option rather than do any sort of testing is to come up with
MM an
MM initial default time based on size of log file. And then on each
MM subsequent recovery event dynamically change the estimate based on
MM how
MM long that recovery on that db took. This way each estimate will be
MM based on the actual work generated by the application, and over time
MM should become better and better.
Thank you for your suggestion. I will think about it more carefully and
discuss it later.
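As I understand your last suggestion, the estimate could be refined after each real recovery event, for example with a simple weighted blend. The sketch below is just my reading of the idea; the seconds-per-megabyte model and the smoothing factor are assumptions:

```java
// Sketch of the adaptive estimate: start from a default
// seconds-per-MB-of-log figure and blend in the observed rate after
// each actual recovery, so the estimate tracks the work the
// application really generates.
public class RecoveryEstimate {
    private double secondsPerMB;              // current estimate
    private static final double ALPHA = 0.5;  // weight of the newest sample

    public RecoveryEstimate(double initialSecondsPerMB) {
        this.secondsPerMB = initialSecondsPerMB;
    }

    // Predicted recovery time for a log of the given size.
    public double predictSeconds(double logMB) {
        return secondsPerMB * logMB;
    }

    // After an actual recovery, fold the observed rate into the estimate.
    public void recordRecovery(double logMB, double observedSeconds) {
        double observedRate = observedSeconds / logMB;
        secondsPerMB = ALPHA * observedRate + (1 - ALPHA) * secondsPerMB;
    }

    public double secondsPerMB() { return secondsPerMB; }
}
```

Over many boots the estimate should converge toward the database's real recovery behavior, as you described.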
RR
RR I am wondering whether I need to establish some relationship
RR between the data read time and the data write time. I mean, under
RR a certain average data read time, approximately how long would
RR the average data write time be? Since what we get from step 2 is
RR just under a certain system condition, when the system condition
RR changes (becomes busier), the value should change too. If I can
RR establish such a relationship, then I can make accurate
RR adjustments to the checkpoint process.
RR
RR Raymond