>>>>> "RR" == Raymond Raymond <[EMAIL PROTECTED]> writes:
RR> Hi, everyone. I am a graduate student trying to do some research
RR> with Derby.
RR> I am interested in the "Autonomic checkpointing timing and log file
RR> size" issue on the to-do list. I would like to know whether anyone
RR> else is interested in that, or whether anyone can give me some
RR> suggestions or direction about that issue?
Currently, you can configure the checkpoint interval and log file size
of Derby by setting the properties:
derby.storage.logSwitchInterval (default 1 MB)
derby.storage.checkpointInterval (default 10 MB)
(Neither of these seems to be documented in the manuals, and the JavaDoc
for LogFactory.recover() gives wrong (outdated?) defaults.)
This means that by default all log files will be 1 MB, and a checkpoint
is made for every tenth log file.
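If you want to experiment with these settings, here is a minimal sketch
of how one could set them from an embedded application before the engine
boots. The database name is just a placeholder, and the values simply
restate the defaults above in bytes; the same properties can also go in
the derby.properties file in the system directory:

import java.sql.Connection;
import java.sql.DriverManager;

public class CheckpointConfig {
    public static void main(String[] args) throws Exception {
        // Both values are in bytes; these just restate the current defaults
        // (1 MB log files, a checkpoint after every 10 MB of log).
        System.setProperty("derby.storage.logSwitchInterval", "1048576");
        System.setProperty("derby.storage.checkpointInterval", "10485760");

        // The properties must be set before the engine boots.
        Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        Connection c =
            DriverManager.getConnection("jdbc:derby:testdb;create=true");
        c.close();
    }
}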
In order to know when it is useful to change the defaults, one has to
consider the purpose of a checkpoint:
1) Reduced recovery times. Only log created after the penultimate
   checkpoint needs to be redone at recovery. This also means that
   older log files may be garbage-collected (as long as they do not
   contain log records for transactions that have not yet terminated).
   To get short recovery times, one should keep the checkpoint
   interval low. The trade-off is that frequent checkpoints increase
   I/O, since there will be fewer updates to the same page between two
   checkpoints. Hence, you get more I/O per database operation.
2) Flush dirty pages to disk. A checkpoint is a much more efficient
   way to clean dirty pages in the db cache than doing it on demand,
   one page at a time, when a page needs to be replaced with another.
   Hence, one should make sure to do checkpoints often enough to
   avoid the whole cache becoming dirty.
Based on 2), one could initiate a new checkpoint when many pages in
the cache are dirty (e.g., 50% of the pages) and postpone a checkpoint
when few pages are dirty (see the sketch below). The difficult part
would be to determine how long a checkpoint interval is acceptable
with respect to its impact on recovery times.
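Just to make the idea concrete, here is a rough sketch of that kind of
policy. DirtyPageCache and CheckpointDaemon are made-up interfaces used
purely for illustration, not actual Derby classes:

// Rough sketch only: these interfaces are not actual Derby classes.
interface DirtyPageCache {
    int totalPages();
    int dirtyPages();
}

interface CheckpointDaemon {
    void requestCheckpoint();
}

class DirtyPagePolicy {
    private static final double DIRTY_THRESHOLD = 0.5; // e.g., 50% of the cache

    private final DirtyPageCache cache;
    private final CheckpointDaemon daemon;

    DirtyPagePolicy(DirtyPageCache cache, CheckpointDaemon daemon) {
        this.cache = cache;
        this.daemon = daemon;
    }

    // Called periodically, or whenever a page is marked dirty.
    void maybeCheckpoint() {
        double dirtyFraction = (double) cache.dirtyPages() / cache.totalPages();
        if (dirtyFraction >= DIRTY_THRESHOLD) {
            // Many dirty pages: checkpoint now to clean them efficiently.
            daemon.requestCheckpoint();
        }
        // Otherwise postpone; some upper bound on the amount of log written
        // since the last checkpoint would still be needed to cap recovery time.
    }
}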
I guess one could argue that for recovery times, it is the wall-clock
time that matters. Hence, one could automatically increase the value of
derby.storage.checkpointInterval on faster machines, since they will be
able to process more log per time unit.
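A back-of-the-envelope version of that idea: pick the interval so that
the amount of log to redo roughly matches a target recovery time. The
log-apply rate would have to be measured somehow (e.g., during a
previous recovery or a calibration run); nothing like this is exposed
by Derby today:

class RecoveryTarget {
    // logApplyBytesPerSecond would have to be measured; Derby does not
    // expose such a figure today.
    static long checkpointIntervalFor(double targetRecoverySeconds,
                                      double logApplyBytesPerSecond) {
        return (long) (targetRecoverySeconds * logApplyBytesPerSecond);
    }

    public static void main(String[] args) {
        // Example: redoing 5 MB/s of log with a 10 second recovery target
        // gives an interval of about 50 MB (52428800 bytes).
        System.out.println(checkpointIntervalFor(10.0, 5.0 * 1024 * 1024));
    }
}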
When would one want to change the log switch interval? I think few
would care, but since the log files are preallocated by default, space
will be wasted if an operation that performs a log switch (e.g.,
backup) is performed while the current log file is nearly empty. On
the other hand, a small log file size combined with a very large
checkpoint interval will result in many log files on disk at the same
time (e.g., 1 MB log files with a 100 MB checkpoint interval gives on
the order of a hundred files).
Hope this helps a little,
--
Øystein