Øystein Grøvlen wrote:
(Any reason this did not go to derby-dev?)


"MM" == Mike Matrigali <[EMAIL PROTECTED]> writes:


    MM> Your change to checkpoint seems like a low risk and, from your tests,
    MM> a high benefit.  My only worry is those systems with a bad
    MM> implementation of "sync" whose cost is linearly related to the size
    MM> of the file or the size of the OS disk cache (basically, I have seen
    MM> implementations where the OS does not have a data structure to track
    MM> the dirty pages associated with a file, so it has two choices:
    MM> 1) search every page in the disk cache, or 2) probe the disk cache
    MM> for every page in the file - it chooses which approach to use based
    MM> on file size vs cache size).  I was willing to pay the cost of one of
    MM> these calls per big file, but I think I would lean toward just using
    MM> synced writes for the checkpoint given the problems you are seeing,
    MM> though not very strongly.  With reasonable implementations of file
    MM> sync I like your approach.

    MM> If you go with syncing every 100, I wonder if it might make sense to
    MM> "slow" the checkpoint even more on a busy system.  Since the writes
    MM> are not really doing I/O, it might make sense to give other threads
    MM> in the system a chance at an I/O slot more often by throwing in a
    MM> give-up-my-time-slice call every N writes, with N being a relatively
    MM> small number like 1-5.

Maybe I should try to see what happens if I just make the checkpoint
sleep for a few seconds every N writes instead of doing a sync.  It
could be that the positive effect is mainly from slowing down the
checkpoint when the I/O system is overloaded.

Yes, that would be interesting.  If that helps, then I think there are
better things to do than sleep, but they are not worth coding if sleep
doesn't help.
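
Just to be concrete about the sync-every-100-plus-yield idea, something
like the loop below is the shape I have in mind.  This is only a sketch
against a plain RandomAccessFile - it is not the real container/cache
code, and the method and variable names (flushDirtyPages, dirtyPages,
etc.) are made up for illustration:

    import java.io.IOException;
    import java.io.RandomAccessFile;

    public class ThrottledCheckpointSketch {
        private static final int SYNC_EVERY = 100; // sync after every 100 page writes
        private static final int YIELD_EVERY = 5;  // give up the time slice every few writes

        // Write the given dirty pages of one container file, syncing and
        // yielding periodically so the checkpoint does not flood the I/O system.
        static void flushDirtyPages(RandomAccessFile file, long[] dirtyPages,
                                    byte[][] pageData, int pageSize)
                throws IOException {
            int writes = 0;
            for (int i = 0; i < dirtyPages.length; i++) {
                file.seek(dirtyPages[i] * pageSize);
                file.write(pageData[i]);     // unsynced write into the OS cache
                writes++;

                if (writes % SYNC_EVERY == 0) {
                    file.getFD().sync();     // bound the amount of unsynced data
                }
                if (writes % YIELD_EVERY == 0) {
                    Thread.yield();          // let user threads get at an I/O slot
                }
            }
            file.getFD().sync();             // final sync before the checkpoint completes
        }
    }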

Do you think your system would see the same issues if running in
durability=test mode (i.e., no syncs)?  Someday I would like to produce
a non-sync system that would still guarantee consistent db recovery (it
might lose transactions, but never half of a transaction), so it would
be interesting to understand whether the problem is the sync itself or
just the blast of unsynced writes.
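
For reference, the mode I mean is the documented
derby.system.durability=test property, which turns off the syncing of
both log and data files.  A minimal way to run the same load without
syncs would be something like the sketch below (the database name and
URL are just placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class NoSyncBootSketch {
        public static void main(String[] args) throws SQLException {
            // Must be set before the embedded engine boots.
            System.setProperty("derby.system.durability", "test");
            Connection conn =
                DriverManager.getConnection("jdbc:derby:testdb;create=true");
            // ... run the same load and compare the checkpoint behaviour ...
            conn.close();
        }
    }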


...

    MM> What is your log rate (bytes/sec to the log)?  I think you are just
    MM> saying that the default of a checkpoint per 10 meg of log is a bad
    MM> default for these kinds of apps.

The first checkpoint occurs after about 5 minutes.  With the default
checkpoint interval of 10 MB of log, that indicates a log rate of
roughly 33 kbytes/sec (10 MB / 300 s).

I do not think changing the checkpoint interval would help much with
the high response times unless you make it very short, so that the
number of pages per checkpoint is much smaller.
Ok, that is not too bad - I was worried that you were generating a
checkpoint every few seconds.  Though in such an application I might
set the checkpoint rate to be more like once per hour.  Again, this is
a separate issue: no matter what the checkpoint rate is, the checkpoint
should still be fixed to avoid the response-time hits you are seeing.
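
For what it's worth, the knob for that is the
derby.storage.checkpointInterval property (bytes of log written between
checkpoints, default 10 MB).  Below is a rough sketch of setting it
system-wide for roughly hourly checkpoints at ~33 KB/s of log - the
exact value and the property's upper bound should be checked against
the documentation:

    public class CheckpointIntervalSketch {
        public static void main(String[] args) {
            // ~120 MB of log at ~33 KB/s is about one checkpoint per hour.
            // Must be set before the database boots.
            System.setProperty("derby.storage.checkpointInterval",
                               String.valueOf(120L * 1024 * 1024));
            // ... then boot the database via jdbc:derby:... as usual ...
        }
    }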

