Mike Matrigali wrote:
Long answer above, some comments inline below. As for the suggestion that runtime performance would be optimal in this case: runtime performance is in no way "helped" by having checkpoints; it is either not affected or hindered. As has been noted, checkpoints can cause drastic downward spikes in some disk-bound applications; hopefully we will get some changes into 10.2 to smooth those spikes out. But the reality is that the more checkpoints there are on a system that is disk I/O bound, the more the application is going to slow down. If you are not disk I/O bound, the checkpoints may have little effect.

Thank you for the explanations, Mike. I ran a TPC-B-like load against Derby and plotted some performance metrics for two different configurations: one where the default checkpointing interval was used, and one where it was set to the maximum. I ran for 1 hour, and in the second case I don't think a checkpoint was ever started (the test took a long time to exit when the database was shut down, as almost 100 MB of log had to be handled). Please have a look at the attached figures and see if they are as you expected.

What bothers me in particular are the spikes for the run with the default checkpointing interval. As you can see, the throughput drops to (nearly) zero for 10-second periods, which is pretty bad. The checkpoint should not interfere with user activity in such a way. I have talked to some people about this, and we suspect there is some kind of OS/filesystem issue that we're running into. It might be caused by the way the checkpoint writes pages to disk: write all dirty pages to disk, then sync at the very end. Depending on the underlying OS, filesystem, and caches, the effects may vary. My runs were done on Solaris with the UFS filesystem. I also attached a second graph where I used direct I/O (the 'forcedirectio' mount option). Unfortunately, I do not have disk I/O activity logs for these runs.

The data and the log were stored on different physical disks (using 'logDevice'). The database was approximately 17 GB, the page cache 0.5 GB. Embedded Derby, with 16 clients/connections.

Any comments on the attached graphs?

--
Kristian

Mike Matrigali wrote:

[snip - map recovery time into Xmb of log stuff]

There are only two reasons for checkpoints:

1) Decrease recovery time after a system crash.
2) Make it possible to delete log file information (if you don't have roll-forward recovery backups). Without a checkpoint, Derby must keep all log files, so the space needed in the log directory will always grow.

The background writer thread should handle this; it should not be considered an extreme case. If there were no background writer and no checkpoints, then the following would happen:

1) The page cache grows to whatever maximum size it has gotten to.
2) Requests for a new page then use the clock to determine which page to throw out.
3) If the page picked to throw out is dirty, it is first written to the OS with no sync requested. It is up to the OS whether this is handled asynchronously or not. Most modern OSes will make this an async operation unless the OS cache is full, in which case it turns into a wait for some I/O (maybe some other I/O to free an OS resource). The downside is that a user select at this point may end up waiting on a synchronous write of some page.
4) If the page to throw out is not dirty, it can just be thrown out without any possible I/O wait.
5) In both cases 3 and 4, the user thread of course has to wait on the I/O to read the new page into the cache. Depending on the OS cache, this may or may not be a "real" I/O.
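To make the eviction path in the list above concrete, here is a minimal sketch of a clock-style page cache in which a dirty victim is handed to the OS without a sync before its frame is reused. This is not Derby's buffer manager; the class and method names (ToyPageCache, writeNoSync, and so on) are made up for illustration, but the control flow follows cases 2 through 5 above.

/** Toy clock-replacement page cache; NOT Derby's implementation. */
final class ToyPageCache {

    static final class Frame {
        long pageId = -1;      // which page currently occupies this frame
        boolean dirty;         // modified since it was last written out
        boolean referenced;    // clock "second chance" bit
    }

    private final Frame[] frames;
    private int hand;          // clock hand position

    ToyPageCache(int size) {
        frames = new Frame[size];
        for (int i = 0; i < size; i++) {
            frames[i] = new Frame();
        }
    }

    /**
     * Called by a user thread that needs a frame for a page not in cache.
     * This is where case 3 vs. case 4 from the list above shows up.
     */
    synchronized Frame grabFrame(long newPageId) {
        while (true) {
            Frame f = frames[hand];
            hand = (hand + 1) % frames.length;

            if (f.referenced) {        // recently used: give it a second chance
                f.referenced = false;
                continue;
            }
            if (f.dirty) {
                // Case 3: victim is dirty, push it to the OS with no sync.
                // Usually the OS buffers this, but if the OS cache is full
                // the user thread ends up waiting on real I/O right here.
                writeNoSync(f.pageId);
                f.dirty = false;
            }
            // Case 4 (or the now-clean case 3 victim): reuse the frame.
            // Case 5: the caller still has to read the new page in.
            readPage(newPageId);
            f.pageId = newPageId;
            f.referenced = true;
            return f;
        }
    }

    // Placeholders for the actual file I/O.
    private void writeNoSync(long pageId) { /* write(fd, page), no fsync */ }
    private void readPage(long pageId)    { /* read(fd, page) */ }
}

The sketch only shows where the potential stall sits: in whichever user thread happens to need a free frame at the wrong moment.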
The job of the background writer is to make case 3 less likely; that's it. Note that if you try to keep the whole cache clean, you may flood the I/O system unnecessarily: if the application tends to write the same page over and over again, it is better to leave it dirty in the cache until needed. The clock tends to do this by throwing out less-used pages rather than more-used pages.
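Coming back to the experiment earlier in the thread: for anyone who wants to try something similar, the knobs involved would be set roughly as in the sketch below. The property and attribute names (derby.storage.checkpointInterval, derby.storage.pageCacheSize, derby.storage.pageSize, logDevice) are the documented Derby ones as far as I know, but the concrete values and paths are only illustrative and are not necessarily what was used for the attached graphs.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class CheckpointTuningSketch {
    public static void main(String[] args) throws SQLException {
        // These are engine-wide properties and must be set before the
        // embedded engine boots (here: before the first getConnection).

        // Bytes of log written between checkpoints (default is about 10 MB).
        // Raising it means fewer checkpoints at runtime, but longer recovery
        // and more log files kept in the log directory.
        System.setProperty("derby.storage.checkpointInterval", "104857600");

        // The page cache size is counted in pages (default 1000). With 32 KB
        // pages, around 16000 pages is roughly a 0.5 GB cache.
        System.setProperty("derby.storage.pageSize", "32768");
        System.setProperty("derby.storage.pageCacheSize", "16000");

        // Keep data and log on different physical disks by pointing the
        // logDevice attribute at another disk when the database is created.
        String url = "jdbc:derby:/disks/data/testdb;create=true"
                + ";logDevice=/disks/log/testdb_log";

        try (Connection conn = DriverManager.getConnection(url)) {
            // ... run the TPC-B-like load against conn ...
        }

        // Shut down the engine; Derby reports a clean shutdown by throwing
        // an SQLException, so it is caught and ignored here.
        try {
            DriverManager.getConnection("jdbc:derby:;shutdown=true");
        } catch (SQLException expectedOnShutdown) {
            // expected on successful shutdown
        }
    }
}

Since the storage properties have to be in place before the engine boots, in practice they usually go into derby.properties or onto the command line rather than into code.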


