Hi Vlad, Hi Paul, maybe it is a misleading observation. I never had any checkpointing "delay" with tests having pbxt_flush_log_at_trx_commit=1 instead pbxt_flush_log_at_trx_commit=2.
Regards Erkan On Thu, Dec 02, 2010 at 03:31:34PM +0100, Paul McCullagh wrote: > Hi Vlad, > > There are a number of things which are strange about this test. > > Firstly the flush takes 25 minutes. From the xtstat dump we can see > that the main flushing on the index file stops at 101202 13:00:13. > This is correct according to the "select now()" below (12:35:46 + 24 > minutes). > > This may not be the problem, but rather this: > > 1. Checkpointing does not seem to be happening, because to-chkpt is at > 14.9G, and seems to remain there. > > The lack of checkpoint may be explained be second bit of strange > activity: > > 2. Lots of little flusing on the ilog, but no update to the index data: > > ind ind ind ind ind ind ind ind ind ilog ilog > ilog ilog ilog > in out rd.ms wr.ms syncs/ms hits miss %use %dty in out > rd.ms wr.ms syncs/ms > 0 0 0 0 0/0 0 0 100 0.0 0 > 54 0 0 6/3 > 0 0 0 0 0/0 0 0 100 0.0 0 > 36 0 0 4/2 > 0 0 0 0 0/0 0 0 100 0.0 0 > 54 0 0 6/3 > > This is very strange and need to be investigated. Basically, the ilog > should be written, then the ilog is read, and the index file is > written. That normally concludes the flushing of the index. > > It looks like the checkpointer thread is in a loop here, and does not > get to the end of flushing the index, and so it does not get to the > end of the checkpoint. > > @Vlad or Erkan, could you please report this as a bug, with the > information we have so far. Next we will have to see if we can repeat > this strange index "non-flush" behavior. Maybe using a similar test, > but a little less data. > > On Dec 2, 2010, at 12:31 PM, erkan yanar wrote: > > >Hi Paul, > > > >I repeatet it for you;) > > > >mysql> select now(); truncate rein; > >+---------------------+ > >| now() | > >+---------------------+ > >| 2010-12-02 12:35:46 | > >+---------------------+ > >1 row in set (0.00 sec) > > > >Query OK, 0 rows affected (24 min 49.47 sec) > > > >So it was successfull. > > > >The xtstat-ouptut is attached. So maybe yo see something that might > >helpfull. > > > >Regards > >Erkan > > > >P.S. > >Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s > >avgrq-sz avgqu-sz await svctm %util > >dm-0 0.00 0.00 0.00 666.00 0.00 9040.00 > >13.57 186.48 327.86 1.50 100.00 > > > > > >On Thu, Dec 02, 2010 at 09:13:48AM +0100, Paul McCullagh wrote: > >>Hi Erkan, > >> > >>Yes. Strange pause. But I cannot see what the engine is doing in that > >>time. > >> > >>There must be activity in some missing statistics, for example, > >>record, row or index file writing. > >> > >>Unfortunately, when displaying all statistics, the output is very > >>wide. What xtstat needs is a feature which logs all statistics but > >>just displays what you select. > >> > >>Then, if you see a problem that does not show up in the displayed > >>statistics you can still go back and check the others at the time. > >> > >>On Nov 30, 2010, at 3:05 AM, erkan yanar wrote: > >> > >>>Moin Paul > >>> > >>>On Mon, Nov 29, 2010 at 10:23:38AM +0100, Paul McCullagh wrote: > >>>> > >>>> > >>>>The problem here is the index cache is full, and the thread which > >>>>is > >>>>flushing the index is hanging in sync (i.e. flush to disk). The > >>>>rest > >>>>of the system is hanging because it needs a clean page in the > >>>>index. > >>>> > >>>>I did quite a bit of work to improve this in 1.1 and 1.5. > >>>> > >>>>But in general, if the index cache is really large, it need to be > >>>>flushed more regularly to ensure that too many changes do not > >>>>accumulate. Checkpointing should do this, but there is a bug in > >>>>1.0. > >>>> > >>>>You checkpoint frequency is 128M, xtstat says to-chkpt = 1049M. > >>>>This > >>>>means the checkpoint has not occurred as often as he should. > >>>> > >>>>This problem should be fixed in 1.1. > >>>> > >>> > >>>While 1.5.02 Beta handles IO a lot better than 1.0.11-7, you may be > >>>interested in that checkpointing issue: > >>>pbxt_checkpoint_frequency = 67108864 > >>> > >>> time time xact xact xact xact stat stat ilog > >>>ilog ilog ilog ilog xlog xlog xlog xlog xlog xlog xlog > >>>xlog to to to sweep row row row row > >>> curr msec commt rollb waits dirty read write in out > >>>rd.ms wr.ms syncs/ms in out rd.ms wr.ms syncs sy.ms hits miss > >>>chkpt write sweep waits sel ins upd del > >>>101130 03:45:13 5001 198t 0 0 10.9t 0 198t 320M > >>>0 83 0 0/0 0 21.1M 0 12 9 3576 69.6t > >>>0 813M 9833K 0 13 0 198t 0 0 > >>>101130 03:45:18 5001 9501 0 0 5148 0 9501 0 > >>>0 0 0 0/0 0 0 0 0 0 5001 0 > >>>0 813M 9833K 0 1 0 9501 0 0 > >>>101130 03:45:23 5001 0 0 0 5148 0 0 0 > >>>0 0 0 0/0 0 0 0 0 0 5001 0 > >>>0 813M 9833K 0 0 0 0 0 0 > >>>101130 03:45:28 5001 80.1t 0 0 13.8t 0 80.1t 0 > >>>0 0 0 0/0 0 8192K 0 4 1 4993 1 > >>>0 814M 17.6M 115M 5 0 80.1t 0 0 > >>>101130 03:45:33 5001 179t 0 0 10.3t 0 179t 0 > >>>0 0 0 0/0 1244K 19.7M 0 8 20 4395 119t > >>>39 837M 6772K 0 13 0 179t 0 0 > >>>101130 03:45:38 5001 12.7t 0 0 9084 0 12.7t 0 > >>>0 0 0 0/0 0 0 0 0 0 5001 5507 > >>>0 837M 3756K 0 1 0 12.7t 0 0 > >>>101130 03:45:43 5001 78.4t 0 0 8907 0 78.4t 0 > >>>0 0 0 0/0 0 8192K 0 3 1 4994 18.1t > >>>0 840M 8192K 0 5 0 78.4t 0 0 > >>>101130 03:45:48 5001 156t 0 0 4410 0 156t 0 > >>>0 0 0 0/0 0 16.0M 0 8 2 4985 67.5t > >>>0 856M 8192K 0 11 0 156t 0 0 > >>>101130 03:45:53 5001 78.9t 0 0 7213 0 78.9t 0 > >>>0 0 0 0/0 0 8192K 0 6 1 4991 1 > >>>0 864M 16.0M 0 7 0 78.9t 0 0 > >>> > >>> > >>>Best Regards > >>>Erkan > >>> > >>>-- > >>>über den grenzen muß die freiheit wohl wolkenlos sein > >> > > > >-- > >über den grenzen muß die freiheit wohl wolkenlos sein > ><xtstat-truncate.log.gz> > -- über den grenzen muß die freiheit wohl wolkenlos sein _______________________________________________ Mailing list: https://launchpad.net/~pbxt-discuss Post to : pbxt-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~pbxt-discuss More help : https://help.launchpad.net/ListHelp