On Aug 28, 2009, at 3:08 PM, Brian J. Murrell wrote:

> On Fri, 2009-08-28 at 15:00 -0400, Scott Atchley wrote:
>> Lustre: 4227:0:(filter_io_26.c:641:filter_commitrw_write()) lustre-OST0000: slow i_mutex 30s
>> Lustre: 4222:0:(lustre_fsfilt.h:320:fsfilt_commit_wait()) lustre-OST0000: slow journal start 30s
>> Lustre: 4222:0:(filter_io_26.c:724:filter_commitrw_write()) lustre-OST0000: slow commitrw commit 30s
>> Lustre: 4242:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0000: slow direct_io 30s
>> Lustre: 4242:0:(filter_io_26.c:706:filter_commitrw_write()) Skipped 4 previous similar messages
>>
>> Should I be concerned or is this normal?
>
> It means that I/Os are completing more slowly than Lustre would like,
> which as you can guess means you are hammering the disk(s) too hard.
> Try reducing the number of OST threads. Ideally you want those messages
> to go away even when you are pushing the OSTs to capacity. Ideally you
> want just enough OST threads to push the disks to capacity but no more.
> So measure, reduce, measure. If the throughput is the same or better
> after reducing, reduce further and measure again. Repeat until you have
> found the sweet spot.
>
> Obdfilter-survey in the iokit automates this for you, running many tests
> at different thread counts and letting you see where the sweet spot is
> without all the iterating.
Hi Brian,

Thanks for the description. Since I am mainly testing for correctness of MXLND, I am not worried about hammering my test disk. I will keep this in mind in case I get a big fat RAID this Christmas. ;-)

Scott
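
The "measure, reduce, measure" loop described in the quoted reply can be sketched in Python. This is only an illustration, not part of Lustre or the iokit: find_sweet_spot, measure, tolerance, and the throughput figures are all hypothetical, and measure stands in for whatever benchmark you run at a given OST thread count. Halving the count at each step is also an assumption; the reply only says to reduce.

```python
from typing import Callable, Dict, Tuple


def find_sweet_spot(
    measure: Callable[[int], float],
    start_threads: int,
    min_threads: int = 1,
    tolerance: float = 0.02,
) -> Tuple[int, Dict[int, float]]:
    """Reduce the thread count while throughput stays the same or improves.

    `measure` is a caller-supplied (hypothetical) benchmark that returns the
    throughput observed with the given number of OST threads.
    """
    best_threads = start_threads
    best_rate = measure(best_threads)
    history = {best_threads: best_rate}

    threads = best_threads // 2  # assumption: halve on each step
    while threads >= min_threads:
        rate = measure(threads)
        history[threads] = rate
        # "Same or better" is read here as "within `tolerance` of the best so far".
        if rate < best_rate * (1.0 - tolerance):
            break  # throughput dropped, so the previous count was the sweet spot
        best_threads = threads
        best_rate = max(best_rate, rate)
        threads //= 2

    return best_threads, history


if __name__ == "__main__":
    # Made-up throughput numbers (MB/s), for illustration only.
    fake_results = {512: 180.0, 256: 195.0, 128: 210.0, 64: 212.0, 32: 170.0}
    threads, history = find_sweet_spot(fake_results.__getitem__, start_threads=512)
    print("sweet spot:", threads)
    for count, rate in history.items():
        print(count, rate)
```

With the made-up numbers above, the search stops at 64 threads because the next reduction to 32 loses throughput. A real run would replace fake_results with an actual measurement at each thread count.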
