You really like collectl! I've seen it mentioned here many times. Perhaps I'll actually give it a shot! :) It sounds like a great piece of software. Is there a standard way to centrally log collectl stats?
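Something like this is roughly what I had in mind, though it's completely untested on my end and the flags, host names and paths below are just guesses on my part, so please correct me:

    # on each OSS, record lustre stats (with timestamps) to local files
    collectl -s l -oT -f /var/log/collectl &

    # pull the recorded files back to one central box, say from a cron job
    # ("oss1" and /srv/collectl are made-up names)
    rsync -a oss1:/var/log/collectl/ /srv/collectl/oss1/

    # then play any of them back later with -p
    collectl -p /srv/collectl/oss1/<recorded file> -s l -oT

But if there's a more standard way people do this, I'd love to hear it.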
On Dec 25, 2007, at 11:38 AM, Mark Seger wrote:

> As a collectl-ism, it looks like you're running it in a mode that requests
> data distribution by OST, which is great when there's a lot of different
> things going on in each OST. But if you run it in summary mode, little 'l'
> instead of 'L', you'll see one line per sample without all the header crap
> that separates out each sample as shown here. I've also tossed in -oT to
> include timestamps. The reason for all the 0's is that there is currently
> no traffic, but I think you'll get the idea.
>
> [EMAIL PROTECTED] ~]# collectl -s l -OB
> waiting for 1 second sample...
>
> # LUSTRE FILESYSTEM SUMMARY
> #         <-----------------------reads----------------------- OST -------------------writes-------------------------->
> #         Rds RdK 1P 2P 4P 8P 16P 32P 64P 128P 256P 512P 1024P Wrts WrtK 1P 2P 4P 8P 16P 32P 64P 128P 256P 512P 1024P
> 11:37:04    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 11:37:05    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 11:37:06    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 11:37:07    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 11:37:08    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 11:37:09    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 11:37:10    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>
> -mark
>
> Balagopal Pillai wrote:
>> On Tue, 25 Dec 2007, Aaron Knister wrote:
>>
>> Looks like there were lots of tiny writes (no reads) at the time this
>> happened. Can't really blame the tiny writes, though; I was testing by
>> making a tar file out of /usr/lib! Here is the playback from collectl on
>> a simple tar right after the volume data1 is mounted.
>>
>> ### RECORD 7 >>> lustre1 <<< (1198592822) (Tue Dec 25 10:27:02 2007) ###
>>
>> # LUSTRE FILESYSTEM SINGLE OST STATISTICS
>> #Ost              Rds RdK 1K 2K 4K 8K 16K 32K 64K 128K 256K Wrts WrtK 1K 2K 4K 8K 16K 32K 64K 128K 256K
>> data1-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 207 429M 0 0 0 0 0 0 2 0
>> data2-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> data3-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> data4-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch1-OST0000  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch1-OST0001  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch2-OST0000  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch2-OST0001  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>
>> There is 429M of 1k i/o. Here is the collectl playback when data1 starts
>> working normally, after 10 - 15 minutes of being mounted.
>>
>> ### RECORD 5 >>> lustre1 <<< (1198598746) (Tue Dec 25 12:05:46 2007) ###
>>
>> # LUSTRE FILESYSTEM SINGLE OST STATISTICS
>> #Ost              Rds RdK 1K 2K 4K 8K 16K 32K 64K 128K 256K Wrts WrtK 1K 2K 4K 8K 16K 32K 64K 128K 256K
>> data1-OST0000     18 17K 0 0 1 0 0 0 0 0 17 10 10K 0 0 0 0 0 0 0 0 10
>> data2-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> data3-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> data4-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch1-OST0000  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch1-OST0001  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch2-OST0000  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch2-OST0001  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>
>> ### RECORD 6 >>> lustre1 <<< (1198598757) (Tue Dec 25 12:05:57 2007) ###
>>
>> # LUSTRE FILESYSTEM SINGLE OST STATISTICS
>> #Ost              Rds RdK 1K 2K 4K 8K 16K 32K 64K 128K 256K Wrts WrtK 1K 2K 4K 8K 16K 32K 64K 128K 256K
>> data1-OST0000     222 3625 7 5 195 15 0 0 0 0 0 1 403 0 0 0 0 0 0 0 0 0
>> data2-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> data3-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> data4-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch1-OST0000  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch1-OST0001  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch2-OST0000  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch2-OST0001  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>
>> ### RECORD 7 >>> lustre1 <<< (1198598767) (Tue Dec 25 12:06:07 2007) ###
>>
>> # LUSTRE FILESYSTEM SINGLE OST STATISTICS
>> #Ost              Rds RdK 1K 2K 4K 8K 16K 32K 64K 128K 256K Wrts WrtK 1K 2K 4K 8K 16K 32K 64K 128K 256K
>> data1-OST0000     53 846 2 1 46 4 0 0 0 0 0 2 1866 0 0 0 0 0 0 0 0 1
>> data2-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> data3-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> data4-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch1-OST0000  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch1-OST0001  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch2-OST0000  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch2-OST0001  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>
>> ### RECORD 8 >>> lustre1 <<< (1198598777) (Tue Dec 25 12:06:17 2007) ###
>>
>> # LUSTRE FILESYSTEM SINGLE OST STATISTICS
>> #Ost              Rds RdK 1K 2K 4K 8K 16K 32K 64K 128K 256K Wrts WrtK 1K 2K 4K 8K 16K 32K 64K 128K 256K
>> data1-OST0000     11 10K 0 0 1 0 0 0 0 0 10 8 8499 0 0 0 0 0 0 0 0 9
>> data2-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> data3-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> data4-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch1-OST0000  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch1-OST0001  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch2-OST0000  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch2-OST0001  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>
>> ### RECORD 9 >>> lustre1 <<< (1198598787) (Tue Dec 25 12:06:27 2007) ###
>>
>> # LUSTRE FILESYSTEM SINGLE OST STATISTICS
>> #Ost              Rds RdK 1K 2K 4K 8K 16K 32K 64K 128K 256K Wrts WrtK 1K 2K 4K 8K 16K 32K 64K 128K 256K
>> data1-OST0000     245 3914 2 6 222 15 0 0 0 0 0 2 421 0 0 1 0 0 0 0 0 0
>> data2-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> data3-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> data4-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch1-OST0000  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch1-OST0001  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch2-OST0000  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch2-OST0001  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>
>> ### RECORD 10 >>> lustre1 <<< (1198598797) (Tue Dec 25 12:06:37 2007) ###
>>
>> # LUSTRE FILESYSTEM SINGLE OST STATISTICS
>> #Ost              Rds RdK 1K 2K 4K 8K 16K 32K 64K 128K 256K Wrts WrtK 1K 2K 4K 8K 16K 32K 64K 128K 256K
>> data1-OST0000     105 1678 5 4 84 12 0 0 0 0 0 3 171 0 0 2 1 0 0 0 0 0
>> data2-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> data3-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> data4-OST0000     0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch1-OST0000  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch1-OST0001  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch2-OST0000  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> scratch2-OST0001  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>
>> ### RECORD 11 >>> lustre1 <<< (1198598807) (Tue Dec 25 12:06:47 2007) ###
>>
>> This looks normal. I am seeing the same thing on another Lustre
>> installation (1.6.0.1) occasionally, but it goes away after a while.
>>
>> Balagopal
>>
>>> When the volume is sluggish, do you see lots of tiny reads (500k/sec) on
>>> the full volume? When it slows down could you run an "iostat -k 2" on the
>>> OSS in question? I think we may be having the same problem. I could find
>>> no answer/solution and ended up blowing the whole setup away and starting
>>> from scratch. I'd like to track this down and figure out if it's actually
>>> a bug or whether I FUBAR'd something in my setup.
>>>
>>> -Aaron
>>>
>>> On Dec 25, 2007, at 9:42 AM, Balagopal Pillai wrote:
>>>
>>>> Hi,
>>>>
>>>> Please ignore the previous email. It seemed to solve itself after 10 - 15
>>>> minutes of mounting the filled volume. Now it is as fast as the empty
>>>> volumes.
>>>>
>>>> Thanks
>>>> Balagopal
>>>>
>>>> ---------- Forwarded message ----------
>>>> Date: Tue, 25 Dec 2007 10:36:28 -0400 (AST)
>>>> From: Balagopal Pillai <[EMAIL PROTECTED]>
>>>> To: <[email protected]>
>>>> Subject: lustre performance question
>>>>
>>>> Hi,
>>>>
>>>> We have one Lustre volume that is getting full and some other volumes
>>>> that are totally empty. The one that is full is a little sluggish at
>>>> times, with the following messages appearing in syslog on the OSS:
>>>>
>>>> Lustre: 5809:0:(filter_io_26.c:698:filter_commitrw_write()) data1-OST0001: slow i_mutex 82s
>>>> Lustre: 5809:0:(filter_io_26.c:711:filter_commitrw_write()) data1-OST0001: slow brw_start 82s
>>>> Lustre: 5809:0:(filter_io_26.c:763:filter_commitrw_write()) data1-OST0001: slow direct_io 82s
>>>> Lustre: 5809:0:(filter_io_26.c:776:filter_commitrw_write()) data1-OST0001: slow commitrw commit 82s
>>>>
>>>> But the same two OSSes also export the empty volume, which is very fast
>>>> in any test (creating a tar file, bonnie, etc.). I also tested the same
>>>> thing on the NFS-exported backup volume of the filled-up Lustre volume
>>>> (exported from the same OSS server) and it doesn't show any significant
>>>> slowdown. Is it normal for Lustre volumes to slow down when they get
>>>> full?
>>>>
>>>> Thanks
>>>> Balagopal

Aaron Knister
Associate Systems Analyst
Center for Ocean-Land-Atmosphere Studies
(301) 595-7000
[EMAIL PROTECTED]

_______________________________________________
Lustre-discuss mailing list
[email protected]
https://mail.clusterfs.com/mailman/listinfo/lustre-discuss
