Sorry, Erik, if I'm raising such a "bad" question, but could you tell me more about the OST journal device? I don't even know what it is, and I haven't seen it mentioned in the Lustre manual.
Best regards

On Mon, Jan 25, 2010 at 10:52 PM, Erik Froese <[email protected]> wrote:

> Is each OST journal on its own physical disk? I've seen those messages
> when there isn't enough hardware dedicated to the journal device.
> Erik
>
> On Sun, Jan 24, 2010 at 11:43 PM, Aaron Knister <[email protected]> wrote:
>
>> I don't necessarily think there's anything wrong with using drbd or
>> running it over gigabit ethernet. If you stop all I/O to the lustre
>> filesystem, what does an hdparm -t show on the sdc and drbd devices? Do you
>> have any performance numbers for the drbd or underlying raid devices?
>>
>> On Jan 24, 2010, at 11:17 PM, Lex wrote:
>>
>> Thank you for your fast reply, Aaron
>>
>> I'm using Gigabit Ethernet to synchronize data between our two fail-over
>> nodes. Is there something wrong with that? Tell me, please
>>
>> On Mon, Jan 25, 2010 at 10:35 AM, Aaron Knister <[email protected]> wrote:
>>
>>> My best guess (and please correct me if I'm wrong) is that those messages
>>> are because the underlying block devices are slow to respond to i/o
>>> requests. It looks like you're using DRBD. What's your interconnect?
>>>
>>> On Jan 24, 2010, at 9:42 PM, Lex wrote:
>>>
>>> Hi list
>>>
>>> I have one OSS with hardware info like this:
>>>
>>> CPU Intel(R) xeon E5420 2.5 Ghz
>>> Chipset intel 5000P
>>> 8GB RAM
>>>
>>> With this OSS, we are using 2 RAID-5 arrays as OSTs (each has 4 x 1.5 TB
>>> hard drives with an Adaptec 5805 RAID controller)
>>>
>>> It worked quite smoothly before, but about 2 weeks ago I started seeing
>>> many warnings (I think they are warnings) like this in /var/log/messages:
>>>
>>> *Jan 25 08:41:23 OST6 kernel: Lustre: 9587:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 35s
>>> Jan 25 08:41:34 OST6 kernel: Lustre: 9608:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 41s
>>> Jan 25 08:41:34 OST6 kernel: Lustre: 9608:0:(filter_io_26.c:706:filter_commitrw_write()) Skipped 2 previous similar messages
>>> Jan 25 08:41:35 OST6 kernel: Lustre: 9645:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 43s
>>> Jan 25 08:58:10 OST6 kernel: Lustre: 9646:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 31s
>>> Jan 25 08:59:39 OST6 kernel: Lustre: 9609:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 30s
>>> Jan 25 09:01:05 OST6 kernel: Lustre: 9587:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 33s
>>> Jan 25 09:03:23 OST6 kernel: Lustre: 9633:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 32s
>>> Jan 25 09:11:25 OST6 kernel: Lustre: 9585:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 36s*
>>>
>>> I googled around and found that it may be caused by a problem with
>>> oss_num_threads, so I brought it down to 64 (according to the formula I
>>> found in the 1.8 manual, thread_number = RAM * CPU core / 128 MB, its
>>> value would be 256)
>>>
>>> *options ost oss_num_threads=64*
>>>
>>> It still didn't help.
>>>
>>> I thought it was only a harmless warning, but I may be wrong: our
>>> performance has gone down quite heavily (maybe for some other reason,
>>> but for now I only suspect the slow direct_io problem)
>>>
>>> iostat -m 1 1
>>> Linux 2.6.18-92.1.17.el5_lustre.1.8.0custom (OST6)    01/25/2010
>>>
>>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>>            0.01    0.02    2.86   25.01    0.00   72.10
>>>
>>> Device:       tps  MB_read/s  MB_wrtn/s    MB_read   MB_wrtn
>>> sda          1.30       0.01       0.00      11386      3469
>>> sdb          1.30       0.01       0.00      11531      3469
>>> sdc        131.50    *12.40*       0.26   11793218    249934
>>> sdd        178.46    *18.00*       0.26   17124065    250334
>>> md2          3.33       0.02       0.00      22915      2634
>>> md1          0.00       0.00       0.00          0         0
>>> md0          0.00       0.00       0.00          0         0
>>> drbd3      480.10    *12.39*       0.26   11789047    249639
>>> drbd6      565.85    *14.89*       0.26   14168452    249211
>>>
>>> So, could anyone please tell me whether this warning impacts our system
>>> performance or not? And if it does, please give me a solution or advice
>>> to resolve it.
>>>
>>> Best regards
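
As a quick check of the rule of thumb quoted in the original post (thread_number = RAM * CPU core / 128 MB), here is a minimal sketch of the arithmetic. The function name is made up for illustration, and the core count of 4 is an assumption (one quad-core E5420); only the 8 GB of RAM and the resulting value of 256 come from the thread.

```python
def oss_thread_estimate(ram_mb: int, cpu_cores: int, mb_per_thread: int = 128) -> int:
    """Rule of thumb quoted in the thread: RAM * CPU cores / 128 MB.

    Hypothetical helper for illustration only; it is not part of Lustre.
    """
    return ram_mb * cpu_cores // mb_per_thread


if __name__ == "__main__":
    ram_mb = 8 * 1024   # 8 GB RAM, from the original post
    cpu_cores = 4       # assumption: a single quad-core E5420
    print(oss_thread_estimate(ram_mb, cpu_cores))
```

Under those assumptions the result agrees with the 256 mentioned in the post, so oss_num_threads=64 is a quarter of what the formula suggests.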
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
