Hi guys, do you have any ideas about my issue and my question below?

On Wed, Jan 27, 2010 at 8:59 AM, Lex <[email protected]> wrote:
> Hi all,
>
> I heard somewhere about an oversubscribing issue related to OST
> threads, but I wonder why the oversubscribing warning still appears
> even though I calculated the thread count with the formula I found in
> the manual (thread_number = RAM * CPU cores / 128 MB; do correct me if
> there's something wrong with it, please).
>
> Maybe I have to choose my own value by trial and error, but is there
> any explanation for this situation?
>
> @Erik: could you please describe your bottleneck problem with the
> journal device for me, in as much detail as possible?
>
> On Tue, Jan 26, 2010 at 10:00 PM, Erik Froese <[email protected]> wrote:
>
>> Sorry Lex, I misread your email. I saw similar messages about my
>> journal devices. The OST is an ext3 (plus extra features) filesystem.
>> Each filesystem has an associated journal that CAN be on a separate
>> device; it's supposed to speed up small file operations. Mine were
>> oversubscribed and became a bottleneck.
>>
>> Erik
>>
>> On Mon, Jan 25, 2010 at 11:40 AM, Lex <[email protected]> wrote:
>>
>>> Sorry Erik if I'm raising such a "bad" question, but could you tell
>>> me more about the OST journal device? I don't know what it is, and I
>>> haven't seen it in the Lustre manual.
>>>
>>> Best regards
>>>
>>> On Mon, Jan 25, 2010 at 10:52 PM, Erik Froese <[email protected]> wrote:
>>>
>>>> Is each OST's journal on its own physical disk? I've seen those
>>>> messages when there isn't enough hardware dedicated to the journal
>>>> device.
>>>> Erik
>>>>
>>>> On Sun, Jan 24, 2010 at 11:43 PM, Aaron Knister <[email protected]> wrote:
>>>>
>>>>> I don't necessarily think there's anything wrong with using DRBD
>>>>> or running it over gigabit ethernet. If you stop all I/O to the
>>>>> Lustre filesystem, what does an hdparm -t show on the sdc and drbd
>>>>> devices? Do you have any performance numbers for the DRBD or
>>>>> underlying RAID devices?
>>>>>
>>>>> On Jan 24, 2010, at 11:17 PM, Lex wrote:
>>>>>
>>>>> Thank you for your fast reply, Aaron.
>>>>>
>>>>> I'm using Gigabit Ethernet to synchronize data to our failover
>>>>> node. Is there something wrong with that? Please tell me.
>>>>>
>>>>> On Mon, Jan 25, 2010 at 10:35 AM, Aaron Knister <[email protected]> wrote:
>>>>>
>>>>>> My best guess (and please correct me if I'm wrong) is that those
>>>>>> messages appear because the underlying block devices are slow to
>>>>>> respond to I/O requests. It looks like you're using DRBD. What's
>>>>>> your interconnect?
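(A minimal sketch of the measurement Aaron is asking for, to be run only
while Lustre I/O is stopped. The device names are the ones from the
iostat output further down; the scratch file path is hypothetical:)

    # Raw sequential read throughput: backing RAID device vs. the DRBD
    # device layered on top of it. If drbd3 reads much slower than sdc,
    # the replication link is the likely culprit; gigabit ethernet tops
    # out near ~110 MB/s.
    hdparm -t /dev/sdc
    hdparm -t /dev/drbd3

    # Optionally, a direct-I/O write test to a scratch file
    # (hypothetical path), bypassing the page cache in a way roughly
    # analogous to the OST's direct_io write path:
    dd if=/dev/zero of=/scratch/ddtest bs=1M count=1024 oflag=direct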
>>>>>>
>>>>>> On Jan 24, 2010, at 9:42 PM, Lex wrote:
>>>>>>
>>>>>> Hi list,
>>>>>>
>>>>>> I have one OSS with hardware like this:
>>>>>>
>>>>>> CPU: Intel(R) Xeon E5420, 2.5 GHz
>>>>>> Chipset: Intel 5000P
>>>>>> RAM: 8 GB
>>>>>>
>>>>>> With this OSS, we are using 2 RAID-5 arrays as OSTs (each has
>>>>>> 4 x 1.5 TB hard drives on an Adaptec 5805 RAID controller).
>>>>>>
>>>>>> It worked quite smoothly before, but about 2 weeks ago I started
>>>>>> seeing many warnings (I think they are warnings) like this in
>>>>>> /var/log/messages:
>>>>>>
>>>>>> Jan 25 08:41:23 OST6 kernel: Lustre: 9587:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 35s
>>>>>> Jan 25 08:41:34 OST6 kernel: Lustre: 9608:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 41s
>>>>>> Jan 25 08:41:34 OST6 kernel: Lustre: 9608:0:(filter_io_26.c:706:filter_commitrw_write()) Skipped 2 previous similar messages
>>>>>> Jan 25 08:41:35 OST6 kernel: Lustre: 9645:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 43s
>>>>>> Jan 25 08:58:10 OST6 kernel: Lustre: 9646:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 31s
>>>>>> Jan 25 08:59:39 OST6 kernel: Lustre: 9609:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 30s
>>>>>> Jan 25 09:01:05 OST6 kernel: Lustre: 9587:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 33s
>>>>>> Jan 25 09:03:23 OST6 kernel: Lustre: 9633:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 32s
>>>>>> Jan 25 09:11:25 OST6 kernel: Lustre: 9585:0:(filter_io_26.c:706:filter_commitrw_write()) lustre-OST0006: slow direct_io 36s
>>>>>>
>>>>>> I googled around and found that it may be a problem with
>>>>>> oss_num_threads, so I brought it down to 64 (the formula I found
>>>>>> in the 1.8 manual, thread_number = RAM * CPU cores / 128 MB,
>>>>>> gives 256):
>>>>>>
>>>>>> options ost oss_num_threads=64
>>>>>>
>>>>>> It still didn't help.
>>>>>>
>>>>>> I thought it was only a harmless warning, but maybe I was wrong:
>>>>>> our performance has gone down quite heavily (it may be for some
>>>>>> other reason, but for now I only suspect the slow direct_io
>>>>>> problem).
>>>>>>
>>>>>> iostat -m 1 1
>>>>>> Linux 2.6.18-92.1.17.el5_lustre.1.8.0custom (OST6)   01/25/2010
>>>>>>
>>>>>> avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
>>>>>>            0.01   0.02     2.86    25.01    0.00   72.10
>>>>>>
>>>>>> Device:     tps  MB_read/s  MB_wrtn/s    MB_read   MB_wrtn
>>>>>> sda        1.30       0.01       0.00      11386      3469
>>>>>> sdb        1.30       0.01       0.00      11531      3469
>>>>>> sdc      131.50      12.40       0.26   11793218    249934
>>>>>> sdd      178.46      18.00       0.26   17124065    250334
>>>>>> md2        3.33       0.02       0.00      22915      2634
>>>>>> md1        0.00       0.00       0.00          0         0
>>>>>> md0        0.00       0.00       0.00          0         0
>>>>>> drbd3    480.10      12.39       0.26   11789047    249639
>>>>>> drbd6    565.85      14.89       0.26   14168452    249211
>>>>>>
>>>>>> So, could anyone please tell me whether this warning impacts our
>>>>>> system performance or not? And if it does, please give me a
>>>>>> solution or some advice to resolve it.
>>>>>>
>>>>>> Best regards
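(Working through the manual's formula with this machine's numbers, as a
sanity check. A minimal sketch; the /proc path is the one Lustre 1.8
typically exposes, so verify it on your build:)

    # thread_number = RAM (MB) * CPU cores / 128 MB
    # One quad-core E5420 with 8 GB RAM gives: 8192 * 4 / 128 = 256
    echo $((8192 * 4 / 128))   # prints 256

    # With "options ost oss_num_threads=64" in /etc/modprobe.conf, the
    # cap should show up here once the ost module has been reloaded:
    cat /proc/fs/lustre/ost/OSS/ost_io/threads_max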
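(For anyone else wondering what Erik's separate journal device looks
like in practice, a minimal sketch. /dev/sdX1, /dev/sdY, and the
mgs@tcp0 node are hypothetical placeholders; check the Lustre 1.8
manual before formatting anything:)

    # Create a dedicated external journal device (ideally a small,
    # fast, otherwise-idle disk) for the ldiskfs (ext3-based) OST:
    mke2fs -b 4096 -O journal_dev /dev/sdX1   # hypothetical journal disk

    # Reference it when formatting the OST, so the journal lives on its
    # own spindle instead of on the same RAID array as the data:
    mkfs.lustre --ost --fsname=lustre --mgsnode=mgs@tcp0 \
        --mkfsoptions="-J device=/dev/sdX1" /dev/sdY   # hypothetical OST disk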
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
