On a side note, what about increasing the MDS service threads? I checked, and it's already running at its max of 128.
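
For anyone wanting to run the same check, here is a minimal sketch of how the started and max thread counts could be compared. The function name thread_headroom is made up for illustration, and it assumes a threads_max file sits next to threads_started in the service directory. The directory is passed in as an argument because only the OSS path (/proc/fs/lustre/ost/OSS/ost_io) appears in this thread; the MDS equivalent will differ.

```shell
# Hypothetical helper: report how close a Lustre service is to its
# thread ceiling. $1 is the service directory, for example
# /proc/fs/lustre/ost/OSS/ost_io (the OSS path quoted in this thread).
# Assumes threads_started and threads_max both live in that directory.
thread_headroom() {
    dir=$1
    # Bail out if either file is missing or unreadable.
    started=$(cat "$dir/threads_started") || return 1
    max=$(cat "$dir/threads_max") || return 1
    if [ "$started" -ge "$max" ]; then
        echo "saturated: $started of $max threads started"
    else
        echo "headroom: $started of $max threads started"
    fi
}
```

On an OSS this would be called as thread_headroom /proc/fs/lustre/ost/OSS/ost_io. A "saturated" result only says the service has started as many threads as it is allowed, not that raising the limit would help.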

On Thu, Feb 2, 2012 at 9:54 AM, David Noriega wrote:
> We have two OSSs, each with two quad core AMD Opterons and 8GB of ram
> and two OSTs each(4.4T and 3.5T). Backend storage is a pair of Sun
> StorageTek 2540 connected with 8Gb fiber.
>
> What about tweaking max_dirty_mb on the client side?
>
> On Wed, Feb 1, 2012 at 6:33 PM, Carlos Thomaz wrote:
>> David,
>>
>> The oss service threads is a function of your RAM size and CPUs. It's
>> difficult to say what would be a good upper limit without knowing the size
>> of your OSS, # clients, storage back-end and workload. But the good thing
>> you can give a try on the fly via lctl set_param command.
>>
>> Assuming you are running lustre 1.8, here is a good explanation on how to
>> do it:
>> http://wiki.lustre.org/manual/LustreManual18_HTML/LustreProc.html#50651263_87260
>>
>> Some remarks:
>> - reducing the number of OSS threads may impact the performance depending
>> on how is your workload.
>> - unfortunately I guess you will need to try and see what happens. I would
>> go for 128 and analyze the behavior of your OSSs (via log files) and also
>> keeping an eye on your workload. Seems to me that 300 is a bit too high
>> (but again, I don't know what you have on your storage back-end or OSS
>> configuration).
>>
>> I can't tell you much about the lru_size, but as far as I understand the
>> values are dynamic and there's not much to do rather than clear the last
>> recently used queue or disable the lru sizing. I can't help much on this
>> other than pointing you out the explanation for it (see 31.2.11):
>>
>> http://wiki.lustre.org/manual/LustreManual20_HTML/LustreProc.html
>>
>> Regards,
>> Carlos
>>
>> --
>> Carlos Thomaz | HPC Systems Architect
>> Mobile: +1 (303) 519-0578
>> Skype ID: carlosthomaz
>> DataDirect Networks, Inc.
>> 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
>> ddn.com <http://www.ddn.com/> | Twitter: @ddn_limitless
>> <http://twitter.com/ddn_limitless> | 1.800.TERABYTE
>>
>> On 2/1/12 2:11 PM, "David Noriega" wrote:
>>
>>>zone_reclaim_mode is 0 on all clients/servers
>>>
>>>When changing number of service threads or the lru_size, can these be
>>>done on the fly or do they require a reboot of either client or
>>>server?
>>>For my two OSTs, cat /proc/fs/lustre/ost/OSS/ost_io/threads_started
>>>give about 300(300, 359) so I'm thinking try half of that and see how
>>>it goes?
>>>
>>>Also checking lru_size, I get different numbers from the clients. cat
>>>/proc/fs/lustre/ldlm/namespaces/*/lru_size
>>>
>>>Client:            MDT0  OST0    OST1   OST2   OST3   MGC
>>>head node:         0     22      22     22     22     400   (only a few users logged in)
>>>busy node:         1     501     504    503    505    400   (Fully loaded with jobs)
>>>samba/nfs server:  4     440070  44370  44348  26282  1600
>>>
>>>So my understanding is the lru_size is set to auto by default thus the
>>>varying values, but setting it manually is effectively setting a max
>>>value? Also what does it mean to have a lower value(especially in the
>>>case of the samba/nfs server)?
>>>
>>>On Wed, Feb 1, 2012 at 1:27 PM, Charles Taylor wrote:
>>>>
>>>> You may also want to check and, if necessary, limit the lru_size on
>>>> your clients. I believe there are guidelines in the ops manual.
>>>> We have ~750 clients and limit ours to 600 per OST. That, combined
>>>> with the setting zone_reclaim_mode=0 should make a big difference.
>>>>
>>>> Regards,
>>>>
>>>> Charlie Taylor
>>>> UF HPC Center
>>>>
>>>> On Feb 1, 2012, at 2:04 PM, Carlos Thomaz wrote:
>>>>
>>>>> Hi David,
>>>>>
>>>>> You may be facing the same issue discussed on previous threads, which
>>>>> is the issue regarding the zone_reclaim_mode.
>>>>>
>>>>> Take a look on the previous thread where myself and Kevin replied to
>>>>> Vijesh Ek.
>>>>>
>>>>> If you don't have access to the previous emails, look at your kernel
>>>>> settings for the zone reclaim:
>>>>>
>>>>> cat /proc/sys/vm/zone_reclaim_mode
>>>>>
>>>>> It should be set to 0.
>>>>>
>>>>> Also, look at the number of Lustre OSS service threads. It may be set
>>>>> to high...
>>>>>
>>>>> Rgds.
>>>>> Carlos.
>>>>>
>>>>> On 2/1/12 11:57 AM, "David Noriega" wrote:
>>>>>
>>>>>> indicates the system was overloaded (too many service threads, or
>>>>>>
>>>>
>>>> Charles A. Taylor, Ph.D.
>>>> Associate Director,
>>>> UF HPC Center
>>>> (352) 392-4036

--
David Noriega
System Administrator
Computational Biology Initiative
High Performance Computing Center
University of Texas at San Antonio
One UTSA Circle
San Antonio, TX 78249
Office: BSE 3.112
Phone: 210-458-7100
http://www.cbi.utsa.edu
_______________________________________________
Lustre-discuss mailing list
http://lists.lustre.org/mailman/listinfo/lustre-discuss
