I can't comment much on this (I don't have much experience tuning it), but Lustre 1.8 has a completely different timeout architecture (adaptive timeouts). I suggest you take a deep look at it first:
--
Carlos Thomaz | HPC Systems Architect
Mobile: +1 (303) 519-0578
[email protected] | Skype ID: carlosthomaz
DataDirect Networks, Inc.
9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
ddn.com <http://www.ddn.com/> | Twitter: @ddn_limitless <http://twitter.com/ddn_limitless> | 1.800.TERABYTE

On 2/2/12 5:05 PM, "David Noriega" <[email protected]> wrote:

>I found this thread "Luster clients getting evicted" as I've also seen
>the "ost_connect operation failed with -16" message, and there they
>recommend increasing the timeout, though that was for 1.6, and as I've
>read, 1.8 has a different timeout system. Given that, would
>increasing at_min (currently 0) or at_max (currently 600) be best?
>
>On Thu, Feb 2, 2012 at 12:07 PM, Andreas Dilger <[email protected]> wrote:
>> On 2012-02-02, at 8:54 AM, David Noriega wrote:
>>> We have two OSSs, each with two quad-core AMD Opterons and 8GB of RAM
>>> and two OSTs each (4.4T and 3.5T). Backend storage is a pair of Sun
>>> StorageTek 2540 connected with 8Gb fiber.
>>
>> Running 32-64 threads per OST is the optimum number, based on previous
>> experience.
>>
>>> What about tweaking max_dirty_mb on the client side?
>>
>> Probably unrelated.
>>
>>> On Wed, Feb 1, 2012 at 6:33 PM, Carlos Thomaz <[email protected]> wrote:
>>>> David,
>>>>
>>>> The number of OSS service threads is a function of your RAM size and CPUs.
>>>> It's difficult to say what would be a good upper limit without knowing the
>>>> size of your OSS, # clients, storage back-end and workload. But the good
>>>> thing is that you can try it on the fly via the lctl set_param command.
>>>>
>>>> Assuming you are running Lustre 1.8, here is a good explanation of how to
>>>> do it:
>>>>
>>>> http://wiki.lustre.org/manual/LustreManual18_HTML/LustreProc.html#50651263_87260
>>>>
>>>> Some remarks:
>>>> - reducing the number of OSS threads may impact performance, depending
>>>> on your workload.
>>>> - unfortunately, I guess you will need to try it and see what happens. I
>>>> would go for 128 and analyze the behavior of your OSSs (via log files)
>>>> while also keeping an eye on your workload. It seems to me that 300 is a
>>>> bit too high (but again, I don't know what you have on your storage
>>>> back-end or OSS configuration).
>>>>
>>>> I can't tell you much about the lru_size, but as far as I understand, the
>>>> values are dynamic and there's not much to do other than clear the least
>>>> recently used queue or disable the LRU sizing. I can't help much on this
>>>> other than pointing you to the explanation for it (see 31.2.11):
>>>>
>>>> http://wiki.lustre.org/manual/LustreManual20_HTML/LustreProc.html
>>>>
>>>> Regards,
>>>> Carlos
>>>>
>>>> On 2/1/12 2:11 PM, "David Noriega" <[email protected]> wrote:
>>>>
>>>>> zone_reclaim_mode is 0 on all clients/servers.
>>>>>
>>>>> When changing the number of service threads or the lru_size, can these be
>>>>> done on the fly, or do they require a reboot of either the client or the
>>>>> server?
>>>>> For my two OSTs, cat /proc/fs/lustre/ost/OSS/ost_io/threads_started
>>>>> gives about 300 (300, 359), so I'm thinking of trying half of that and
>>>>> seeing how it goes?
>>>>>
>>>>> Also, checking lru_size, I get different numbers from the clients:
>>>>> cat /proc/fs/lustre/ldlm/namespaces/*/lru_size
>>>>>
>>>>> Client:           MDT0  OST0    OST1   OST2   OST3   MGC
>>>>> head node:        0     22      22     22     22     400   (only a few users logged in)
>>>>> busy node:        1     501     504    503    505    400   (fully loaded with jobs)
>>>>> samba/nfs server: 4     440070  44370  44348  26282  1600
>>>>>
>>>>> So my understanding is that lru_size is set to auto by default, hence the
>>>>> varying values, but setting it manually effectively sets a maximum
>>>>> value? Also, what does it mean to have a lower value (especially in the
>>>>> case of the samba/nfs server)?
>>>>>
>>>>> On Wed, Feb 1, 2012 at 1:27 PM, Charles Taylor <[email protected]> wrote:
>>>>>>
>>>>>> You may also want to check and, if necessary, limit the lru_size on
>>>>>> your clients. I believe there are guidelines in the ops manual.
>>>>>> We have ~750 clients and limit ours to 600 per OST. That, combined
>>>>>> with setting zone_reclaim_mode=0, should make a big difference.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Charlie Taylor
>>>>>> UF HPC Center
>>>>>>
>>>>>> On Feb 1, 2012, at 2:04 PM, Carlos Thomaz wrote:
>>>>>>
>>>>>>> Hi David,
>>>>>>>
>>>>>>> You may be facing the same issue discussed in previous threads, which
>>>>>>> is the issue regarding zone_reclaim_mode.
>>>>>>>
>>>>>>> Take a look at the previous thread where Kevin and I replied to
>>>>>>> Vijesh Ek.
>>>>>>>
>>>>>>> If you don't have access to the previous emails, look at your kernel
>>>>>>> settings for zone reclaim:
>>>>>>>
>>>>>>> cat /proc/sys/vm/zone_reclaim_mode
>>>>>>>
>>>>>>> It should be set to 0.
>>>>>>>
>>>>>>> Also, look at the number of Lustre OSS service threads. It may be set
>>>>>>> too high...
>>>>>>>
>>>>>>> Rgds.
>>>>>>> Carlos.
>>>>>>>
>>>>>>> On 2/1/12 11:57 AM, "David Noriega" <[email protected]> wrote:
>>>>>>>
>>>>>>>> indicates the system was overloaded (too many service threads, or
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Lustre-discuss mailing list
>>>>>>> [email protected]
>>>>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>>>>>>
>>>>>> Charles A. Taylor, Ph.D.
>>>>>> Associate Director,
>>>>>> UF HPC Center
>>>>>> (352) 392-4036
>>>>>
>>>>> --
>>>>> David Noriega
>>>>> System Administrator
>>>>> Computational Biology Initiative
>>>>> High Performance Computing Center
>>>>> University of Texas at San Antonio
>>>>> One UTSA Circle
>>>>> San Antonio, TX 78249
>>>>> Office: BSE 3.112
>>>>> Phone: 210-458-7100
>>>>> http://www.cbi.utsa.edu
>>
>> Cheers, Andreas
>> --
>> Andreas Dilger                       Whamcloud, Inc.
>> Principal Engineer                   http://www.whamcloud.com/
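For anyone retracing this thread, a minimal Python sketch like the one below gathers the values discussed above in one place: `zone_reclaim_mode`, the OSS `ost_io` `threads_started` count, and the per-namespace `lru_size` values, all read from the `/proc` paths quoted in the messages. The `/proc/sys/lustre/at_min` and `/proc/sys/lustre/at_max` locations for the adaptive-timeout limits are an assumption (the thread does not give them), so verify them on your release; any file that is missing is simply reported as `None`. The function names and the `root` argument are hypothetical helpers, the latter added only so the script can be pointed at a copy of `/proc` for testing.

```python
"""Read the tunables discussed in this thread from /proc (a sketch for Lustre 1.8-era nodes)."""
import glob
import os


def read_value(path):
    """Return the stripped contents of a /proc file, or None if it is missing or unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def snapshot(root="/"):
    """Collect the values discussed in the thread into one dict.

    `root` is a hypothetical testing hook: pass a directory containing a copy
    of the relevant /proc tree instead of reading the live system.
    """
    proc = os.path.join(root, "proc")
    values = {
        # Kernel NUMA setting; the thread recommends 0 on clients and servers.
        "zone_reclaim_mode": read_value(os.path.join(proc, "sys/vm/zone_reclaim_mode")),
        # OSS bulk I/O service threads currently running (only present on an OSS).
        "ost_io_threads_started": read_value(
            os.path.join(proc, "fs/lustre/ost/OSS/ost_io/threads_started")),
        # ASSUMPTION: adaptive-timeout limits exposed here; not stated in the thread.
        "at_min": read_value(os.path.join(proc, "sys/lustre/at_min")),
        "at_max": read_value(os.path.join(proc, "sys/lustre/at_max")),
        "lru_size": {},
    }
    # Roughly one LDLM namespace per MDC/OSC/MGC, like the columns in David's table.
    pattern = os.path.join(proc, "fs/lustre/ldlm/namespaces/*/lru_size")
    for path in sorted(glob.glob(pattern)):
        namespace = os.path.basename(os.path.dirname(path))
        values["lru_size"][namespace] = read_value(path)
    return values


if __name__ == "__main__":
    for name, value in snapshot().items():
        print(f"{name}: {value}")
```

Running it on each client and OSS before and after a change gives a quick record of what was actually in effect, which is the "try and see" loop suggested in the thread.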

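Andreas's rule of thumb (32-64 ost_io threads per OST) turns into a per-OSS budget with simple arithmetic: with two OSTs per OSS, that is 64-128 threads, which lines up with Carlos's suggestion of 128 and sits well below the 300 and 359 `threads_started` values David reported. The sketch below, assuming each reported value is the count for one OSS, does that arithmetic and builds `lctl set_param` command strings for `threads_max` and for a fixed client `lru_size` (Charlie's 600 per OST). The function names are hypothetical; the parameter names `ost.OSS.ost_io.threads_max` and `ldlm.namespaces.*osc*.lru_size` mirror the `/proc` paths in the thread, but check them against the manual for your release before running anything.

```python
"""Per-OSS thread budget and lctl command strings (a sketch, not a tuning tool)."""


def thread_budget(osts_per_oss, low_per_ost=32, high_per_ost=64):
    """Scale the 32-64 threads-per-OST rule of thumb up to one OSS."""
    return osts_per_oss * low_per_ost, osts_per_oss * high_per_ost


def over_budget(threads_started, osts_per_oss):
    """Return the observed per-OSS thread counts above the upper end of the budget."""
    _, high = thread_budget(osts_per_oss)
    return [n for n in threads_started if n > high]


def threads_max_command(threads):
    """Build the lctl command that caps ost_io service threads (run on each OSS)."""
    return f"lctl set_param ost.OSS.ost_io.threads_max={threads}"


def lru_size_command(locks_per_ost):
    """Build the lctl command that fixes the client lock LRU size for every OSC.

    Each OSC namespace corresponds to one OST, so the value is per OST.
    ASSUMPTION: a value of 0 restores dynamic LRU sizing; verify for 1.8.
    """
    return f"lctl set_param ldlm.namespaces.*osc*.lru_size={locks_per_ost}"


if __name__ == "__main__":
    low, high = thread_budget(osts_per_oss=2)
    print(f"suggested ost_io threads per OSS: {low}-{high}")
    print("above budget:", over_budget([300, 359], osts_per_oss=2))
    print(threads_max_command(high))
    print(lru_size_command(600))
```

With two OSTs per OSS this yields a 64-128 budget and flags both reported counts, so halving from roughly 300 (David's plan) would still leave each OSS above the suggested range.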