zone_reclaim_mode is 0 on all clients and servers.

When changing the number of service threads or the lru_size, can this be done on the fly, or does it require a reboot of the client or server? For my two OSTs,

cat /proc/fs/lustre/ost/OSS/ost_io/threads_started

gives about 300 (300, 359), so I'm thinking of trying half of that and seeing how it goes?
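A rough sketch of the "try half" idea in Python, in case it helps to see it written down. The helper names (read_threads_started, halved) and the floor of 1 are made up for illustration and are not anything Lustre provides. The script only reads the counter from the path above and prints a candidate value; it does not change any setting.

```python
from pathlib import Path

# Path taken from the message above; it only exists on an OSS with Lustre loaded.
THREADS_STARTED = Path("/proc/fs/lustre/ost/OSS/ost_io/threads_started")


def read_threads_started(path=THREADS_STARTED):
    """Return the current ost_io thread count, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def halved(count, floor=1):
    """Half of the observed count, rounded down, never below `floor`."""
    return max(floor, count // 2)


if __name__ == "__main__":
    observed = read_threads_started()
    if observed is None:
        print("threads_started not readable on this host")
    else:
        print(f"threads_started={observed}, candidate cap={halved(observed)}")
```

For the two counts quoted above (300 and 359) this gives 150 and 179. Applying the value is a separate step, left out here because it is still an open question whether it takes effect without a restart.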
Also checking lru_size, I get different numbers from the clients.

cat /proc/fs/lustre/ldlm/namespaces/*/lru_size

Client:            MDT0    OST0    OST1    OST2    OST3   MGC
head node:            0      22      22      22      22   400   (only a few users logged in)
busy node:            1     501     504     503     505   400   (fully loaded with jobs)
samba/nfs server:     4  440070   44370   44348   26282  1600

So my understanding is the lru_size is set to auto by default, thus the varying values, but setting it manually is effectively setting a max value? Also, what does it mean to have a lower value (especially in the case of the samba/nfs server)?

On Wed, Feb 1, 2012 at 1:27 PM, Charles Taylor <[email protected]> wrote:
>
> You may also want to check and, if necessary, limit the lru_size on your
> clients. I believe there are guidelines in the ops manual. We have
> ~750 clients and limit ours to 600 per OST. That, combined with the setting
> zone_reclaim_mode=0 should make a big difference.
>
> Regards,
>
> Charlie Taylor
> UF HPC Center
>
>
> On Feb 1, 2012, at 2:04 PM, Carlos Thomaz wrote:
>
>> Hi David,
>>
>> You may be facing the same issue discussed on previous threads, which is
>> the issue regarding the zone_reclaim_mode.
>>
>> Take a look on the previous thread where myself and Kevin replied to
>> Vijesh Ek.
>>
>> If you don't have access to the previous emails, look at your kernel
>> settings for the zone reclaim:
>>
>> cat /proc/sys/vm/zone_reclaim_mode
>>
>> It should be set to 0.
>>
>> Also, look at the number of Lustre OSS service threads. It may be set to
>> high...
>>
>> Rgds.
>> Carlos.
>>
>>
>> --
>> Carlos Thomaz | HPC Systems Architect
>> Mobile: +1 (303) 519-0578
>> [email protected] | Skype ID: carlosthomaz
>> DataDirect Networks, Inc.
>> 9960 Federal Dr., Ste 100 Colorado Springs, CO 80921
>> ddn.com <http://www.ddn.com/> | Twitter: @ddn_limitless
>> <http://twitter.com/ddn_limitless> | 1.800.TERABYTE
>>
>>
>> On 2/1/12 11:57 AM, "David Noriega" <[email protected]> wrote:
>>
>>> indicates the system was overloaded (too many service threads, or
>>>
>>
>> _______________________________________________
>> Lustre-discuss mailing list
>> [email protected]
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
>
> Charles A. Taylor, Ph.D.
> Associate Director,
> UF HPC Center
> (352) 392-4036
>
>

--
David Noriega
System Administrator
Computational Biology Initiative
High Performance Computing Center
University of Texas at San Antonio
One UTSA Circle
San Antonio, TX 78249
Office: BSE 3.112
Phone: 210-458-7100
http://www.cbi.utsa.edu
_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
