Hi Andreas,

Thanks for your update! Some comments below.
JF

On Mon, Oct 15, 2012 at 7:04 PM, Dilger, Andreas <[email protected]> wrote:

> On Oct 15, 2012, at 1:01 PM, Jean-Francois Le Fillatre wrote:
> > Yes this is one strange formula... There are two ways of reading it:
> >
> > - "one thread per 128MB of RAM, times the number of CPUs in the system"
> >
> > On one of our typical OSSes (24 GB, 8 cores), that would give:
> > ((24*1024) / 128) * 8 = 1536
> >
> > And that's waaaay out there…
>
> This formula was first created when there was perhaps 2GB of RAM and 2
> cores in the system, intended to get some rough correspondence between
> server size and thread count. Note that there is also a default upper
> limit of 512 for threads created on the system. However, on some systems
> in the past with slow/synchronous storage, having 1500-2000 IO threads
> was still improving performance and could be set manually. That said, it
> was always intended as a reasonable heuristic, and local performance
> testing/tuning should pick the optimal number.
>
> > - "as many threads as you can fit (128MB * number of CPUs) in the RAM
> > of your system"
> >
> > Which would then give: (24*1024) / (128*8) = 24
>
> This isn't actually representing what the formula calculates.
>
> > For a whole system, that's really low. But for one single OST, it
> > almost makes sense, in which case you'd want to multiply that by the
> > number of OSTs connected to your OSS.
>
> The rule of thumb that I've seen in the past, based on benchmarks at
> many sites, is 32 threads/OST, which will keep the low-level elevators
> busy but not make the queue depth too high.
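To make the numbers concrete for anyone following along, here's a quick
back-of-the-envelope sketch (purely illustrative Python, using our
24 GB / 8-core / 4-OST OSSes and the default 512 cap you mention):

    # Comparing the two readings of the formula with the default cap
    # and the 32 threads/OST rule of thumb.
    ram_mb, ncpus, nosts = 24 * 1024, 8, 4

    literal  = (ram_mb // 128) * ncpus   # one thread per 128MB, times CPUs -> 1536
    capped   = min(literal, 512)         # default upper limit on created threads -> 512
    alt_read = ram_mb // (128 * ncpus)   # how many (128MB * ncpus) fit in RAM -> 24
    per_ost  = 32 * nosts                # 32 threads/OST rule of thumb -> 128

    print(literal, capped, alt_read, per_ost)

So the literal reading blows straight through the default cap, while the
32 threads/OST rule lands at 128 for us, in the same ballpark as the 96
we settled on.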
> > The way we did it here is that we identified that the major limiting
> > parameter is the software RAID, both in terms of bandwidth performance
> > and CPU use. So I did some tests on a spare machine to get some load
> > and perf figures for one array, using sgpdd-survey. Then, taking into
> > account the number of OSTs per OSS (4) and the overhead of Lustre, I
> > figured out that the best thread count would be around 96 (which is
> > 24*4, spot on).
> >
> > One major limitation in Lustre 1.8.x (I don't know if it has changed
> > in 2.x) is that only the global thread count for the OSS can be
> > specified. We have cases where all OSS threads are used on a single
> > OST, and that completely trashes the bandwidth and latency. We would
> > really need a max thread count per OST too, so that no single OST
> > would get hit that way. On our systems, I'd put the max OST thread
> > count at 32 (to stay in the software RAID performance sweet spot) and
> > the max OSS thread count at 96 (to limit CPU load).
>
> Right. This is improved in Lustre 2.3, which binds the threads to
> specific cores. I believe it is also possible to bind OSTs to specific
> cores as well for PCI/HBA/HCA affinity, though I'm not 100% sure if the
> OST/CPU binding was included or not.

Even if I could bind both OSTs and threads to a given CPU, it's only a
topological optimization for bandwidth and latency; what would prevent a
thread from answering a request for a target that is bound to another
CPU? I mean, this is a very nice feature, and with proper configuration
it can bring some notable improvements in performance, but I fail to see
how it would solve the issue of having all the threads on an OSS
hammering a single OST.

I am aware that this is a border case; in general use the load is spread
over multiple targets and there's no problem. But we've hit it here a few
times, and I know of some other sites that have had the issue too.

If you combine that with RAID issues (like slow disk / read errors / disk
failure / rebuild or resync), you have a machine that locks up so badly
that a cold reset is the only way to get it back under control. Worst
case? Yes. But because the consequences of such a situation can be so
nasty, I would be very happy to be able to control thread allocation per
OST more finely.
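For David's question further down about making the setting stick, what we
do here is roughly the following (the value is just ours, and the module
option name is from memory of the tuning manual, so double-check the
pages linked below for your version):

    # /etc/modprobe.conf (or /etc/modprobe.d/): applied when the ost
    # module loads, so it survives reboots
    options ost oss_num_threads=96

    # runtime ceiling on a live OSS; immediate but not persistent, and
    # since service threads are never destroyed once created, lowering
    # the count still means a reboot
    lctl set_param ost.OSS.ost_io.threads_max=96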
> > Thanks!
> > JF
> >
> > On Mon, Oct 15, 2012 at 2:20 PM, David Noriega <[email protected]> wrote:
> > > How does one estimate a good number of service threads? I'm not sure
> > > I understand the following: 1 thread / 128MB * number of cpus
> > >
> > > On Wed, Oct 10, 2012 at 9:17 AM, Jean-Francois Le Fillatre
> > > <[email protected]> wrote:
> > > >
> > > > Hi David,
> > > >
> > > > It needs to be specified as a module parameter at boot time, in
> > > > /etc/modprobe.conf. Check the Lustre tuning pages:
> > > > http://wiki.lustre.org/manual/LustreManual18_HTML/LustreTuning.html
> > > > http://wiki.lustre.org/manual/LustreManual20_HTML/LustreTuning.html
> > > >
> > > > Note that once created, the threads won't be destroyed, so if you
> > > > want to lower your thread count you'll need to reboot your system.
> > > >
> > > > Thanks,
> > > > JF
> > > >
> > > > On Tue, Oct 9, 2012 at 6:00 PM, David Noriega <[email protected]> wrote:
> > > > > Is this a parameter, ost.OSS.ost_io.threads_max, that when set
> > > > > via lctl conf_param will persist between reboots/remounts?
> > > >
> > > > --
> > > > Jean-François Le Fillâtre
> > > > Calcul Québec / Université Laval, Québec, Canada
> > > > [email protected]
> > >
> > > --
> > > David Noriega
> > > CSBC/CBI System Administrator
> > > University of Texas at San Antonio
> > > One UTSA Circle
> > > San Antonio, TX 78249
> > > Office: BSE 3.114
> > > Phone: 210-458-7100
> > > http://www.cbi.utsa.edu
> > >
> > > Please remember to acknowledge the RCMI grant; wording should be as
> > > stated below: This project was supported by a grant from the National
> > > Institute on Minority Health and Health Disparities (G12MD007591)
> > > from the National Institutes of Health. Also, remember to register
> > > all publications with PubMed Central.
> >
> > --
> > Jean-François Le Fillâtre
> > Calcul Québec / Université Laval, Québec, Canada
> > [email protected]
>
> Cheers, Andreas
> --
> Andreas Dilger
> Lustre Software Architect
> Intel Corporation

--
Jean-François Le Fillâtre
Calcul Québec / Université Laval, Québec, Canada
[email protected]

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
