* Benny Löfgren <[email protected]> [2011-01-07 20:45]:
> On 2011-01-07 19.54, Ted Unangst wrote:
> >>experiment with parallel ports building on a 64-way sparc64 T2 went.
> >>With 32 build jobs it looked like this:
> >><landry_p22> 0.8%Int 48.9%Sys 6.0%Usr 0.0%Nic 44.3%Idle
> >><landry_p22> around that all the time
> >
> >My understanding is that the T2 is closer to an 8-way machine. If we
> >could recognize the real cores and balance appropriately, 8 build jobs
> >shouldn't be too bad.
> >
> >At least with a 4-core 8-thread i7 processor, make -j 8 scales reasonably
> >well.
>
> Just to illustrate, a quick test on my 8-core (2 cpu x 4 core)
> Supermicro AMD box (compile a GENERIC.MP kernel):
>
> # make clean && make depend
> # time make
> ...
> 3m26.78s real 2m43.73s user 0m35.08s system
>
> # make clean && make depend
> # time make -j8
> ...
> 0m47.40s real 2m52.75s user 3m1.70s system
>
> On a first glance it doesn't scale all that well, about 4,4 times
> quicker real time when running eight compiler tasks simultaneously
> compared to the single one.
>
> But the server isn't idle to begin with (it is run in quite heavy
> production), and this sort of test is of course not processor-only.
> Also, both tests were run with the MP kernel, so even the
> single-task test would probably utilize several kernels at times.
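The "about 4,4 times" figure can be checked from the `real` times alone: 206.78 s / 47.40 s is roughly 4.36 with 8 jobs, which works out to a parallel efficiency of about 55%. The Python sketch below does that arithmetic. The helper names `parse_time`, `speedup` and `efficiency` are made up for illustration, and the parser only handles the `XmY.YYs` form shown in the quoted `time` output.

```python
import re


def parse_time(s: str) -> float:
    """Convert a time(1)-style duration such as '3m26.78s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s", s)
    if m is None:
        raise ValueError(f"unrecognised duration: {s!r}")
    minutes = int(m.group(1) or 0)
    return minutes * 60 + float(m.group(2))


def speedup(serial: str, parallel: str) -> float:
    """Wall-clock speedup: serial real time divided by parallel real time."""
    return parse_time(serial) / parse_time(parallel)


def efficiency(serial: str, parallel: str, workers: int) -> float:
    """Fraction of ideal linear speedup achieved with `workers` jobs."""
    return speedup(serial, parallel) / workers


if __name__ == "__main__":
    s = speedup("3m26.78s", "0m47.40s")
    e = efficiency("3m26.78s", "0m47.40s", 8)
    print(f"speedup {s:.2f}x, efficiency {e:.0%}")  # prints: speedup 4.36x, efficiency 55%
```

Only `real` time is used here. The `user` and `system` columns sum CPU time over all jobs, which is why they grow under `-j8`; in this run `system` rose from about 35 s to just over 3 minutes. The same arithmetic applied to the "more than 3.5x on 4 logical CPUs" backup figure in the reply gives an efficiency above 87.5%.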
indeed, your test has some flaws. but still, the scaling it shows isn't all that bad. and keep in mind that cores typically share a bit more than separate CPUs do, which can have advantages or disadvantages.

the box i have in mind does two things that matter for this discussion:

- it takes backups for/from many servers
- it does dns & webalizer on webserver logfiles (many, many of them, from many webservers)

the backup sounds I/O-heavy, and of course it kinda is, but the biggest load is gzip. i wrote the backup stuff myself over many years; it has a nifty scheduler that parallelizes nicely.

the webserver logfile processing suffers from dns latency (with a local cache of course, but still). massively parallel processing (i wrote that stuff, too) drives it to the point where all CPUs are almost 100% busy (well, see below).

the backup runs for about 3 hours with all CPUs busy. the webserver logfile processing usually takes about 2 hours, but everything is busy for only one hour of that; afterwards only the big logs are still being processed and dns latency is the limiting factor.

the box used to be a dual xeon 2.2 (the older, p4-based heating plate) with hyperthreading, so 4 logical CPUs, with ami RAID 5. the backup scales almost perfectly: more than 3.5x faster with the 4 logical CPUs than with just one. webserver log processing gives the same picture.

since wednesday it is an intel E7500 (2.93GHz, 2 cores) with a sata disk to boot from and two big sata disks in softraid RAID 1. it is slightly faster than the previous box. please note that i can only give estimates, since backup and webserver log processing performance are influenced by external factors. and since somebody is going to ask: the separate boot disk (which holds the OS and everything except the raw data) is there to make it easy to replace the data disks with bigger ones.

so for these tasks, we scale perfectly fine.

throwing more than one cpu (core) at a database server running just one mysqld instance is not going to help right now.
that's likely to change with rthreads, though.

throwing more than one core at a firewall (without much proxy stuff in userland) hurts more than it helps right now.

guess my point is clear: we scale fine for many (I'd even say most) tasks, and we scale miserably for some others. yes, our SMP can be improved, but it isn't bad. heck, what cannot be improved?

-- 
Henning Brauer, [email protected], [email protected]
BS Web Services, http://bsws.de
Full-Service ISP - Secure Hosting, Mail and DNS Services
Dedicated Servers, Rootservers, Application Hosting
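The dns part of the logfile processing is latency-bound rather than CPU-bound, so keeping many lookups in flight at once is what hides the wait. The actual processing code is not shown in the thread. What follows is only a minimal Python sketch of the general idea, assuming a thread pool and the system resolver via `socket.gethostbyaddr`. The names `reverse_lookup` and `resolve_all` and the default of 64 workers are hypothetical choices, not taken from the setup described above.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def reverse_lookup(ip: str) -> Optional[str]:
    """Blocking PTR lookup through the system resolver; None on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None


def resolve_all(
    ips: Iterable[str],
    resolver: Callable[[str], Optional[str]] = reverse_lookup,
    workers: int = 64,  # hypothetical default; lookups mostly wait, so this can far exceed the CPU count
) -> Dict[str, Optional[str]]:
    """Resolve each distinct address exactly once, with many lookups in flight."""
    unique = list(dict.fromkeys(ips))  # de-duplicate while keeping first-seen order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        names = pool.map(resolver, unique)  # results come back in input order
        return dict(zip(unique, names))
```

A typical use would be to collect every client address from a batch of log lines, call `resolve_all` once, and then rewrite the lines from the returned table. Because each distinct address is looked up only once, this also acts as the per-run cache. Threads fit here because the workers spend their time blocked on the network, not on the CPU. For CPU-heavy work such as the gzip part of the backups, separate processes (or separate gzip(1) invocations, as a shell-driven scheduler would use) are the more usual choice.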

