Glen Barber wrote:
> Hi, Matthew
>
> On Sun, May 24, 2009 at 3:46 AM, Matthew Seaman <m.sea...@infracaninophile.co.uk> wrote:
>> Yuri wrote:
>>> [snip]
>>
>> Sure. This is not an uncommon occurrence really. The load average is the number of processes in the queue for a CPU time slice averaged over 1, 5 or 15 minutes. For multi-core systems the LA is scaled by the number of cores so a LA of 1.0 means all cores have active processes pretty much continually.
>
> I thought, if it was a dual-core for example, a load average of 1.00 would indicate 50% CPU utilization overall (1 process using only 1 core). 2.00 on a dual-core would be 100%, 3.00 on a dual-core would be 100% utilization, and always 1 process in the wait queue, and so on.
It seems both ways have been used in different OSes, which is confusing. A quick test with a single-threaded process that spins one CPU on a multi-core FreeBSD box shows the value is /not/ scaled by the number of cores. Which means that the LA the OP was talking about is actually a lot less alarming than it originally appeared. It's clear from the top output that his machine has at least 8 cores, so a LA of 7 is really not very heavily loaded.
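To put numbers on the unscaled interpretation, here is a minimal Python sketch. The helper names `per_core_load` and `excess_runnable` are made up for illustration and are not part of any FreeBSD tool; the only system calls used are the standard library's `os.getloadavg()` and `os.cpu_count()`. It assumes the load average roughly counts runnable processes, as discussed above.

```python
import os


def per_core_load(load_avg: float, ncpu: int) -> float:
    """Divide an unscaled load average by the core count.

    A result near 1.0 means roughly one runnable process per core.
    """
    if ncpu < 1:
        raise ValueError("ncpu must be at least 1")
    return load_avg / ncpu


def excess_runnable(load_avg: float, ncpu: int) -> float:
    """Average number of runnable processes beyond what the cores can run at once."""
    if ncpu < 1:
        raise ValueError("ncpu must be at least 1")
    return max(0.0, load_avg - ncpu)


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages on Unix-like systems.
    one_min, five_min, fifteen_min = os.getloadavg()
    ncpu = os.cpu_count() or 1  # cpu_count() may return None; fall back to 1
    for label, value in (("1 min", one_min), ("5 min", five_min), ("15 min", fifteen_min)):
        print(f"{label}: load {value:.2f}, per core {per_core_load(value, ncpu):.2f}, "
              f"waiting {excess_runnable(value, ncpu):.2f}")
```

With the figures from this thread, `per_core_load(7.0, 8)` comes out just under one runnable process per core, and `excess_runnable(3.0, 2)` matches the dual-core case above of one process always waiting.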
>> Now, you might think that an active process will take the CPU utilisation to 100%, but that is not necessarily so. Some numerical applications can do that, but purely CPU bound processes are relatively uncommon in everyday usage. In actuality what happens is that the processor will need to retrieve data from somewhere to operate on. There's a hierarchy of data stores of various speeds (latency, rather than bandwidth):
>>
>> L1 Cache > L2 Cache > L3 Cache > Main RAM > Disk > Network
>
> Does this affect the load average though? My understanding was that if the CPU cannot immediately process data, the data gets put into the wait queue until L2 Cache (then RAM, etc, etc) returns the data to be processed.
Yes it does: when a process is on the CPU and blocked waiting for IO it does not necessarily yield the CPU to another process. It depends on timescales -- obviously if the CPU will have to wait milliseconds for data it makes no sense to block other processes. Waiting a few microseconds is a different matter though: it might take that long to load up L2/L3 cache with that process's working data, so yielding the CPU for that sort of delay would mean the process never got run, which is counterproductive...

It helps if the working set is already in the L3 cache -- so having the correct amount[*] of cache RAM available is an important design criterion. It's something that Intel was shown to have got wrong with some of the Pentium series chips, when a low-powered Pentium M designed for mobile use smoked a much higher clock speed Pentium chip designed for all-out server use simply because it had about 4x as much cache.

	Cheers,

	Matthew

[*] i.e. as much as possible.

-- 
Dr Matthew J Seaman MA, D.Phil.
7 Priory Courtyard, Flat 3
Ramsgate, Kent, CT11 9PW
PGP: http://www.infracaninophile.co.uk/pgpkey