The shared L2 reduces the penalty for the cases where you can't avoid dispatching on a new engine, that is, when the system is very busy. This is one of the reasons for the difference in utilization. As an x86 machine gets busier, migrating work is forced into L2-to-L2 or remote-L3-to-local-L1 (victim cache) transfers, which carry a high penalty. On z the migration is from shared L2 to L1. The less affinity scheduling delays dispatching, the more the system behaves like a multiple-server single-queue system, which is the optimum case. The more scheduling delays dispatching, the more the system behaves like multiple single-server single-queue systems, which do not perform well if the load has skew or high variability. Thus if the affinities are hardened (often done in skewless benchmark runs), skew will cause some CPUs to overload while others sit idle. If there is no affinity, there are more cache migrations. In between, there is a combination of the two cases, and it becomes a matter of the migration penalty vs. the queueing penalty of affinity scheduling. Of course this is yet another reason that relative capacity is workload dependent.
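The queueing argument above can be made concrete with standard formulas: one shared queue feeding c servers (M/M/c, Erlang C) vs. c independent single-server queues (M/M/1) with the load split across them. This is only an illustrative sketch; the arrival and service rates below are made-up numbers, not measurements of any system discussed in the thread.

```python
import math

def mmc_wait(lam, mu, c):
    """Mean queueing delay Wq for an M/M/c queue, via the Erlang C formula."""
    a = lam / mu              # offered load in Erlangs
    rho = a / c               # per-server utilization (must be < 1)
    head = sum(a**k / math.factorial(k) for k in range(c))
    tail = a**c / (math.factorial(c) * (1 - rho))
    p_wait = tail / (head + tail)   # probability an arrival has to wait
    return p_wait / (c * mu - lam)

def partitioned_wait(lams, mu):
    """Mean Wq across independent M/M/1 queues, weighted by arrival rate."""
    total = sum(lams)
    # M/M/1 mean queueing delay: Wq = rho / (mu - lam)
    return sum(l * ((l / mu) / (mu - l)) for l in lams) / total

mu = 1.0  # normalized service rate per CPU

shared = mmc_wait(2.8, mu, 4)                          # one queue, 4 CPUs
balanced = partitioned_wait([0.7] * 4, mu)             # hard affinity, no skew
skewed = partitioned_wait([0.95, 0.85, 0.6, 0.4], mu)  # hard affinity, skew

print(f"shared queue Wq   = {shared:.2f}")
print(f"balanced split Wq = {balanced:.2f}")
print(f"skewed split Wq   = {skewed:.2f}")
```

At the same total load, the shared queue waits least, hardened affinity with a perfectly balanced split waits more, and hardened affinity with skew waits far more, which is the point being made: hard affinity only looks good when the benchmark removes the skew.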
Another aspect of z's common L2 is that it always holds a copy of the data in the L1s attached to it, so snooping is avoided. High-end System x systems (x460 class) do this by keeping a shadow directory that covers the on-chip caches.

Joe Temple
Distinguished Engineer, Sr. Certified IT Specialist
[EMAIL PROTECTED]
845-435-6301 (295/6301)
Cell 914-706-5211, Home office 845-338-1448, Home 845-338-8794

From: Alan Altmark/Endicott [EMAIL PROTECTED]
Sent by: Linux on 390 Port <[EMAIL PROTECTED]IST.EDU>
To: LINUX-390@VM.MARIST.EDU
Date: 05/18/2006 08:55 AM
Subject: Re: Who's been reading our list...
Please respond to: Linux on 390 Port <[EMAIL PROTECTED]IST.EDU>

On Thursday, 05/18/2006 at 10:03 ZE2, Martin Schwidefsky <[EMAIL PROTECTED]> wrote:
> The cache is a different story. Mainframes have the advantage of a
> shared level 2 cache compared to x86. If a process migrates from one
> processor to another, the cache lines of the process just have to be
> loaded from level 2 cache to level 1 cache again before they can be
> accessed. On x86 it goes over memory.

The cache designs on the mainframe change from generation to generation to deal with more work, changes in the relationship of CPU speed to memory speed, and more CPUs. You want the benefits of cache, but you want to minimize the serialization/synchronization effects on the processors. This is why we do our best to dispatch a virtual machine on the same CPU as was used in the previous time slice. The relationship between the CPUs and a particular cache is not always equal, but it is always best if you use the same CPU again.
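The migration cost Martin describes can be sketched as a back-of-envelope model: a migrated process must refill its L1 working set, and the per-line cost depends on where the lines are found (shared L2 on z vs. memory on the x86 designs discussed). Every number below, including the working-set size and both latencies, is a hypothetical assumption for illustration, not a measured value for any machine in this thread.

```python
# Illustrative model of process-migration cost. All constants are assumed.
L1_LINES = 512            # assumed L1 working set, in cache lines
L2_LATENCY_CYCLES = 20    # assumed latency to pull one line from shared L2
MEM_LATENCY_CYCLES = 300  # assumed latency to pull one line from memory

def refill_cost(lines, per_line_latency):
    """Cycles spent refilling `lines` cache lines at the given latency."""
    return lines * per_line_latency

shared_l2_refill = refill_cost(L1_LINES, L2_LATENCY_CYCLES)  # z-style case
memory_refill = refill_cost(L1_LINES, MEM_LATENCY_CYCLES)    # over-memory case

print(f"refill from shared L2: {shared_l2_refill} cycles")
print(f"refill from memory:    {memory_refill} cycles")
print(f"ratio: {memory_refill / shared_l2_refill:.0f}x")
```

The exact ratio is whatever the latency ratio is; the model's only point is that the migration penalty scales with where the refill is served from, which is why a shared L2 makes non-affine dispatch cheaper.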
Alan Altmark
z/VM Development, IBM Endicott

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: INFO LINUX-390 or visit
http://www.marist.edu/htbin/wlvindex?LINUX-390
----------------------------------------------------------------------