[email protected] (Shmuel Metz, Seymour J.) writes:
> It was 16 ;-)
>
> At the time, IBM was shipping 2-way[1] MP systems. I don't know
> whether the limit was still 16 by the time MVS/XA came out. I'd be
> willing to bet that both 64 and whatever number replaces it will be
> lifted internally well before hardware ships requiring larger numbers.
>
> [1] Well, the standard models at least; I'm not counting, e.g.,
>     9020, 360/67.

re:
http://www.garlic.com/~lynn/2010i.html#61 IBM to announce new MF's this year
http://www.garlic.com/~lynn/2010i.html#67 IBM to announce new MF's this year

both mvs and standard vm ... had kernel storage reworked going from the
two-processor 3081 to the four-processor 3084. The change was to align all
kernel storage areas on cache line boundaries and make them multiples of
the cache line in length. The problem was that if two different storage
areas shared the same cache line ... then two different processors could
be accessing the different storage areas ... but in the same cache line
... resulting in the line "thrashing" back and forth between the two
caches. Going from two-way to four-way ... increased the problem by a
factor of three (instead of contention from one other cache ... there was
contention from three other caches).
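
As a rough modern illustration of that alignment rule (not the actual mvs or vm change), here is a minimal C11 sketch. The 64-byte line size, the struct names and the same_cache_line() helper are all assumptions made up for the example; the 3081/3084 line size isn't given above.

```c
#include <assert.h>    /* static_assert (C11) */
#include <stdalign.h>  /* alignas, alignof */
#include <stdint.h>    /* uintptr_t */

/* Assumed line size for illustration only; not the 3081/3084 value. */
#define CACHE_LINE 64

/* Unpadded layout: the two counters normally sit in the same cache
   line, so two processors updating them contend for that line. */
struct shared_line {
    long cpu0_count;
    long cpu1_count;
};

/* Padded layout: the member is aligned on a line boundary, so the
   struct size is rounded up to a multiple of the line size and
   adjacent array elements never share a line. */
struct per_cpu {
    alignas(CACHE_LINE) long count;
};

static_assert(alignof(struct per_cpu) == CACHE_LINE,
              "per_cpu must start on a cache line boundary");
static_assert(sizeof(struct per_cpu) % CACHE_LINE == 0,
              "per_cpu must be a multiple of the cache line in length");

/* Hypothetical helper: nonzero if both addresses fall in the same
   CACHE_LINE-sized, CACHE_LINE-aligned block. */
int same_cache_line(const void *a, const void *b)
{
    return ((uintptr_t)a / CACHE_LINE) == ((uintptr_t)b / CACHE_LINE);
}
```

With an array of struct per_cpu, same_cache_line() on the counters of two different elements returns 0, which is the property the storage rework was after.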

traditional 370 two-way ... would slow the processor cycle down by 10%
to accommodate simple cache invalidation signal traffic between the two
caches (any actual cache invalidations and/or cache thrashing would
further degrade thruput). So base two-way hardware was 1.8 times that of
a single processor (2 x 0.9). Typical operating system smp overhead would
further reduce that to 1.3 to 1.5 times the thruput of a single processor.
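
The arithmetic in that paragraph can be written out as a small C sketch. The function names are made up for the example, and the "efficiency" ratio is just my way of restating the figures above (how much of the 1.8 hardware figure survives the operating system overhead), not a measured number.

```c
#include <assert.h>

/* Two-way hardware thruput relative to one full-speed processor,
   when each processor's cycle is slowed by `slowdown` (0.10 = 10%). */
double two_way_hw_thruput(double slowdown)
{
    return 2.0 * (1.0 - slowdown);
}

/* Fraction of the hardware thruput that remains after operating
   system smp overhead, given the observed two-way thruput. */
double smp_efficiency(double observed_thruput, double hw_thruput)
{
    return observed_thruput / hw_thruput;
}
```

With the 10% slowdown, two_way_hw_thruput(0.10) gives the 1.8 figure; feeding 1.3 and 1.5 into smp_efficiency() against 1.8 gives roughly 0.72 and 0.83.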

I did some two-way SMP work and had some cases of greater than two times
the thruput of a single processor ... with some tricks for maintaining
cache locality that improved cache hit ratios (along with very low smp
coordination overhead). part of it was heavy use of compare&swap for
concurrent execution, with minimal use of locking of critical sections
that would result in serialized operation (the thruput impact of critical
section locking increases as processors are added ... since more
processors raise the probability that processors will be in contention
for the same critical section).
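
For anybody not familiar with the compare&swap style, here is a minimal sketch using C11 atomics as a stand-in for the 370 instruction. It is not the original kernel code; cas_add() is a hypothetical name, and it only shows the usual retry loop that updates a shared value without taking a lock.

```c
#include <assert.h>
#include <stdatomic.h>

/* Add `delta` to *target without a lock and return the new value.
   If another processor changed *target between the load and the
   compare-and-swap, the swap fails, `old` is refreshed with the
   current value, and the loop retries with the new sum. */
long cas_add(_Atomic long *target, long delta)
{
    long old = atomic_load(target);
    while (!atomic_compare_exchange_weak(target, &old, old + delta)) {
        /* `old` now holds the value some other processor stored */
    }
    return old + delta;
}
```

No processor ever waits holding a lock here; a processor only repeats its own update when it actually collided with another one, so the cost scales with real contention rather than with the number of processors.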

originally the 3081 wasn't going to have a uniprocessor ... but in part
because TPF (aka ACP) lacked smp support ... eventually they came out
with the 3083, which allowed removing the processor slow-down for
cross-cache invalidation ... aka the base 3083 started out with a
processor cycle nearly 15% faster (w/o the 10% slowdown). There was an
issue with the default of removing processor-1 in the middle of the 3081
frame ... which left processor-0 at the top and the box top-heavy. Then
there was a special 3083 with additional customization of processor
microcode for TPF operation.

recent mention of 3083:
http://www.garlic.com/~lynn/2010.html#21 Happy DEC-10 Day
http://www.garlic.com/~lynn/2010d.html#14 Happy DEC-10 Day
http://www.garlic.com/~lynn/2010d.html#79 LPARs: More or Less?
http://www.garlic.com/~lynn/2010e.html#23 Item on TPF
http://www.garlic.com/~lynn/2010i.html#24 Program Work Method Question

the original 360/67 announcement was for four-way ... but i don't know of
any four-way being built ... and i think there were only a couple of
three-ways built. the single processor was pretty much a 360/65 with the
addition of an associative array for virtual address translation. the
multiprocessor 360/67 had a lot more differences from the 360/65:
multi-ported memory and a "channel controller" (able to address all
channels from all processors).

-- 
42yrs virtualization experience (since Jan68), online at home since Mar1970
