sipp...@sg.ibm.com (Timothy Sipples) writes:
> Storage isn't what it was in 1982, and that's the whole point. It's faster,
> more reliable, and ridiculously less expensive. We shift our attentions
> elsewhere, rightly so, at least in terms of degree of emphasis. We simply
> don't worry about kilobytes if we're rational. This year we worry about
> terabytes, and maybe in the future we won't even worry about those.

re:
http://www.garlic.com/~lynn/2015c.html#65 A New Performance Model ?

I've periodically mentioned that storage access (aka a cache miss),
when measured in number of CPU cycles, is similar to 60s disk access
when measured in 60s CPU cycles (caches are the new storage and
storage is the new disk).

for decades other processors (especially risc, and then i86 when they
moved to risc cores with a hardware layer that translated i86
instructions into risc micro-ops) have had lots of hardware features
that attempt to mitigate/compensate for (cache miss) storage access
latency: hyperthreading, out-of-order execution, branch prediction and
speculative execution.

the claim is that at least half the z10->z196 per-processor throughput
improvement came from starting to introduce similar features ... with
further refinements in the move to z12 & z13.

go back over 40 years and this shows up in the 195. I've periodically
mentioned getting con'ed into helping with an effort to add
hyperthreading to the 370/195 ... which never announced/shipped. The
issue was that the 195 pipeline had out-of-order execution but didn't
have branch prediction or speculative execution ... so conditional
branches drained the pipeline. It took careful programming to get
sustained 10MIPs throughput ... but most codes (with conditional
branches) ran at 5MIPs. The objective of hyperthreading was to emulate
a two-processor multiprocessor, hoping that two instruction streams
running at 5MIPs each would achieve 10MIPs aggregate throughput.

it was basically red/blue mentioned in this 60s ACS/END reference
http://people.cs.clemson.edu/~mark/acs_end.html

Note that the above also points out that ACS-360 was shut down because
executives thought that it would advance the state-of-the-art too fast
and they would lose control of the market. It lists some of the ACS-360
features that show up more than 20yrs later with ES/9000.

The equivalent of the 195 pipeline's careful programming is careful
code ordering to minimize cache misses (in much the same way that
70s/80s code was ordered to minimize page faults ... which required
disk accesses). There was recent discussion in comp.arch about (virtual
memory and) VS/Repack out of the science center in the 70s ... which
did semi-automated code reorganization for virtual memory operation.
Before it was released to customers, many internal development groups
had been using it to improve operation in a virtual memory environment;
they also used some of the VS/Repack technology for "hot-spot"
analysis.
http://www.garlic.com/~lynn/2015c.html#66 Messing Up the System/360

aka part of the decision to migrate all 370s to virtual memory. Old
post noting that the primary motivation was analysis showing that MVT
storage management was so bad that regions had to be specified four
times larger than normally used ... so a typical 1mbyte-storage 370/165
ran with only four regions. With virtual memory, it would be possible
to run 16 regions and still have little or no paging.
http://www.garlic.com/~lynn/2011d.html#73 Multiple Virtual Memory


topic drift ... what the 370/195 effort didn't account for was that
MVT/SVS/MVS in the period introduced extraordinarily inefficient
multiprocessor overhead; the typical guideline was that two-processor
operation had only 1.3-1.5 times the throughput of a single processor.

this brings up the story about compare&swap ... invented by charlie at
the science center when he was doing work on fine-grain (efficient)
multiprocessor locking for (virtual machine) cp/67. the initial attempt
to have it included in 370 was rejected ... the 370 architecture owners
said that the POK favorite-son operating system people were claiming
that test&set was more than sufficient for multiprocessor support
(partially accounting for their only being able to get 1.3 times the
throughput). cp67 (& later vm370) multiprocessor support could get
close to multiprocessor hardware throughput (with minimal introduced
multiprocessor operating system overhead). we were finally able to
justify compare&swap for 370 with examples of how multithreaded
applications could use compare&swap (regardless of single-processor or
multi-processor operation) ... examples that continue to be included in
POO. past multiprocessor &/or compare&swap posts
http://www.garlic.com/~lynn/subtopic.html#smp
past science center posts
http://www.garlic.com/~lynn/subtopic.html#545tech


-- 
virtualization experience starting Jan1968, online at home since Mar1970

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to lists...@listserv.ua.edu with the message: INFO IBM-MAIN
