The following message is a courtesy copy of an article
that has been posted to bit.listserv.ibm-main,alt.folklore.computers as well.


[EMAIL PROTECTED] (Paul Gilmartin) writes:
> Of course, there would be much wailing and gnashing of teeth from
> customers discovering that code compiled to the bare metal far
> outperformed their beloved Assembler programs on the emulated
> hardware.  The vendor would need to cover its collective ears and
> encourage the customers to migrate to better technology.

re:
http://www.garlic.com/~lynn/2008d.html#47 Linux zSeries questions
http://www.garlic.com/~lynn/2008d.html#49 Linux zSeries questions

this is somewhat 30yrs gone ...

some of this has all been encountered before ... a recent thread in
comp.arch mentioned the 360/370 vertical microcode microprocessors
having a 10:1 execution ratio ... i.e. an avg. of 10 microcode
instructions executed for every 360/370 instruction. this gave rise to
"ECPS" on the 138/148 & 43xx machines ... moving kernel code into
microcode for a 10:1 performance improvement:
http://www.garlic.com/~lynn/2008d.html#39 Throwaway cores
http://www.garlic.com/~lynn/2008d.html#46 Throwaway cores
http://www.garlic.com/~lynn/2008d.html#52 Throwaway cores
http://www.garlic.com/~lynn/2008d.html#54 Throwaway cores
http://www.garlic.com/~lynn/2008d.html#56 Throwaway cores
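the 10:1 arithmetic above can be sketched as a toy cost model (the
ratio is from the text; the kernel path length and function names here
are made-up illustration, not measured figures):

```python
# illustrative cost model of the ECPS win: on the low-end 370s, each
# 370 instruction cost roughly 10 vertical-microcode instructions to
# emulate (the ~10:1 ratio from the comp.arch thread)
MICROCODE_INSNS_PER_370_INSN = 10

def emulated_cost(n_370_insns, ratio=MICROCODE_INSNS_PER_370_INSN):
    """microcode instructions spent interpreting n 370 instructions"""
    return n_370_insns * ratio

def ecps_cost(n_370_insns):
    """same kernel path dropped 1:1 into microcode: one microcode
    instruction per original 370 instruction"""
    return n_370_insns * 1

kernel_path = 1000   # hypothetical kernel code path length
print(emulated_cost(kernel_path) / ecps_cost(kernel_path))  # 10.0
```

i.e. dropping a hot kernel path 1:1 into microcode removes the
interpretation overhead entirely, which is where the 10:1 comes from.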

however, this applied to the low-end and mid-range 370s ... which used
vertical microcode microprocessors ... and wasn't true for the
high-end machines using horizontal microcode microprocessors.

as also mentioned in the comp.arch thread ... the wide proliferation of
different (vertical microcode) microprocessors (systems, controllers,
channels, etc) resulted in projects circa 1980 to move the corporation
to a single 801/risc microprocessor architecture (iliad chips). However,
for various reasons this effort foundered. As mentioned, the
4341-followon (4381) started out being one of these 801/risc processors
... and I contributed to the writeups killing that strategy. the issue
was that chips were getting complex enough that it was starting to be
possible to implement the 370 instructions directly in circuits
... rather than having an intermediate microcode level.

now, the high-end product line used horizontal microcode
microprocessors ... this required extremely complex programming
... since different fields in the same instruction controlled different
functions ... like starting a data move from one unit to another unit
... overlapped with a variety of other functions. the programmer then
had to manually count machine cycles (instructions) before the data
could be expected to have finished moving. because of the overlap
complexity, these machines measured performance in avg. machine cycles
per 370 instruction (rather than avg. number of microcode instructions
executed per 370 instruction). the 370/165 measured an avg. of 2.1
machine cycles per 370 instruction. this was improved on the 370/168 to
an avg. of 1.6 machine cycles per 370 instruction. For the 3033 it was
approx. one machine cycle per 370 instruction.
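those cycles-per-instruction numbers can be compared directly (the CPI
figures are from the text; cycle times are not given there, so this
only compares relative throughput under the assumption of an equal
clock, which the real machines did not share):

```python
# avg. machine cycles per 370 instruction, as cited in the text
cpi = {"370/165": 2.1, "370/168": 1.6, "3033": 1.0}

def relative_throughput(machine, baseline="370/165"):
    """370 instructions per cycle relative to the baseline machine
    (valid only under the equal-cycle-time assumption)"""
    return cpi[baseline] / cpi[machine]

print(relative_throughput("370/168"))   # ~1.31x the 165, per cycle
print(relative_throughput("3033"))      # 2.1x the 165, per cycle
```

the actual machines also differed in cycle time, so real throughput
ratios were larger than these per-cycle numbers suggest.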

this resulted in various problems ... the ECPS virtual machine
microcode assist on the 148 & 4341 (i.e. moving part of the kernel
instructions into microcode) got a 10:1 performance improvement.
However, an attempt to do something similar on the 3033 actually
resulted in a slight performance degradation (there was no gain doing a
one-for-one translation from 370 instructions to 3033 native).

as in the discussion regarding virtual machine microcode assist:
http://www.garlic.com/~lynn/94.html#21 370 ECPS VM microcode assist
http://www.garlic.com/~lynn/94.html#27 370 ECPS VM microcode assist
http://www.garlic.com/~lynn/94.html#28 370 ECPS VM microcode assist

there was one set of (ECPS) things that just did a 1:1 translation of
kernel 370 instructions into microcode (for the 10:1 performance
improvement). there was another set of things that directly executed
privileged instructions (but according to virtual machine rules) w/o
interrupting into the kernel. This latter set showed performance
improvements across all hardware implementations ... since it
eliminated interrupts into the kernel, state change overhead, register
save/restore overhead, etc (i.e. it eliminated the virtual machine
kernel execution entirely ... as opposed to trying to make the kernel
execution run faster). this shows up in the amdahl hypervisor, 3090
pr/sm and current day LPARs.
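the difference between the two ECPS approaches can be sketched as a
cost model (all the cycle counts below are invented for illustration
... the text gives no such measurements; only the structure of the
comparison is from the discussion above):

```python
# hypothetical per-instruction cycle costs for one guest privileged
# instruction: trap-into-kernel emulation vs direct execution under
# virtual machine rules (numbers are assumptions, not measurements)
TRAP_AND_EMULATE = {
    "interrupt":    100,   # interrupt into the virtual machine kernel
    "state_change":  50,   # supervisor/problem state transition
    "save_restore": 200,   # register save/restore overhead
    "emulation":    150,   # kernel code simulating the instruction
}
DIRECT_EXECUTE = {
    "instruction":   20,   # hardware runs it per VM rules, no trap
}

trap_cost = sum(TRAP_AND_EMULATE.values())    # 500
direct_cost = sum(DIRECT_EXECUTE.values())    # 20
print(trap_cost / direct_cost)                # 25.0
```

the point is that direct execution doesn't make the kernel path faster
... it removes the kernel path (and all its fixed overheads) entirely,
which is why it helped on every hardware implementation.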

in this day & age, the place where the ECPS approach might be useful
would be the intel platform 370 simulators (ala the hercules
implementation) ... where there is (again) the equivalent of vertical
microcode implementing 370 instructions.
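the "vertical microcode equivalent" in a software simulator is the
fetch-decode-dispatch-execute loop ... sketched here as a toy (hercules
is real, but these opcodes and the tuple encoding are simplified
stand-ins, not actual 370 instruction formats):

```python
# toy fetch-decode-execute loop in the spirit of a software 370
# simulator: each emulated instruction costs a fetch, a decode, a
# dispatch and an execute on the host -- the software analogue of
# the vertical-microcode ~10:1 overhead
def run(program, regs):
    pc = 0
    while pc < len(program):
        op, r1, r2 = program[pc]                           # fetch
        if op == "LR":                                     # dispatch
            regs[r1] = regs[r2]                            # load register
        elif op == "AR":
            regs[r1] = (regs[r1] + regs[r2]) & 0xFFFFFFFF  # add register
        elif op == "SR":
            regs[r1] = (regs[r1] - regs[r2]) & 0xFFFFFFFF  # subtract
        pc += 1
    return regs

regs = run([("LR", 1, 0), ("AR", 1, 1)], {0: 21, 1: 0})
print(regs[1])  # 42
```

every emulated instruction pays the loop/dispatch tax, which is exactly
the overhead that an ECPS-style shortcut (executing hot paths natively)
would remove.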

the slight caveat in all this is that the 370 architecture allows
self-modifying instruction streams ... supposedly half the cycles in
many (earlier) hardware implementations involved double checking
whether the previous instruction had modified the current instruction
(impacting instruction execution throughput).

current generations of chip technologies (not just mainframe chips)
have significantly more complex implementations and enormous amounts of
spare circuits. general chip technologies have implemented pipelined
speculative execution ... associated with pipelining and not being able
to tell for sure which branch path will be taken. speculative execution
includes being able to undo an instruction execution path (when the
branch actually went the other way) ... something similar can be used
to pipeline execution and then undo the operation if there has been
self-modifying code.
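one software shape of that idea ... invalidate-on-store rather than
check-every-pair ... can be sketched as follows (the class and method
names are hypothetical illustration, not any real emulator's API):

```python
# sketch of one way an emulator (or, by analogy, the speculative
# hardware mechanism above) can cope with self-modifying code: cache
# decoded instructions, and invalidate a cached entry whenever a
# store lands on its address -- instead of double checking every
# instruction pair for overlap
class DecodeCache:
    def __init__(self):
        self.cache = {}            # address -> decoded instruction

    def decode(self, addr, memory):
        """return cached decode, decoding on first touch"""
        if addr not in self.cache:
            self.cache[addr] = ("decoded", memory[addr])
        return self.cache[addr]

    def store(self, addr, value, memory):
        """a store into already-decoded code forces a re-decode"""
        memory[addr] = value
        self.cache.pop(addr, None)

mem = {100: "AR 1,2"}
dc = DecodeCache()
print(dc.decode(100, mem))        # ('decoded', 'AR 1,2')
dc.store(100, "SR 1,2", mem)      # self-modifying store
print(dc.decode(100, mem))        # ('decoded', 'SR 1,2')
```

the common-case path (no store into code) then pays nothing, and only
an actual self-modifying store pays the undo/re-decode cost.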

recent post referencing pipelining and speculative execution for
compensating for memory latencies &/or cache misses:
http://www.garlic.com/~lynn/2008c.html#92 CPU time differences for the same job

but hypothetically, that technology could also be used as a mechanism
for dealing with self-modifying instruction streams.

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html
