The following message is a courtesy copy of an article that has been posted to bit.listserv.ibm-main,alt.folklore.computers as well.
[EMAIL PROTECTED] (Paul Gilmartin) writes:
> Of course, there would be much wailing and gnashing of teeth from
> customers discovering that code compiled to the bare metal far
> outperformed their beloved Assembler programs on the emulated
> hardware. The vendor would need to cover its collective ears and
> encourage the customers to migrate to better technology.

re:
http://www.garlic.com/~lynn/2008d.html#47 Linux zSeries questions
http://www.garlic.com/~lynn/2008d.html#49 Linux zSeries questions

this is somewhat 30 yrs gone ... much of this has been encountered before. a recent thread in comp.arch mentioned that the 360/370 vertical microcode microprocessors had a 10:1 execution ratio ... i.e. an avg. of 10 microcode instructions executed for every 360/370 instruction. this gave rise to "ECPS" on the 138/148 & 43xx machines ... moving kernel code into microcode for a 10:1 performance improvement:
http://www.garlic.com/~lynn/2008d.html#39 Throwaway cores
http://www.garlic.com/~lynn/2008d.html#46 Throwaway cores
http://www.garlic.com/~lynn/2008d.html#52 Throwaway cores
http://www.garlic.com/~lynn/2008d.html#54 Throwaway cores
http://www.garlic.com/~lynn/2008d.html#56 Throwaway cores

however, this was true of the low-end and mid-range 370s ... which used vertical microcode microprocessors ... and wasn't true for the high-end machines using horizontal microcode microprocessors.

as also mentioned in the comp.arch thread ... the wide proliferation of different (vertical microcode) microprocessors (systems, controllers, channels, etc) resulted in projects circa 1980 to move the corporation to a single 801/risc microprocessor architecture (iliad chips). however, for various reasons this effort foundered. as mentioned, the 4341-followon (the 4381) started out being one of these 801/risc processors ... and I contributed to the writeups killing that strategy. the issue was that chips were getting complex enough that it was starting to be possible to implement the 370 instructions directly in circuits ...
rather than having an intermediate microcode level.

now, the high-end product line used horizontal microcode microprocessors ... this required extremely complex programming ... since different fields in the same instruction controlled different functions ... like starting a data move from one unit to another unit ... overlapped with a variety of other functions. the programmer then had to manually count machine cycles (instructions) before the data could be expected to have finished moving. because of the overlap complexity, these machines measured performance in avg. machine cycles per 370 instruction (rather than avg. number of microcode instructions executed per 370 instruction). the 370/165 was measured at an avg. of 2.1 machine cycles per 370 instruction. this was optimized in the 370/168 to an avg. of 1.6 machine cycles per 370 instruction. for the 3033 it was approx. one machine cycle per 370 instruction.

this resulted in various problems ... the ECPS virtual machine microcode assist on the 148 & 4341 (i.e. moving part of the kernel instructions into microcode) got a 10:1 performance improvement. however, an attempt to do something similar on the 3033 actually resulted in a slight performance degradation (there was no gain doing a one-for-one translation from 370 instructions to 3033 native microcode). as in the discussion regarding the virtual machine microcode assist:
http://www.garlic.com/~lynn/94.html#21 370 ECPS VM microcode assist
http://www.garlic.com/~lynn/94.html#27 370 ECPS VM microcode assist
http://www.garlic.com/~lynn/94.html#28 370 ECPS VM microcode assist

there was one set of (ECPS) things that just did a 1:1 translation of kernel 370 instructions into microcode (for the 10:1 performance improvement). there was another set of things that directly executed privileged instructions (but according to virtual machine rules) w/o interrupting into the kernel. this latter set showed performance improvements across all hardware implementations ...
since it eliminated interrupts into the kernel, state change overhead, register save/restore overhead, etc (aka it eliminated virtual machine kernel execution entirely ... as opposed to trying to make the kernel execution run faster). this shows up in the amdahl hypervisor, 3090 pr/sm and current day LPARs.

in this day & age, the place where the ECPS approach might be useful would be the intel platform 370 simulators (ala the hercules implementation) ... where there is (again) the equivalent of vertical microcode implementing 370 instructions.

the slight caveat in all this is that the 370 architecture allows self-modifying instructions ... supposedly half the cycles in many (earlier) hardware implementations involved double checking whether the previous instruction had modified the current instruction (impacting instruction execution throughput).

current generations of chip technologies (not just mainframe chips) have significantly more complex implementations and enormous amounts of spare circuits. general chip technologies have implemented pipelined speculative execution ... associated with pipelining and not being able to tell for sure which branch path might be taken. speculative execution includes being able to undo an instruction execution path (when the branch actually went the other way) ... something similar can be used to pipeline execution and then undo the operation if there has been self-modifying code. recent post referencing pipelining and speculative execution compensating for memory latencies &/or cache misses:
http://www.garlic.com/~lynn/2008c.html#92 CPU time differences for the same job

but hypothetically, the same technology could also be used as a mechanism for dealing with self-modifying instruction streams.
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [EMAIL PROTECTED] with the message: GET IBM-MAIN INFO
Search the archives at http://bama.ua.edu/archives/ibm-main.html

