re: http://www.garlic.com/~lynn/2014c.html#62 Optimization, CPU time, and related issues
aka the internal operation of the machine ... and the execution elements actually being managed ... are becoming less & less directly related to the external instruction architecture. for instance, RISC architectures have had significant performance advantage over i86 (having pioneered superscalar, out-of-order execution, branch prediction, speculative execution, etc) ... however for the last several generations of server chips ... i86 has gone to a hardware layer that translates i86 instructions into RISC micro-ops for execution ... which has largely mitigated the difference in throughput between RISC and i86. the more sophisticated compilers will include some level of model of the internal execution characteristics as part of code generation.

another feature common in i86 has been hyperthreading ... in the 70s, I got sucked into a proposal to do hyperthreading for the 370/195 (that never shipped) ... basically feeding the execution units from two separate (simulated multiprocessor) i-streams. the issue was that 370/195 was out-of-order, superscalar, and pipelined ... but conditional branches stalled the processing (no branch prediction or speculative execution). peak 370/195 was around 10mips ... but that tended to require very careful coding ... most codes with conditional branching only ran around 5mips. the idea was that with two i-streams, each running around 5mips throughput (because of conditional branch processing stalling the machine) ... it would achieve 10mips aggregate throughput.

360/91, 360/195, 370/195 discussed here
http://www.quadibloc.com/comp/pan05.htm

the above talks about cycle time of 91, 95, & 195 ... basically the same 750ns memory used in the 65 & 75. Originally the 360/60 and 360/70 were going to have 1 microsecond memory ... but it was upgraded to 750ns ... and the model numbers changed. 65/(67) & 75 did double word fetch at a time ... for the i-stream it kept the full 8-bytes around ...
so it didn't require a separate memory fetch for every instruction. the timing values for the machines include instruction execution and other data fetch/store memory times plus a prorated amount for instruction fetch (assuming execution normally proceeds sequentially) ... aka a 2byte instruction includes 1/4th of the 750ns instruction fetch, a 4byte instruction includes 1/2 of the 750ns instruction fetch, a 6byte instruction includes 3/4ths of the 750ns instruction fetch.

the hyperthreading gimmick had been proposed in the ACS-360 effort
http://people.cs.clemson.edu/~mark/acs_end.html

see "Sidebar: Multithreading" in above ... which is followed by another sidebar about ACS-360 features that finally show up 20yrs later in es/9000. Earlier in the article Amdahl talks about IBM executives shutting down the effort because it would advance the computing state-of-the-art too fast and they would lose control of the market.

-- 
virtualization experience starting Jan1968, online at home since Mar1970
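
ps: the prorated instruction-fetch figures and the two i-stream throughput argument above can be worked through with a few lines of arithmetic. this is only a rough sketch under the post's own assumptions (strictly sequential execution, one 8-byte doubleword per 750ns fetch, and stalls in the two i-streams interleaving perfectly) ... the function names are made up for illustration and don't come from any IBM documentation.

```python
from fractions import Fraction

MEMORY_CYCLE_NS = 750   # memory cycle time quoted in the post
FETCH_WIDTH_BYTES = 8   # one doubleword per instruction fetch


def prorated_fetch_ns(instr_len_bytes):
    """Share of one doubleword fetch charged to a single instruction.

    Assumes sequential execution, so every fetched byte is eventually used.
    """
    if instr_len_bytes not in (2, 4, 6):
        raise ValueError("360/370 instructions are 2, 4, or 6 bytes long")
    return Fraction(instr_len_bytes, FETCH_WIDTH_BYTES) * MEMORY_CYCLE_NS


def aggregate_mips(per_stream_mips, n_streams, peak_mips):
    """Idealized aggregate throughput of several i-streams.

    Assumes one stream's stalls are fully covered by the other stream,
    and caps the total at the machine's peak rate.
    """
    return min(per_stream_mips * n_streams, peak_mips)


if __name__ == "__main__":
    for length in (2, 4, 6):
        share = float(prorated_fetch_ns(length))
        print(f"{length}-byte instruction: {share} ns of instruction fetch")
    print(aggregate_mips(5, 2, 10), "mips aggregate from two 5mips i-streams")
```

under those assumptions the shares come out to 187.5ns, 375ns, and 562.5ns for 2, 4, and 6 byte instructions ... a real machine would fall short of the idealized 10mips aggregate whenever both i-streams stall at the same time.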