[email protected] (Shmuel Metz , Seymour J.) writes: > If the count is exactly 258 the the loop is a single MOV with a repeat > prefix. You need additional instructions if you don't know the length > to be a multiple of 8. > > But your point remains valid, and is stronger if you look at the long > compare, convert, move and translate instructions on z.
re:
http://www.garlic.com/~lynn/2013l.html#50 Mainframe on Cloud
http://www.garlic.com/~lynn/2013l.html#51 Mainframe on Cloud
http://www.garlic.com/~lynn/2013l.html#53 Mainframe on Cloud

besides, dhrystone mips is the number of dhrystone iterations compared to the number performed by the baseline 370/158, calibrated as 1 MIPS ... there were old comparisons of egregiously heavy-weight processing by VTAM. at the time, open systems were trying to reduce TCP/IP stack pathlength from 5k instructions and 5 buffer copies ... even working on direct I/O from the application w/o buffer copy ... while the somewhat equivalent VTAM LU6.2 processing had 150k instruction pathlength and 16 buffer copies.

for 4k to 8k of data, the buffer copies would actually take more processor cycles than the actual instructions ... and result in lots of unnecessary cache pollution (why some platforms have implemented cache-bypass data moves).

the original mainframe TCP/IP stack product was done in VS/Pascal outside VTAM. for various reasons the implementation had poor performance, getting about 44kbytes/sec throughput while using nearly a full 3090 processor. I did the modifications to the product for RFC1044 support, and in some tuning tests at cray research between a cray and a 4341 ... got sustained 4341 channel throughput using only a modest amount of the 4341 processor (possibly a 500 times improvement in number of bytes moved per instruction executed). past posts mentioning doing RFC1044 support
http://www.garlic.com/~lynn/subnetwork.html#1044

later the communication group contracted for a tcp/ip stack implementation inside VTAM ... the contractor initially demonstrated TCP throughput much higher than LU6.2. the communication group then told the contractor that everybody *knows* that a *correct* tcp/ip implementation is much slower than LU6.2 ... and they would only be *paying* for a *correct* implementation.

separate from pathlength there is the issue of I/O. In 1980, I got con'ed into doing channel extender support for STL ...
it was bursting at the seams and they were moving 300 people from the IMS group to an offsite building (unrelated to Gray con'ing me into doing DBMS consulting for the IMS group). They had tried remote 3270 support but found the human factors totally unacceptable. The channel extender support allowed putting local channel-attach 3270 controllers at the remote site, connected back to mainframes in the STL datacenter. Part of the support was downloading channel programs to the channel emulator at the remote site ... eliminating significant channel program protocol chatter latency back&forth between the mainframe and the remote site.

The vendor tried to get IBM approval to ship my channel extender support ... but there was a group in POK playing with some serial technology that managed to get the approval turned down ... they were apparently afraid that having that support in the market might make it more difficult for them to ship their technology.

In 1988, I was asked to help LLNL standardize some serial technology they had. This morphs into the fibre channel standard (FCS) ... including support for I/O programs at remote ends. Then in 1990, the POK group gets their serial technology shipped as ESCON with ES/9000 ... at which time it was already obsolete. Some POK channel engineers become involved in FCS and define an extremely heavyweight layer that drastically cuts the throughput of native FCS ... which eventually ships as FICON. some past posts mentioning FICON
http://www.garlic.com/~lynn/submisc.html#ficon

A recent z196 "peak" I/O IBM benchmark used 104 FICONs and 14 SAPs to achieve 2M IOPS. By comparison, a recent FCS announced for e5-2600 claimed over a million IOPS (i.e. two such e5-2600 FCS would have higher throughput than "peak" z196 with 104 FCS that have FICON layered on top). TCW enhancements to FICON appear to be similar to what I did over 30yrs ago in 1980 for channel-extender, minimizing the associated channel program chatter latency ...
but they appear to only slightly narrow the throughput gap between the FICON layer and native FCS. IBM numbers also show peak z196 throughput of 2.2M SSCHs with all SAPs running at 100% busy ... however, the IBM recommendation is that SAPs be limited to 70% busy ... or about 1.5M SSCHs.

--
virtualization experience starting Jan1968, online at home since Mar1970

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
