[email protected] (Shmuel Metz, Seymour J.) writes:
> If the count is exactly 258 then the loop is a single MOV with a repeat
> prefix. You need additional instructions if you don't know the length
> to be a multiple of 8.
>
> But your point remains valid, and is stronger if you look at the long
> compare, convert, move and translate instructions on z.

re:
http://www.garlic.com/~lynn/2013l.html#50 Mainframe on Cloud
http://www.garlic.com/~lynn/2013l.html#51 Mainframe on Cloud
http://www.garlic.com/~lynn/2013l.html#53 Mainframe on Cloud

besides, dhrystone MIPS is the number of dhrystone iterations compared
to the number performed by the baseline 370/158, calibrated as 1 MIPS
....  there were old comparisons of egregiously heavy-weight processing
by VTAM.
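the rating scheme can be sketched as follows (the baseline per-second
iteration count here is a made-up placeholder for illustration, not a
real 370/158 figure):

```python
# Dhrystone MIPS as described above: a machine's rating is its dhrystone
# iteration rate relative to the 370/158 baseline, calibrated as 1 MIPS.
# BASELINE_158_ITERS_PER_SEC is an illustrative placeholder, not measured data.
BASELINE_158_ITERS_PER_SEC = 1_000

def dhrystone_mips(iters_per_sec):
    """Rating relative to the 370/158 baseline (= 1 MIPS by definition)."""
    return iters_per_sec / BASELINE_158_ITERS_PER_SEC

print(dhrystone_mips(BASELINE_158_ITERS_PER_SEC))       # 1.0 -- the baseline itself
print(dhrystone_mips(50 * BASELINE_158_ITERS_PER_SEC))  # 50.0 -- a "50 MIPS" machine
```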

at the time, open systems were trying to reduce TCP/IP stack pathlength
from 5k instructions and 5 buffer copies ... even working on direct I/O
from the application w/o buffer copy ... the somewhat equivalent VTAM
LU6.2 processing had a 150k instruction pathlength and 16 buffer copies.
for 4k to 8k data, the buffer copies would actually take more processor
cycles than the actual instructions ... and result in lots of
unnecessary cache pollution (which is why some platforms have
implemented cache-bypass data moves).
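a rough back-of-envelope for the buffer-copy claim; the pathlength and
copy counts come from the text, but the one-byte-per-cycle copy rate and
the CPI of 1.5 are assumed illustration values, not measurements:

```python
def copy_cycles(payload_bytes, n_copies, bytes_per_cycle=1.0):
    """Cycles spent just moving the payload between buffers."""
    return payload_bytes * n_copies / bytes_per_cycle

def instr_cycles(pathlength, cpi=1.5):
    """Cycles spent executing the protocol pathlength."""
    return pathlength * cpi

payload = 8 * 1024                  # 8k of data, upper end from the text
print(copy_cycles(payload, 5))      # 40960.0 cycles for 5 TCP buffer copies
print(instr_cycles(5_000))          # 7500.0 cycles for the 5k-instruction stack
```

even under these generous assumptions the five copies swamp the
5k-instruction pathlength, and LU6.2's 16 copies make the imbalance far
worse.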

the original mainframe TCP/IP stack product was done in VS/Pascal
outside VTAM. For various reasons the implementation had poor
performance, getting about 44kbytes/sec throughput while using nearly a
full 3090 processor. I did the modifications to the product for RFC1044
support, and in some tuning tests at cray research between a cray and a
4341 ... got sustained 4341 channel throughput using only a modest
amount of 4341 processor (possibly a 500 times improvement in number of
bytes moved per instruction executed). past posts mentioning doing
RFC1044 support
http://www.garlic.com/~lynn/subnetwork.html#1044

later the communication group contracted for a tcp/ip stack
implementation inside VTAM ... the contractor initially demonstrated TCP
throughput much higher than LU6.2. The communication group then told the
contractor that everybody *knows* that a *correct* tcp/ip implementation
is much slower than LU6.2 ... and they would only be *paying* for a
*correct* implementation.

separate from pathlength there is the issue of I/O. In 1980, I got
con'ed into doing channel extender support for STL ... it was bursting
at the seams and they were moving 300 people from the IMS group to an
offsite building (unrelated to Gray con'ing me into doing DBMS
consulting for the IMS group). They had tried remote 3270 support but
found the human
factors totally unacceptable. The channel extender support allowed
putting local channel attach 3270 controllers at the remote site
connected back to mainframes in the STL datacenter. Part of the support
was downloading channel programs to the channel emulator at the remote
site ... eliminating significant channel program protocol chatter
latency back&forth between the mainframe and the remote site.

The vendor tried to get IBM approval to ship my channel extender support
... but there was a group in POK playing with some serial technology
that managed to get the approval turned down ... they were apparently
afraid that having that support in the market might make it more
difficult for them to ship their own technology.

In 1988, I was asked to help LLNL standardize some serial technology they
had. This morphs into fibre channel standard (FCS) ... including support
for I/O programs at remote ends. Then in 1990, the POK group gets their
serial technology shipped as ESCON with ES/9000 ... at which time it was
already obsolete.

Some POK channel engineers become involved in FCS and define an extremely
heavyweight layer that drastically cuts the throughput of the native FCS
... which eventually ships as FICON. some past posts mentioning FICON
http://www.garlic.com/~lynn/submisc.html#ficon

Recent z196 "peak" I/O IBM benchmark used 104 FICONs and 14 SAPs to
achieve 2M IOPS. By comparison, a recent FCS announced for e5-2600
claimed over a million IOPS (i.e. two such e5-2600 FCS would have higher
throughput than "peak" z196 with 104 FCS that have FICON layered on top).

TCW enhancements to FICON appear to be similar to what I did over 30yrs
ago in 1980 for channel-extender and minimizing the associated channel
program chatter latency ... but appears to only slightly narrow the
throughput gap between FICON layer and native FCS.

IBM numbers also have peak z196 throughput of 2.2M SSCHs with all SAPs
running at 100% busy ... however, the IBM recommendation is that SAPs be
limited to 70% busy ... or about 1.5M SSCHs.
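i.e. applying the recommended cap to the peak figure:

```python
# 2.2M SSCHs at 100% SAP busy, derated to the 70% recommendation.
peak_sschs = 2_200_000
capped = peak_sschs * 70 // 100     # integer arithmetic to avoid float rounding
print(capped)                       # 1540000 -- the ~1.5M figure above
```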

-- 
virtualization experience starting Jan1968, online at home since Mar1970

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
