Re: A Christmassy PL/I tale

Patrick Vogt Mon, 26 Dec 2016 03:00:01 -0800

Hi Prino

>Let's assume you gate-crash an IBM GSE meeting, way back in 2010 and talk to 
>an IBM developer about optimizing code, and he tells you that IBM is well 
>ahead (by "at least" five years) of the Open Source community!


If you look at coding/automation on decentralized platforms, the Mainframe is 
technically still ahead of the other platforms. It may be slower on 
implementations, but that has nothing to do with the Mainframe itself; it 
comes from people coding like 20 years ago and from the structures of the 
companies.

>..
>Here is the output of the compiler, and let's start with the code generated by 
>the old OS compiler, and for what it's worth, w_hh and w_mm are defined 
>externally as "fixed dec (3)"

For these tests, please post the complete (or cropped) program. It's easier to 
copy and paste it and then run the compiler.

old:
>..01DA3A  D2 07 D 0B8 D 09B  MVC   S(8),WKSP.78+35
>
>L   :  3
>ZAP : 18
>MP  : 10
>MVC : 23
>DP  :  4
>MVN :  8
>NI  :  4
>XC  :  4
>XI  :  4
>SP  :  4
>      ---
>       82 instructions, 470 bytes of code

new:
>
>First thing that's noticeable is that the code has become totally unreadable 
>to those with only a little knowledge of z/OS assembly language, everything 
>is #pdnnnnn and variable names are mostly gone and, hey, it looks like it's a 
>bit longer, so let's count:
>
>ZAP : 15
>MP  : 10
>MVC : 52
>DP  :  4
>MVN :  8
>AHI :  1
>XC  : 22
>TM  :  2
>SP  :  4
>NC  :  4
>SRP :  4
>JE  :  2
>     ---
>      128 instructions, 758 bytes of code

You always need good Assembler know-how to understand what the compiler does 
at this level. The code is easier to read if you disable all checks and 
optimizations.

So for performance, please don't count the instructions; count the bytes moved 
by the instructions and look at where those bytes are (Register, Storage, 
Cache). The instructions themselves are in the instruction cache, and if you 
don't destroy the instruction cache, it's extremely fast.

As I made some tests some time ago, here is my (incomplete) list of 
instruction types, fastest at top, example in brackets:
-Immediate instructions (LHI)
-Register to Register (LR)
-Memory (read) to Register (L)
-Memory (write) from Register (ST)
-Memory to Memory (MVC)

I don't know why, but in my tests, Immediate instructions were faster than 
Register to Register... the measurement may be wrong.
For Memory operations, it depends on which data cache (Level x) the data is in.
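
To apply the list above to the two instruction counts quoted earlier, here is a 
small sketch that splits each tally into SS-format instructions (both operands 
in storage) and everything else. The grouping by instruction format is an 
assumption made for illustration; it says nothing about timing, it only shows 
where the growth is.

```python
# Instruction tallies copied from the two listings quoted above.
OLD = {"L": 3, "ZAP": 18, "MP": 10, "MVC": 23, "DP": 4,
       "MVN": 8, "NI": 4, "XC": 4, "XI": 4, "SP": 4}
NEW = {"ZAP": 15, "MP": 10, "MVC": 52, "DP": 4, "MVN": 8, "AHI": 1,
       "XC": 22, "TM": 2, "SP": 4, "NC": 4, "SRP": 4, "JE": 2}

# Assumption for this sketch: classify by instruction format only.
# SS-format instructions address storage for their operands and are
# 6 bytes long (SRP shifts a single storage field in place). The other
# mnemonics that appear here (L, NI, XI, TM, AHI, JE) are 4 bytes long.
SS_FORMAT = {"ZAP", "MP", "MVC", "DP", "MVN", "XC", "SP", "NC", "SRP"}


def summarize(tally):
    """Return (total instructions, SS-format instructions, code bytes)."""
    total = sum(tally.values())
    ss_count = sum(n for op, n in tally.items() if op in SS_FORMAT)
    code_bytes = 6 * ss_count + 4 * (total - ss_count)
    return total, ss_count, code_bytes


for name, tally in (("old", OLD), ("new", NEW)):
    total, ss_count, code_bytes = summarize(tally)
    print(f"{name}: {total} instructions, {ss_count} storage-to-storage, "
          f"{code_bytes} bytes of code")
```

The computed sizes agree with the 470 and 758 bytes quoted, and nearly all of 
the extra instructions fall into the slowest class of the list (MVC and XC).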

>
>And the count, or rather the act of counting, immediately raises five very, 
>very significant questions:
>
>1) What is the performance of the old code on a new z13 system compared to the 
>code that is now emitted?

I don't know, but it needs too much time to test. :)

>2) There are still 10 MP instructions, and inspection of the code, which was 
>compiled ARCH(10) OPT(3) (in other words, as optimal as possible on the 
>hardware I have access to), reveals that Enterprise PL/I AD 2014 still 
>doesn't seem to know anything about common sub-expression elimination.

ok

>3) Why the abso-eff-ing hell (sorry if these words cause some offence) are 
>there two JE instructions in this code, as any conditional jump has the 
>ability of causing significant stalls due to breaking the pipeline? We are 
>TRUNCATING, not rounding!

don't know

>4) Why the flucking 'ell does the added multiply of "n * 3600" take only three 
>instructions using the OS compiler, but no less than 7 (seven, SEVEN, 
>S*E*V*E*N!) using Enterprise PL/I. And the "n * 60" multiply is even worse, 
>two versus eight!

It's only five if you set some compiler options (disabling checks like 
RTCHECK):
L        r1,_addrN(,r13,204)                            
ZAP      #pdr11@546_2(6,r13,546),_shadow1(3,r1,0)       
MP       #pdr11@546_2(6,r13,546),+CONSTANT_AREA(3,r6,35)
ZAP      #pdr9@358_2(5,r13,358),#pdr11@546_2(6,r13,546) 
MVC      D(5,r13,184),#pdr9@358_2(r13,358)              
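
A rough model of what these five instructions do, as a sketch only. It assumes 
the operands read as (length, base, displacement), so the work field is 6 
bytes, the constant 3 bytes and the result 5 bytes, and it assumes the 
constant is 3600 (the listing does not show its value). Overflow handling is 
simplified to raising an error.

```python
def packed_digits(length_bytes):
    """Digits in a packed-decimal field: two per byte, minus the sign nibble."""
    return 2 * length_bytes - 1


def zap(dest_len, value):
    """Model of ZAP: place value into a dest_len-byte packed field."""
    if abs(value) >= 10 ** packed_digits(dest_len):
        raise OverflowError(f"{value} does not fit in {dest_len} bytes")
    return value


def mp(field_len, multiplicand, multiplier_len, multiplier):
    """Model of MP: the product replaces the multiplicand in its own field."""
    # The multiplier must be at most 8 bytes and shorter than the field.
    if multiplier_len > 8 or multiplier_len >= field_len:
        raise ValueError("specification exception")
    # The multiplicand needs at least multiplier_len bytes of leading zeros,
    # which is why the value is first moved into a wider work field.
    if abs(multiplicand) >= 10 ** packed_digits(field_len - multiplier_len):
        raise ValueError("data exception")
    return multiplicand * multiplier


def times_3600(n):
    """Follow the five-instruction sequence for a 3-byte packed source n."""
    if abs(n) >= 10 ** packed_digits(3):
        raise OverflowError("source does not fit in 3 bytes")
    work = zap(6, n)             # ZAP: widen into the 6-byte work field
    work = mp(6, work, 3, 3600)  # MP: multiply by the 3-byte constant
    result = zap(5, work)        # ZAP: narrow to the 5-byte field
    return result                # MVC then copies these 5 bytes to the target


print(times_3600(23))
```

In this model the first ZAP exists only to give MP enough leading zero bytes, 
and the second ZAP plus the MVC move the product to its final place.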

>5) Any compiler worth the adjective "Optimizing" knows about the underlying 
>hardware, and this is what the POP tells me about the DP instruction:

The Optimizer is not always better. It has to work like a chess computer. 
Sometimes OPT(2) gives better code than OPT(3).
If you want to see what the compiler itself does, try OPT(0).

I think the code is better now. It's bigger and maybe slower, but it has some 
goodies that the old compiler did not have.
Also, today's machines are "out of order" machines, so even the hardware does 
optimizations.

>..
>IBM why you are paying a large amount of money for a compiler that generates 
>code that may well significantly negate the increased speed your latest shiny 
>z13 system is supposed to deliver.

I think the PL/I compiler is still the best of all. I write everything in 
PL/I, with Assembler for some exceptions.
Exceptions:
-High-performance code (called millions to billions of times per day).
-System interfaces that have some special APIs or use Mapping Macros.

PL/I has implemented the new instructions of the z13 (sadly, we can't use them 
because our fallback machine is a zEC12).
With the z13 we now have a lot of Vector Registers; I would use them as a 
cache for the Registers, as we always have too few Registers (beware of float 
and vector instructions and calls).

------

But!
The whole thing is not as important as you may think (but it's interesting).
-The cost of MIPS is significantly lower than even some years ago.
-To make a difference in costs, this code has to run millions of times per 
day, every day.
-Whatever you code in PL/I, today you (mostly) don't need to look at 
performance; even very bad code is as fast as hell today.
-If you do DB2, everything in PL/I looks extremely fast.
-If you do Webservice, everything with DB2 looks extremely fast.

Today our problems are (most important at top):
-Maintenance. You have to code so that anyone can continue work on the 
program. We have rules, and the PL/I compiler checks them (added through a lot 
of RFEs, for example 
https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=88020).
-Multiple megabytes transferred by applications (IMS / CICS).
-Sometimes arrays are fully initialized (megabytes), but our rule is that you 
don't initialize arrays; you set a fill variable instead.
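
The fill-variable rule from the last point can be sketched like this. It is an 
illustration only: the class and method names are hypothetical, and Python 
always fills a list when it is allocated, so the sketch shows just the 
bookkeeping, not the saving you get in PL/I by leaving out the initialization.

```python
class FillTrackedTable:
    """Fixed-size table where a fill counter marks the valid entries."""

    def __init__(self, capacity):
        # Allocated once. Python fills the list here; a PL/I array declared
        # without INIT would simply be left as it is.
        self.slots = [0] * capacity
        self.fill = 0

    def append(self, value):
        """Store value in the next free slot and advance the fill counter."""
        if self.fill >= len(self.slots):
            raise IndexError("table is full")
        self.slots[self.fill] = value
        self.fill += 1

    def reset(self):
        """Empty the table by touching one counter, not every element."""
        self.fill = 0

    def used(self):
        """Return only the entries below the fill counter."""
        return self.slots[:self.fill]


table = FillTrackedTable(4)
table.append(7)
table.append(9)
table.reset()
table.append(1)
print(table.used())
```

After the reset the old value 9 is still sitting in the second slot, but 
nothing reads it because every access stops at the fill counter.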
 
Also, I think PL/I has the best support. Almost every (reasonable) RFE is 
implemented in the next PL/I release, sometimes even backported to older PL/I 
releases. Before submitting one, you first have to search for anything 
similar...

>Merry Christmas,
>
>Robert

to you too,
Patrick

