Re: A Christmassy PL/I tale

Robert Prins Tue, 27 Dec 2016 02:49:35 -0800

Patrick,

On 2016-12-26 10:58, Patrick Vogt wrote:
> Hi Prino
>
>> Let's assume you gate-crash an IBM GSE meeting, way back in 2010 and talk
>> to an IBM developer about optimizing code, and he tells you that IBM is
>> well ahead (by "at least" five years) of the Open Source community!
>
> If you look at coding/automation on decentralized platforms, the Mainframe is
> technically still ahead of the other platforms. It may be slower on
> implementations but that's not to do with the Mainframe but of people coding
> like 20 years ago and structures of the companies.

That is so (expletive deleted) true. Take my last employer in Belgium, a semi-governmental health insurance. The show was run by an old-fashioned Stalinist Secretary General, the CIO was parachuted in because he was a faithful member of the Socialist Party (and had been an advisor to a minister), but was a clueless idiot when it came to IT.

They (ab)used PL/I like there was no tomorrow. Running Enterprise PL/I V3.9 at the time (2009), 99% of their employees still used it as if it was OS PL/I V2.3.0, without realising that there were about a zillion new builtin functions, features like "UNION", much better diagnostic messages, you name it. Worst of all they were

1) compiling ARCH(2) on their ARCH(5) machine (and it took an email from IBM's Peter Elderon, "Tell them they've got a car with six gears and only use two", to change that), and

2) an order, no, several orders, worse: they were running with the LE STORAGE option set to "STORAGE(0,0,0,whatever)", "because the outfit that had helped them during the transition from OS PL/I to Enterprise PL/I had told them that this would take care of initializing variables." Yes, like PACKED DECIMAL with '0' sign nibbles...
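To make the sign-nibble point concrete, here is a minimal sketch, in Python rather than PL/I so that it can be run anywhere. It assumes the usual packed decimal layout (digit nibbles 0-9, with the last nibble holding a sign code in the range A-F); the function name is purely illustrative.

```python
def packed_decimal_is_valid(field: bytes) -> bool:
    """True if all digit nibbles are 0-9 and the last nibble is a sign code (A-F)."""
    if not field:
        return False
    nibbles = []
    for byte in field:
        nibbles.append(byte >> 4)      # high-order nibble
        nibbles.append(byte & 0x0F)    # low-order nibble
    *digits, sign = nibbles
    return all(d <= 9 for d in digits) and sign >= 0xA


# Storage that was merely filled with X'00' bytes (illustrative 3-byte field)
zero_filled = bytes(3)
# A properly initialised FIXED DEC(5) zero: X'00000C'
proper_zero = bytes.fromhex("00000C")

print(packed_decimal_is_valid(zero_filled))   # False
print(packed_decimal_is_valid(proper_zero))   # True
```

In other words, zero-filled storage is not a decimal zero, and decimal arithmetic on such a field can be expected to fail rather than behave as if the variable had been initialised.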

They also didn't mind coding loops using zoned decimal fields, and I even found a recent program that was using GOTO to code loops, go figure...


>> .. Here is the output of the compiler, and lets start with the code
>> generated by the old OS compiler, and for what it's worth, w_hh and w_mm
>> are defined externally as "fixed dec (3)"
>
> for this tests, please post the complete/cropped Program. It's easier to copy
> and paste and then run the compiler.

I'll send you a copy, with the EPLI V4.3 options I use and the input file, or put those on my Google Drive, just in case others are also interested, PM me if you are.


> old:
>> ..01DA3A  D2 07 D 0B8 D 09B  MVC   S(8),WKSP.78+35
>>
>> L   :  3 ZAP : 18 MP  : 10 MVC : 23 DP  :  4 MVN :  8 NI  :  4 XC  :  4 XI
>> :  4 SP  :  4 --- 82 instructions, 470 bytes of code
>
> new:
>>
>> First thing that's noticable is that the code has become totally unreadable
>> to those with only a little knowledge of z/OS assembly language, everything
>> is #pdnnnnn and variable names are mostly gone and, hey, it looks like it's
>> a bit longer, so lets count:
>>
>> ZAP : 15 MP  : 10 MVC : 52 DP  :  4 MVN :  8 AHI :  1 XC  : 22 TM  :  2 SP
>> :  4 NC  :  4 SRP :  4 JE  :  2 --- 128 instructions, 758 bytes of code
>
> You alway's need good Assembler knowhow to understand at this level what the
> compiler does. It's easier to read if you disable all checks and
> optimizations.

Then you get a completely different load module. That's one of the other things I've always been dead against, testing a program with OPT(0) and then recompiling it into production, without retesting, using OPT(3).


> So for performance, please don't count the instructions but the bytes moved
> by the instructions and the place (Register, Storage, Cache) they are in. The
> Instructions are in the instruction cache and if you don't destroy the
> instruction cache, it's extremly fast.

Of course cached code is fast, and the cache on the z13 is huge, but when such a simple routine increases by around 50% in size, what happens with more complex routines? In the end cache will have to kick out code...


> As i have made some tests some time ago, here is my list (not complete) of
> fastest instruction types, fastest at top, example in brackets: -Immediate
> instructions (LHI) -Register to Register (LR) -Memory (read) to Register (L)
> -Memory(write) from Register (ST) -Memory to Memory (MVC)
>
> i don't know why, but at my tests, Immediate Instructions was faster than
> Register to Register... it may be wrong.. For Memory operations, it depend on
> the Datacache (Level x) it's in.
>
>> And the count, or rather the act of counting, immediately raises five very,
>> very significant questions:
>>
>> 1) What is the performance of the old code on a new z13 system compared to
>> the code that is now emitted?
>
> Don't know but needs to much time to test. :)

And as others have already said, time is the one resource we no longer have.

>> 2) There are still 10 MP instructions, and inspection of the code, which
>> was compiled ARCH(10) OPT(3) (in other words, as optimal as possible on the
>> hardware I have access to), reveals that Enterprise PL/I AD 2014 still
>> doesn't seem to know anything about common sub-expression elimination.
>
> ok
>
>> 3) Why the abso-eff-ing hell (sorry if these words cause some offence) are
>> there two JE instructions in this code, as any conditional jump has the
>> ability of causing significant stalls due to breaking the pipeline? We are
>> TRUNCATING, not rounding!
>
> don't know
>
>> 4) Why the flucking 'ell does the added multiply of "n * 3600" take only
>> three instructions using the OS compiler, but no less than 7 (seven, SEVEN,
>>  S*E*V*E*N!) using Enterprise PL/I. And the "n * 60" multiply is even
>> worse, two versus eight!
>
> it's only five if you set some compiler options (disabling check's like
> RTCHECK): L        r1,_addrN(,r13,204) ZAP
> #pdr11@546_2(6,r13,546),_shadow1(3,r1,0) MP
> #pdr11@546_2(6,r13,546),+CONSTANT_AREA(3,r6,35) ZAP
> #pdr9@358_2(5,r13,358),#pdr11@546_2(6,r13,546) MVC
> D(5,r13,184),#pdr9@358_2(r13,358)

Which RTCHECK option are you referring to? The only one I use is NONULLPTR and that has no relevance to my code, or has it???


>> 5) Any compiler worth the adjective "Optimizing" knows about the underlying
>>  hardware, and this is what the POP tells me about the DP instruction:
>
> The Optimizer is not always better. It must be like a chess-computer.
> Sometimes, OPT(2) gives better code than OPT(3). If you want to look at the
> compiler itself, try OPT(0).

Thanks, but no thanks, the code you get from that level of non-optimization is probably of the same quality as Turbo Pascal 1.0 (AD 1983)...


> I think, the code is better now. It's bigger and maybe slower but have some
> goodies that the old compiler did not had. Also today's machines are "out of
> order" machines so even the hardware does optimizations.

Of course, but the fastest instructions are still those instructions that are never executed. ;)


>> .. IBM why you are paying a large amount of money for a compiler that
>> generates code that may well significantly negate the increased speed your
>> latest shiny z13 system is supposed to deliver.
>
> I think, the PL/I compiler is still the best of all. I write everything in
> PL/I and for some exceptions Assembler. exceptions: -High performant code
> (called Millions to Millards of times per day). -System interfaces that have
> some special API's or using Mapping Macros.
>
> PL/I has implemented the new instructions of z13 (sadly, we can't use because
> our fallbackmachine is a zEC12). We have now with the z13 alot of Vector
> Registers, i would use it as cache of the Registers as we alway's have to
> less Registers (beware of flot and vector Instructions and call's).

Yes, extra registers are great, but be honest, how much parallelism is there in run-of-the-mill "do while not end-of-file" code written by banks and insurance companies?


> But ! The whole is not that important as you may think (but it's
> interesting). -Costs of the MIPS are significant lower than even some years
> ago. -To make a difference in costs, this code have to run millions of times
> per day, every day. -Whatever you code in PL/I, today you (mostly) don't need
> to look at performance, even very bad code is today as fast as hell. -If you
> do DB2, everything in PL/I looks extermly fast. -If you do Webservice,
> everything with DB2 looks extremly fast.

Absolutely, but once you're going to use those arguments you're ending up on the slippery slope of "Who cares if our PL/I is a bit slower, all the other stuff is orders of magnitude slower anyway..."


> Today our problem's are (importantest at top): -Maintenance. You have to code
> so, that anyone can continue on the program. We have rules and the PL/I
> compiler check's them (included by alot of RFE's, for example
> <https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=88020>).

I know you as a prolific RFE poster, but the above? It's already useful(ish) that completely unreferenced variables are flagged, but splitting this up into another eight categories seems overkill, and having unreferenced variables has little, or no impact on the generated code.


> -Multiple Megabytes transferred by applications (IMS / CICS).
> -sometimes, array's are full initialized (Megabytes) but our rules are, that
> you don't do initializations of arrays but set a fill variable.

In 1985 during my initial training I was taught that

array_of_structures = '';

was bad, much better was

array_of_structures(1) = '';
array_of_structures    = array_of_structures(1);

which was very efficient using the OS compiler (overlapping MVC's were generated). Even better was to have a static variable with the same structure as the above, and use that to initialize the array. And obviously it's best not to use any initializations, but to keep track of the actual usage, although this may not always be possible if indices are not contiguous.

Sadly, according to Peter Elderon, overlapping MVC's are no longer optimal on the z13 and, hard to believe, loops are faster, and those are also used on the Windoze compiler, where REP MOVSD is obviously the way to go, especially on later x86/AMD64 hardware. The fact that the current z/OS compiler is a direct descendant of the OS/2 VisualAge compiler still makes me believe that a lot of the very inefficient code generated by that compiler (due to the severe shortage of usable 32-bit registers on Intel/AMD) was ported willy-nilly to the z/OS one, where, using the high-register facility, you have in essence 32 32-bit registers, which is of course also of limited use as banks and insurance companies tend to do a lot of their calculations on packed decimal...
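For those who have never seen the overlapping MVC trick, the following is a small simulation of it, again in Python rather than PL/I or assembler. It assumes only that MVC behaves as if it moves bytes one at a time from left to right and moves at most 256 bytes per instruction; the `mvc` helper, the 4-byte element layout and the array size are made up for illustration and are not taken from any compiler listing.

```python
def mvc(storage: bytearray, dest: int, src: int, length: int) -> None:
    """Copy `length` bytes one at a time, left to right, like a single MVC."""
    if not 1 <= length <= 256:
        raise ValueError("a single MVC moves 1 to 256 bytes")
    for i in range(length):
        storage[dest + i] = storage[src + i]


# Hypothetical element: two EBCDIC blanks followed by a FIXED DEC(3) zero
ELEMENT = bytes.fromhex("4040000C")
COUNT = 5

# The array starts out as garbage
array = bytearray(b"\xFF" * (len(ELEMENT) * COUNT))

# array_of_structures(1) = '';
array[:len(ELEMENT)] = ELEMENT

# array_of_structures = array_of_structures(1);
# The destination starts one element after the source, so every byte that is
# read has already been written by an earlier step of the same move.
mvc(array, len(ELEMENT), 0, len(ELEMENT) * (COUNT - 1))

print(array == bytearray(ELEMENT * COUNT))   # True
```

Because the source and destination overlap by all but one element, the first element ripples through the whole array in a single move, which is why this used to be such a cheap way of initializing.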


> Also i think, PL/I has the best support. Almost every (reasonable) RFE is
> implemented in the next PL/I release, sometimes even back to older PL/I
> releases. You first have to search for anything similar...

Was it the first compiler to offer 64-bit support? Or does that honour belong to C/C++?

I've never looked at any RFE's for C and COBOL, but given all the latest COBOL improvements, that may no longer be completely true, and it also should not be too surprising, given that COBOL is used rather a lot more. :(

Anyway, you know what [very big disclaimer, the only "compiler" I ever wrote translated JCL to REXX] *might* help? Getting rid of the current one-size-fits-all compiler back-end strategy, where the same back-end now seems to be used for both C and PL/I (and probably also COBOL). It may well result in shoehorning the intermediate compiler output into a format that *may* well be less than optimal for some constructions in PL/I and COBOL, as compared to C...


Robert
--
Robert AH Prins
robert(a)prino(d)org
Some programming @ http://prino.neocities.org/

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
