I don't know how today's machines (z13 and up) perform, but back when I had 
access to Strobe it regularly pointed out long MVCL / CLCL instructions 
generated by COBOL 4.2 (in the specific application case I was working on these 
were usually around 8K bytes) as relatively large "hot spots" of CPU usage.  
Mitigating how often those moves and compares were actually needed (as opposed 
to blind usage) saved us something on the order of 3-5% average CPU time.

Our current performance analyzer is useless, so I can't tell you what happens 
now that we are on reasonably current-generation z hardware and using COBOL 6.2.

I like Dave's suggestion; it seems a reasonable compromise when you have the 
option (or need) of coding in assembler.

Peter

-----Original Message-----
From: IBM Mainframe Assembler List <[email protected]> On Behalf 
Of Thomas David Rivers
Sent: Tuesday, October 20, 2020 9:06 AM
To: [email protected]
Subject: Re: Conditional MVCL macro?

> 
> What is the effect of the conditional branch and the EX on the pipeline? Are 
> the performance tradeoffs the same on all supported processors? Also, tuning 
> code for a current processor may slow it down on a new one.
> 
> 
> --
> Shmuel (Seymour J.) Metz
> http://mason.gmu.edu/~smetz3

 In *very* casual tests that we and some customers ran, we determined that this 
general scenario seems to be a good approach for moving bytes with a constant 
length:

     sizes less than 1024:
       generate up to 4 MVCs in a row
   
     sizes greater than or equal to 1024:
       if MVCLE is allowed (there is a compiler option for this)
       then use MVCLE
     
       otherwise:
         generate a loop of MVCs updating the src/target
         address and lengths as needed (you don't need an EX
         for this.)   Basically divide the length by 256
         and loop moving 256 bytes at a time by that count;
         then get the modulus of the length by 256 and
         move those remaining bytes (since the length is constant,
         the division and mod operations provide constants.)
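
 To make the arithmetic concrete, here is a rough model of that decision in C.
 It is only a sketch: the names (plan_move, move_by_chunks, mvcle_allowed) are
 made up for illustration, each memcpy merely stands in for one MVC of at most
 256 bytes, and it is not the code any particular compiler generates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names; thresholds are the ones described above. */
enum move_strategy { MOVE_INLINE_MVC, MOVE_MVCLE, MOVE_MVC_LOOP };

struct move_plan {
    enum move_strategy strategy;
    size_t full_chunks;  /* number of full 256-byte moves */
    size_t remainder;    /* bytes left over for one final short move */
};

/* Pick a strategy for a constant length. Because len is known at
 * compile time, len / 256 and len % 256 are constants as well. */
static struct move_plan plan_move(size_t len, int mvcle_allowed)
{
    struct move_plan p;
    p.full_chunks = len / 256;
    p.remainder = len % 256;
    if (len < 1024)
        p.strategy = MOVE_INLINE_MVC;   /* at most 4 MVCs in a row */
    else if (mvcle_allowed)
        p.strategy = MOVE_MVCLE;
    else
        p.strategy = MOVE_MVC_LOOP;
    return p;
}

/* Model of the MVC loop: move 256 bytes per iteration, advancing both
 * addresses, then move whatever is left. Each memcpy models one MVC. */
static void move_by_chunks(unsigned char *dst, const unsigned char *src,
                           size_t len)
{
    size_t chunks = len / 256;
    size_t rest = len % 256;

    assert(dst != NULL && src != NULL);
    while (chunks-- > 0) {
        memcpy(dst, src, 256);
        dst += 256;
        src += 256;
    }
    if (rest > 0)
        memcpy(dst, src, rest);
}
```

 For an 8K move with MVCLE disallowed, this plan works out to 32 full
 256-byte moves and no remainder; a 1023-byte move is 3 full moves plus one
 255-byte move, which is where the "up to 4 MVCs" figure comes from.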

 That seems to be a good balance between code size and speed. 
 And the loop is small enough that it probably fits in the machine's 
instruction cache, so hopefully the branch back (a BCTR back to the MVC) isn't 
that painful.

 Just some thoughts...

        - Dave R. -
