I don't know how today's machines (z13 and up) perform, but back when I had access to Strobe it regularly flagged long MVCL / CLCL instructions generated by COBOL 4.2 as relatively large CPU "hot spots" (in the specific application I was working on, these were usually around 8K bytes). Cutting back on how often those moves and compares were executed, so that they ran only when actually needed rather than blindly, saved us something on the order of 3-5% average CPU time.
Our current performance analyzer is useless, so I can't tell you what happens now that we are on a reasonably current generation of z and using COBOL 6.2.

I like Dave's suggestion; it seems a reasonable compromise when you have the option (or need) of coding in assembler.

Peter

-----Original Message-----
From: IBM Mainframe Assembler List <[email protected]> On Behalf Of Thomas David Rivers
Sent: Tuesday, October 20, 2020 9:06 AM
To: [email protected]
Subject: Re: Conditional MVCL macro?

> What is the effect of the conditional branch and the EX on the pipeline? Are
> the performance tradeoffs the same on all supported processors? Also, tuning
> code for a current processor may slow it down on a new one.
>
> --
> Shmuel (Seymour J.) Metz
> https://urldefense.com/v3/__http://mason.gmu.edu/*smetz3__;fg!!Ebr-cpP
> eAnfNniQ8HSAI-g_K5b7VKg!bX31ApFbaISNX6nSDgPjHkDZ-rYYj9xqye_K7xbGA8eNl8
> dq0VYfrx7W5BL6q4-EazeBzQ$

In *very* casual tests we and some customers did, we determined that this general scenario seems to be a good approach for moving bytes with a constant length:

  sizes less than 1024:
      generate up to 4 MVCs in a row

  sizes greater than or equal to 1024:
      if MVCLE is allowed (there is a compiler option for this), then use MVCLE
      otherwise: generate a loop of MVCs, updating the src/target addresses and lengths as needed (you don't need an EX for this.)
      Basically, divide the length by 256 and loop that many times, moving 256 bytes at a time; then take the length modulo 256 and move those remaining bytes (since the length is constant, the division and mod operations provide constants.)

That seems to be a good balance between code size and speed. And the loop is small enough that it probably fits in the machine's instruction cache, so hopefully the branch back (a BCTR back to the MVC) isn't that painful.

Just some thoughts...

- Dave R.
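
For anyone who wants to see the arithmetic, the selection rule Dave describes can be modeled roughly as below. This is only a sketch of the decision logic in Python, not compiler output and not anyone's actual implementation; the function name `plan_constant_move`, the `mvcle_allowed` flag, and the returned dictionary layout are all made up for illustration. The 1024-byte threshold and the 256-byte MVC chunk size are the only values taken from the message.

```python
MVC_MAX = 256        # a single MVC moves at most 256 bytes
INLINE_LIMIT = 1024  # threshold quoted in the message above


def plan_constant_move(length, mvcle_allowed=False):
    """Describe how a move of a constant `length` might be generated.

    Hypothetical helper: it only models the choice of strategy and the
    constants involved, it does not emit any instructions.
    """
    if length <= 0:
        raise ValueError("length must be positive")

    if length < INLINE_LIMIT:
        # Below 1024 bytes: up to four back-to-back MVCs,
        # each described as (offset, byte_count).
        chunks = [(off, min(MVC_MAX, length - off))
                  for off in range(0, length, MVC_MAX)]
        return {"strategy": "inline_mvc", "chunks": chunks}

    if mvcle_allowed:
        # 1024 bytes or more, and the (assumed) option permits MVCLE.
        return {"strategy": "mvcle", "length": length}

    # Otherwise: loop of 256-byte MVCs plus one MVC for the leftover bytes.
    # Both values are constants because the length is known up front.
    iterations, remainder = divmod(length, MVC_MAX)
    return {"strategy": "mvc_loop",
            "iterations": iterations,
            "remainder": remainder}


if __name__ == "__main__":
    for n in (600, 2000, 8192):
        print(n, plan_constant_move(n))
```

With this model, an 8K (8192-byte) move like the ones mentioned above becomes a loop of 32 MVCs with no remainder when MVCLE is not allowed, and a 600-byte move becomes three inline MVCs of 256, 256, and 88 bytes.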
