In our testing on a z14 (MVS under VM), MVCL was considerably slower than a 256-byte MVC loop plus an executed MVC for various unaligned data lengths from 40 bytes to 32K.
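For readers unfamiliar with the idiom, "a 256-byte MVC loop plus an executed MVC" could look roughly like the sketch below. It is not the code used in the test. The register assignments, labels, non-overlapping operands, a positive byte count, and an established base register covering TAILMVC and the literal pool are all assumptions made for illustration.

```hlasm
* Illustrative sketch only, not the code that was measured.
* Assumed: R2 -> target, R4 -> source, R3 = byte count (> 0),
* operands do not overlap, base register and LTORG already set up.
* Register numbers and labels are hypothetical.
         LR    R5,R3              Copy the byte count
         SRL   R5,8               R5 = number of full 256-byte blocks
         LTR   R5,R5              Any full blocks to move?
         JZ    TAIL               No, only a remainder (if any)
LOOP     MVC   0(256,R2),0(R4)    Move one 256-byte block
         LA    R2,256(,R2)        Advance target pointer
         LA    R4,256(,R4)        Advance source pointer
         JCT   R5,LOOP            Repeat until all blocks are moved
TAIL     N     R3,=F'255'         R3 = leftover bytes (0 to 255)
         JZ    DONE               Nothing left to move
         BCTR  R3,0               Length code is byte count minus 1
         EX    R3,TAILMVC         Run the MVC with length from R3
         J     DONE               Skip over the EX target
TAILMVC  MVC   0(0,R2),0(R4)      Executed via EX only
DONE     DS    0H
```

The loop handles the whole 256-byte blocks and the single executed MVC handles whatever is left over, so no move is attempted when the remainder is zero.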
For zeroing memory up to 1G, XC in a loop was about the same as MVCL up to 256 bytes; beyond that, MVCL was faster (MVCLE was slightly slower, even when the MVCL had to be looped). MVCL was also faster than MVPG, DSPSERV RELEASE, and PGSER in general, except for MVPG when the data was page aligned.

On 2020-10-20 12:39 p.m., Mike Hochee wrote:
Really interesting thread to start the day with! Our experience has been that MVC loops are typically faster up to a point, that point being about 30-40 instructions in the pipeline, and, as mentioned, this seemed very processor dependent. However, when the source and target operands both happen to be aligned on a page boundary, the opportunity exists for the async data mover to kick in if a move long is being used. I think this applied to both MVCL and MVCLE, but I'm not sure. So ideally a macro would want to utilize both MVCs and MVCL/E. More grist for the mill!
Gary Weinhold
Senior Application Architect
DATAKINETICS | Data Performance & Optimization
Phone: +1.613.523.5500 x216
Email: [email protected]
Visit us online at www.DKL.com

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:[email protected]] On Behalf Of [email protected]
Sent: Tuesday, October 20, 2020 12:12 PM
To: [email protected]
Subject: Re: Conditional MVCL macro?

The COBOL compiler for a 4000 byte move, from to the same with OPT(2) generates

         LAY   R10,5072(,R9)          FROM
         LA    R7,1072(,R9)           TO
         MVC   0(256,R7),0(R10)
         MVC   256(256,R7),256(R10)
         MVC   512(256,R7),512(R10)
         MVC   768(256,R7),768(R10)
         MVC   1024(256,R7),1024(R10)
         MVC   1280(256,R7),1280(R10)
         MVC   1536(256,R7),1536(R10)
         MVC   1792(256,R7),1792(R10)
         MVC   2048(256,R7),2048(R10)
         MVC   2304(256,R7),2304(R10)
         MVC   2560(256,R7),2560(R10)
         MVC   2816(256,R7),2816(R10)
         MVC   3072(256,R7),3072(R10)
         MVC   3328(256,R7),3328(R10)
         MVC   3584(256,R7),3584(R10)
         MVC   3840(160,R7),3840(R10)

However for 5000 bytes it generates:

         LAY   R7,6072(,R9)
         LA    R10,0(,R7)
         LA    R7,1072(,R9)
         LHI   R11,0x13
L0128    EQU   *
         MVC   0(256,R7),0(R10)
         LA    R10,256(,R10)
         LA    R7,256(,R7)
         BRCT  R11,L0128
         MVC   0(136,R7),0(R10)

And yes the change occurred at 4097 bytes.

-----Original Message-----
From: IBM Mainframe Assembler List <[email protected]> On Behalf Of Charles Mills
Sent: Tuesday, October 20, 2020 10:54
To: [email protected]
Subject: Re: Conditional MVCL macro?

@Ed, can you elaborate a little on your reasoning? (Not doubting it; just curious.) Is it that the interruptibility provides a significant improvement over MVCL? Or the support for lengths greater than 16M? Or ... ?

When I asked Dr. Shum about move strategies he seemed to indicate that for data that was already or would soon anyway be in cache an MVC loop was generally faster than MVCL. (I did not ask about MVCLE at the time; not sure why. He did not suggest it.)

Charles

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:[email protected]] On Behalf Of Ed Jaffe
Sent: Tuesday, October 20, 2020 6:52 AM
To: [email protected]
Subject: Re: Conditional MVCL macro?

We've switched almost exclusively to MVCLE except for short, fixed-length moves.
