Re: Conditional MVCL macro?

Gary Weinhold Tue, 20 Oct 2020 10:28:09 -0700

In our testing on a z14 (MVS under VM), MVCL was considerably slower than
a 256-byte MVC loop plus an executed MVC for various unaligned data
lengths from 40 bytes to 32K.


For zeroing memory up to 1G, XC in a loop was about the same as MVCL up
to 256 bytes; beyond that, MVCL was faster (MVCLE was slightly slower,
even when the MVCL had to be looped).  MVCL was also generally faster
than MVPG, DSPSERV RELEASE, and PGSER, except for MVPG when the data was
page aligned.
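
As a rough check on why MVCL has to be looped for large areas: MVCL takes
a 24-bit length, so one execution covers at most 16,777,215 bytes. The
sketch below uses Python purely for the arithmetic. The function name and
the page-multiple chunk size are illustrative assumptions, not something
taken from the tests. It counts the executions needed to cover 1G.

```python
MVCL_MAX = (1 << 24) - 1   # MVCL length field is 24 bits wide
PAGE = 4096


def mvcl_passes(total, chunk=MVCL_MAX):
    """Number of MVCL executions needed to cover `total` bytes
    when each execution handles at most `chunk` bytes."""
    if not 0 < chunk <= MVCL_MAX:
        raise ValueError("chunk must fit in the 24-bit MVCL length")
    return -(-total // chunk)   # ceiling division


ONE_G = 1 << 30

# Maximum-length chunks.
print(mvcl_passes(ONE_G))

# Assumed alternative: largest chunk that is a multiple of the page size,
# so every pass after the first starts on the same alignment.
print(mvcl_passes(ONE_G, MVCL_MAX + 1 - PAGE))
```

Either chunk size needs the same number of passes for 1G, so keeping the
chunks page-sized costs nothing in loop iterations.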

On 2020-10-20 12:39 p.m., Mike Hochee wrote:

Really interesting thread to start the day with!

Our experience has been that MVC loops are typically faster up to a point, 
that point being about 30-40 instructions in the pipeline, and, as mentioned, 
this seemed very processor dependent. However, when the source and target 
operands both happen to be aligned on a page boundary, the opportunity exists 
for the async data mover to kick in if a move long is being used.  I think this 
applied to both MVCL and MVCLE, but I am not sure. So ideally a macro would 
want to utilize both MVCs and MVCL/E.
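
One way to picture the run-time choice such a macro's expansion might make
is sketched below in Python. Everything here is hypothetical: the function
name, the returned labels, and especially the `mvc_limit` default, which is
only a placeholder for a machine-dependent tuning value and not a measured
crossover point.

```python
PAGE = 4096


def pick_move(length, src, dst, mvc_limit=32 * 1024):
    """Return the move strategy a hypothetical macro might select.

    length    -- number of bytes to move
    src, dst  -- operand addresses
    mvc_limit -- ASSUMED tuning value: longest move still done with MVCs
    """
    if length <= 0:
        return "none"
    if length <= 256:
        return "MVC"            # one MVC, or an executed MVC if variable
    both_page_aligned = src % PAGE == 0 and dst % PAGE == 0
    if both_page_aligned and length >= PAGE:
        return "MVCL/MVCLE"     # give the hardware a chance at page moves
    if length <= mvc_limit:
        return "MVC loop"       # 256-byte MVC loop plus executed MVC
    return "MVCL/MVCLE"


print(pick_move(100, 0x1001, 0x2003))
print(pick_move(5000, 0x1001, 0x2003))
print(pick_move(8192, 0x10000, 0x20000))
```

The alignment test comes before the length test so that page-aligned moves
take the move-long path even when they are short enough for an MVC loop.
Whether that ordering pays off would have to be measured on the target
processor.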

More grist for the mill!

Gary Weinhold
Senior Application Architect
DATAKINETICS | Data Performance & Optimization
Phone:+1.613.523.5500 x216
Email: [email protected]
Visit us online at www.DKL.com


-----Original Message-----

From: IBM Mainframe Assembler List [mailto:[email protected]] On 
Behalf Of [email protected]
Sent: Tuesday, October 20, 2020 12:12 PM
To: [email protected]
Subject: Re: Conditional MVCL macro?

For a 4000-byte move between fields of the same length, the COBOL compiler with
OPT(2) generates:

LAY     R10,5072(,R9)       FROM
LA      R7,1072(,R9)        TO
MVC     0(256,R7),0(R10)
MVC     256(256,R7),256(R10)
MVC     512(256,R7),512(R10)
MVC     768(256,R7),768(R10)
MVC     1024(256,R7),1024(R10)
MVC     1280(256,R7),1280(R10)
MVC     1536(256,R7),1536(R10)
MVC     1792(256,R7),1792(R10)
MVC     2048(256,R7),2048(R10)
MVC     2304(256,R7),2304(R10)
MVC     2560(256,R7),2560(R10)
MVC     2816(256,R7),2816(R10)
MVC     3072(256,R7),3072(R10)
MVC     3328(256,R7),3328(R10)
MVC     3584(256,R7),3584(R10)
MVC     3840(160,R7),3840(R10)

However for 5000 bytes it generates:

LAY     R7,6072(,R9)
LA      R10,0(,R7)
LA      R7,1072(,R9)
LHI     R11,0x13
L0128   EQU     *
MVC     0(256,R7),0(R10)
LA      R10,256(,R10)
LA      R7,256(,R7)
BRCT    R11,L0128
MVC     0(136,R7),0(R10)

And yes, the change occurred at 4097 bytes.
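
The block counts in the two listings can be checked with a little
arithmetic. The Python sketch below models only what the listings show: it
is not the compiler's actual logic, and the function name and the
`unroll_limit` parameter are made up for illustration.

```python
def mvc_plan(length, unroll_limit=4096):
    """Split a fixed-length move into 256-byte MVCs plus one short MVC.

    Returns (style, full_blocks, residual), where style is "unrolled"
    for straight-line MVCs or "loop" for a BRCT-controlled loop.
    unroll_limit reflects the observed switch at 4097 bytes.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    full_blocks, residual = divmod(length, 256)
    style = "unrolled" if length <= unroll_limit else "loop"
    return style, full_blocks, residual


for n in (4000, 4096, 4097, 5000):
    print(n, mvc_plan(n))
```

For 4000 bytes this gives 15 full blocks and a 160-byte remainder, matching
the fifteen 256-byte MVCs and the final `MVC 3840(160,R7),3840(R10)`. For
5000 bytes it gives 19 full blocks (the `0x13` loaded into R11) and a
136-byte remainder.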

-----Original Message-----
From: IBM Mainframe Assembler List <[email protected]> On Behalf 
Of Charles Mills
Sent: Tuesday, October 20, 2020 10:54
To: [email protected]
Subject: Re: Conditional MVCL macro?

@Ed, can you elaborate a little on your reasoning? (Not doubting it; just
curious.) Is it that the interruptibility provides a significant improvement 
over MVCL? Or the support for lengths greater than 16M? Or ... ?

When I asked Dr. Shum about move strategies, he seemed to indicate that for data 
that was already in cache, or soon would be anyway, an MVC loop was generally 
faster than MVCL. (I did not ask about MVCLE at the time; not sure why. He did 
not suggest it.)

Charles

-----Original Message-----
From: IBM Mainframe Assembler List [mailto:[email protected]]
On Behalf Of Ed Jaffe
Sent: Tuesday, October 20, 2020 6:52 AM
To: [email protected]
Subject: Re: Conditional MVCL macro?

We've switched almost exclusively to MVCLE except for short, fixed-length moves.
