Maybe the IBM Assembler List might be helpful here?
If you have not joined, use this URL:
https://listserv.uga.edu/cgi-bin/wa?A0=ASSEMBLER-LIST
Lizette
> -----Original Message-----
> From: IBM Mainframe Discussion List [mailto:[email protected]] On
> Behalf Of David Crayford
> Sent: Friday, April 29, 2016 7:55 AM
> To: [email protected]
> Subject: Re: An explanation for branch performance?
>
> On 29/04/2016 10:27 PM, Mike Schwab wrote:
> > Well, the obvious solution is to code the eyecatcher literals before
> > the entry point. It will be less obvious that the eyecatcher is part
> > of the program (and not the end of the previous program) but as the
> > technique becomes more widespread it should become more trusted.
>
> Thanks! We already know the solution; I'm looking for an explanation. I'm a
> C/C++ coder by trade, and Metal/C has a neat FPB control block for the
> eyecatchers, which is pointed to by a signed offset just above the entry point.
>
> ENTRY @@CCN@240
> @@CCN@240 AMODE 31
> DC XL8'00C300C300D50100' Function Entry Point Marker
> DC A(@@FPB@4-*+8) Signed offset to FPB
> DC XL4'00000000' Reserved
> @@CCN@240 DS 0F
> <snip>
> @@LIT@4 LTORG
> @@FPB@ LOCTR
> @@FPB@4 DS 0F Function Property Block
> DC XL2'CCD5' Eyecatcher
> DC BL2'0000000000000011' Saved GPR Mask
> DC A(@@PFD@@-@@FPB@4) Signed Offset to Prefix Data
> DC BL1'00000000' Flag Set 1
> DC BL1'10000000' Flag Set 2
> DC BL1'00000000' Flag Set 3
> DC BL1'00000001' Flag Set 4
> DC XL4'00000000' Reserved
> DC XL4'00000000' Reserved
> DC AL2(12)
> DC C'avl_iter_cur'
>
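To make the layout in that listing concrete: a sketch in C of how a tool might walk from the entry point back to the FPB. This is purely illustrative, not an IBM API; the function name find_fpb and the fixed-prolog assumption (XL8 marker at entry-16, signed offset at entry-8, reserved word at entry-4, offset assembled as A(FPB-*+8)) are taken only from the listing above.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Hypothetical sketch: locate a Metal/C-style Function Property Block
   from an entry-point address, assuming exactly the prolog layout shown
   in the listing above:
     entry-16: XL8 marker 00C300C300D50100
     entry-8 : signed 4-byte offset, assembled as A(FPB - * + 8)
     entry-4 : reserved
   Since the field holds FPB - field_addr + 8, FPB = field_addr + off - 8. */
static const uint8_t *find_fpb(const uint8_t *entry) {
    static const uint8_t marker[8] =
        {0x00, 0xC3, 0x00, 0xC3, 0x00, 0xD5, 0x01, 0x00};
    if (memcmp(entry - 16, marker, sizeof marker) != 0)
        return NULL;                        /* no entry-point marker */
    /* the offset field is big-endian, as on z/Architecture */
    int32_t off = (int32_t)(((uint32_t)entry[-8] << 24) |
                            ((uint32_t)entry[-7] << 16) |
                            ((uint32_t)entry[-6] << 8)  |
                             (uint32_t)entry[-5]);
    return (entry - 8) + off - 8;           /* undo the +8 bias in A(FPB-*+8) */
}
```

The point of the scheme is that no branch over the data is ever executed: the eyecatcher material sits behind the entry point (and in a separate LOCTR), reachable by offset arithmetic instead of by an instruction that the pipeline has to predict.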
> > On Fri, Apr 29, 2016 at 9:13 AM, David Crayford <[email protected]> wrote:
> >> On 29/04/2016 10:09 PM, Mike Schwab wrote:
> >>> The pipeline is optimized for running many instructions in a row. A
> >>> branch is not recognized until it is well into the pipeline.
> >>> Meanwhile, the data to be skipped is already in the instruction pipeline.
> >>>
> >>> Results meet expectations.
> >>
> >> So branching over eyecatchers is expected to be x2 slower on a z13
> >> than a z114? I was always led to believe that new hardware always
> >> ran old code faster unless it was doing nasty stuff like storing into
> >> the instruction stream.
> >>
> >>
> >>> On Fri, Apr 29, 2016 at 7:40 AM, David Crayford
> >>> <[email protected]>
> >>> wrote:
> >>>> We're doing some performance work on our assembler code and one of
> >>>> my colleagues ran the following test which was surprising.
> >>>> Unconditional branching can add significant overhead. I always
> >>>> believed that conditional branches were expensive because the
> >>>> branch predictor needed to do more work and unconditional branches
> >>>> were easy to predict. Does anybody have an explanation for this?
> >>>> Our machine is z114. It appears that it's even worse on a z13.
> >>>>
> >>>> Here's the code.
> >>>>
> >>>> I wrote a simple program - it tight loops 1 billion times
> >>>>
> >>>>
> >>>> L R4,=A(1*1000*1000*1000)
> >>>> LTR R4,R4
> >>>> J LOOP
> >>>> *
> >>>> LOOP DS 0D .LOOP START
> >>>> B NEXT
> >>>>
> >>>> NEXT JCT R4,LOOP
> >>>>
> >>>> The loop starts with a branch ... I tested it twice - when the CC
> >>>> is matched (branch happens) and when it is not matched (falls
> >>>> through).
> >>>>
> >>>> 1. When the CC is matched and branching happens, CPU TIME=2.94 seconds
> >>>> 2. When the CC is not matched and the code falls through, CPU TIME=1.69 seconds
> >>>>    - a reduction of 42%
> >>>>
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN