The IBM Assembler List might be helpful here.

If you have not joined, use this URL:
https://listserv.uga.edu/cgi-bin/wa?A0=ASSEMBLER-LIST

Lizette

> -----Original Message-----
> From: IBM Mainframe Discussion List [mailto:[email protected]] On
> Behalf Of David Crayford
> Sent: Friday, April 29, 2016 7:55 AM
> To: [email protected]
> Subject: Re: An explanation for branch performance?
> 
> On 29/04/2016 10:27 PM, Mike Schwab wrote:
> > Well, the obvious solution is to code the eyecatcher literals before
> > the entry point.  It will be less obvious that the eyecatcher is part
> > of the program (and not the end of the previous program) but as the
> > technique becomes more widespread it should become more trusted.
> 
> Thanks! We already know the solution. I'm looking for an answer. I'm a C/C++
> coder by trade and Metal/C has a neat FPB control block for the eyecatchers
> which are pointed to by an offset just above the entry point.
> 
>           ENTRY @@CCN@240
> @@CCN@240 AMODE 31
>           DC    XL8'00C300C300D50100'   Function Entry Point Marker
>           DC    A(@@FPB@4-*+8)          Signed offset to FPB
>           DC    XL4'00000000'           Reserved
> @@CCN@240 DS   0F
> <snip>
> @@LIT@4  LTORG
> @@FPB@   LOCTR
> @@FPB@4  DS    0F                      Function Property Block
>           DC    XL2'CCD5'               Eyecatcher
>           DC    BL2'0000000000000011'   Saved GPR Mask
>           DC    A(@@PFD@@-@@FPB@4)      Signed Offset to Prefix Data
>           DC    BL1'00000000'           Flag Set 1
>           DC    BL1'10000000'           Flag Set 2
>           DC    BL1'00000000'           Flag Set 3
>           DC    BL1'00000001'           Flag Set 4
>           DC    XL4'00000000'           Reserved
>           DC    XL4'00000000'           Reserved
>           DC    AL2(12)
>           DC    C'avl_iter_cur'
> 
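For anyone who wants to pick the Function Property Block apart in a dump, here is a rough Python sketch that unpacks the fields in the order they appear in the listing above. The layout is read straight off that listing and is not taken from IBM's documented mapping; `parse_fpb`, `sample_fpb`, the cp037 code page, and the -64 offset in the sample are all assumptions made for illustration.

```python
import struct

# Field order and widths as they appear in the quoted listing (an assumption,
# not IBM's documented FPB mapping):
#   XL2 eyecatcher, BL2 saved GPR mask, A() signed offset to prefix data,
#   4 x BL1 flag sets, 2 x XL4 reserved, AL2 name length, then the name.
FPB_HEADER = struct.Struct(">2sHi4B8xH")  # big-endian, no padding


def parse_fpb(buf, codec="cp037"):
    """Unpack one FPB from bytes; codec is an assumed EBCDIC code page."""
    eye, gpr_mask, pfd_offset, f1, f2, f3, f4, name_len = FPB_HEADER.unpack_from(buf, 0)
    if eye != b"\xCC\xD5":
        raise ValueError("FPB eyecatcher X'CCD5' not found")
    start = FPB_HEADER.size
    name = buf[start:start + name_len].decode(codec)
    return {
        "saved_gpr_mask": gpr_mask,
        "prefix_data_offset": pfd_offset,
        "flags": (f1, f2, f3, f4),
        "name": name,
    }


def sample_fpb():
    """Build the bytes the listing would assemble to.

    The prefix-data offset is relocatable in the listing, so -64 here is a
    made-up placeholder value.
    """
    return (
        bytes.fromhex("CCD5")
        + struct.pack(">H", 0b0000000000000011)
        + struct.pack(">i", -64)
        + bytes([0b00000000, 0b10000000, 0b00000000, 0b00000001])
        + bytes(8)
        + struct.pack(">H", 12)
        + "avl_iter_cur".encode("cp037")
    )


if __name__ == "__main__":
    print(parse_fpb(sample_fpb()))
```

Because the eyecatcher and name sit in the FPB rather than in front of the first instruction, the entry point itself has nothing to branch over.
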
> > On Fri, Apr 29, 2016 at 9:13 AM, David Crayford <[email protected]> wrote:
> >> On 29/04/2016 10:09 PM, Mike Schwab wrote:
> >>> The pipeline is optimized for running many instructions in a row.  A
> >>> branch is not recognized until through a good part of the pipeline.
> >>> Meanwhile the data to be skipped is in the instruction pipeline.
> >>>
> >>> Results meet expectations.
> >>
> >> So branching over eyecatchers is expected to be x2 slower on a z13
> >> than a z114? I was always led to believe that new hardware always
> >> ran old code faster unless it was doing nasty stuff like storing into
> >> the instruction stream.
> >>
> >>
> >>> On Fri, Apr 29, 2016 at 7:40 AM, David Crayford
> >>> <[email protected]>
> >>> wrote:
> >>>> We're doing some performance work on our assembler code and one of
> >>>> my colleagues ran the following test which was surprising.
> >>>> Unconditional branching can add significant overhead. I always
> >>>> believed that conditional branches were expensive because the
> >>>> branch predictor needed to do more work and unconditional branches
> >>>> were easy to predict. Does anybody have an explanation for this?
> >>>> Our machine is z114. It appears that it's even worse on a z13.
> >>>>
> >>>> Here's the code.
> >>>>
> >>>> I wrote a simple program - it tight loops 1 billion times
> >>>>
> >>>>
> >>>>            L     R4,=A(1*1000*1000*1000)
> >>>>            LTR   R4,R4
> >>>>            J     LOOP
> >>>> *
> >>>> LOOP     DS   0D                  .LOOP START
> >>>>            B     NEXT
> >>>>
> >>>> NEXT     JCT   R4,LOOP
> >>>>
> >>>> The loop starts with a branch ... I tested it twice - when the CC
> >>>> is matched (branch happens) and when it is not matched (falls
> >>>> through)
> >>>>
> >>>> 1. When the CC is matched and branching happens, CPU TIME=2.94
> >>>>    seconds
> >>>> 2. When the CC is not matched and the code falls through, CPU
> >>>>    TIME=1.69 seconds
> >>>> - a reduction of 42%
> >>>>
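To put the two CPU times quoted above on a per-iteration footing, a small Python sketch of the arithmetic follows. It uses only the figures from the post (1 billion iterations, 2.94 and 1.69 seconds); the function names are made up for illustration, and the result says nothing about where in the pipeline the extra time goes.

```python
ITERATIONS = 1_000_000_000   # loop count from the quoted test
TAKEN_SECONDS = 2.94         # CPU time when the branch is taken
FALL_THROUGH_SECONDS = 1.69  # CPU time when the code falls through


def per_iteration_ns(cpu_seconds, iterations=ITERATIONS):
    """Average CPU time per loop iteration, in nanoseconds."""
    return cpu_seconds / iterations * 1e9


def reduction_pct(slow_seconds, fast_seconds):
    """Relative saving of the fast run against the slow run, in percent."""
    return (slow_seconds - fast_seconds) / slow_seconds * 100


def extra_ns_per_taken_branch():
    """Extra nanoseconds per iteration attributable to taking the branch."""
    return per_iteration_ns(TAKEN_SECONDS) - per_iteration_ns(FALL_THROUGH_SECONDS)


if __name__ == "__main__":
    print(f"taken:        {per_iteration_ns(TAKEN_SECONDS):.2f} ns/iteration")
    print(f"fall through: {per_iteration_ns(FALL_THROUGH_SECONDS):.2f} ns/iteration")
    print(f"extra cost:   {extra_ns_per_taken_branch():.2f} ns/iteration")
    print(f"reduction:    {reduction_pct(TAKEN_SECONDS, FALL_THROUGH_SECONDS):.1f}%")
```

On these numbers the taken branch costs roughly 1.25 ns more per iteration, which matches the 42% reduction quoted in the post (about 42.5% before rounding).
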

----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
