On 29/04/2016 11:10 PM, Lizette Koehler wrote:
Maybe the IBM Assembler List might be helpful here?
If you have not joined, use this URL:
https://listserv.uga.edu/cgi-bin/wa?A0=ASSEMBLER-LIST
Lizette
See my earlier response to Elardus. The assembler list is almost
moribund. Everybody posts here with these kinds of questions because the
audience is much broader. Most of the old regulars from ASSEMBLER-LIST
now converse in LinkedIn groups.
-----Original Message-----
From: IBM Mainframe Discussion List [mailto:[email protected]] On
Behalf Of David Crayford
Sent: Friday, April 29, 2016 7:55 AM
To: [email protected]
Subject: Re: An explanation for branch performance?
On 29/04/2016 10:27 PM, Mike Schwab wrote:
Well, the obvious solution is to code the eyecatcher literals before
the entry point. It will be less obvious that the eyecatcher is part
of the program (and not the end of the previous program), but as the
technique becomes more widespread it should become more trusted.
Thanks! We already know the solution. I'm looking for an answer. I'm a C/C++
coder by trade, and Metal/C has a neat FPB control block for the eyecatchers,
which is pointed to by an offset just above the entry point.
ENTRY @@CCN@240
@@CCN@240 AMODE 31
DC XL8'00C300C300D50100' Function Entry Point Marker
DC A(@@FPB@4-*+8) Signed offset to FPB
DC XL4'00000000' Reserved
@@CCN@240 DS 0F
<snip>
@@LIT@4 LTORG
@@FPB@ LOCTR
@@FPB@4 DS 0F Function Property Block
DC XL2'CCD5' Eyecatcher
DC BL2'0000000000000011' Saved GPR Mask
DC A(@@PFD@@-@@FPB@4) Signed Offset to Prefix Data
DC BL1'00000000' Flag Set 1
DC BL1'10000000' Flag Set 2
DC BL1'00000000' Flag Set 3
DC BL1'00000001' Flag Set 4
DC XL4'00000000' Reserved
DC XL4'00000000' Reserved
DC AL2(12)
DC C'avl_iter_cur'
On Fri, Apr 29, 2016 at 9:13 AM, David Crayford <[email protected]> wrote:
On 29/04/2016 10:09 PM, Mike Schwab wrote:
The pipeline is optimized for running many instructions in a row. A
branch is not recognized until through a good part of the pipeline.
Meanwhile the data to be skipped is in the instruction pipeline.
Results meet expectations.
So branching over eyecatchers is expected to be 2x slower on a z13
than on a z114? I was always led to believe that new hardware always
ran old code faster unless it was doing nasty stuff like storing into
the instruction stream.
On Fri, Apr 29, 2016 at 7:40 AM, David Crayford
<[email protected]>
wrote:
We're doing some performance work on our assembler code, and one of
my colleagues ran the following test, which was surprising.
Unconditional branching can add significant overhead. I always
believed that conditional branches were expensive because the
branch predictor needed to do more work, and that unconditional branches
were easy to predict. Does anybody have an explanation for this?
Our machine is a z114. It appears that it's even worse on a z13.
Here's the code.
I wrote a simple program - it tight-loops 1 billion times:

         L     R4,=A(1*1000*1000*1000)  Load loop count: 1 billion
         LTR   R4,R4                    Set the condition code
         J     LOOP
*
LOOP     DS    0D                       .LOOP START
         B     NEXT                     The branch under test
NEXT     JCT   R4,LOOP                  Decrement R4; loop while nonzero
The loop starts with a branch ... I tested it twice - when the CC
is matched (branch happens) and when it is not matched (falls
through)
1. When the CC is matched and branching happens: CPU TIME=2.94 seconds
2. When the CC is not matched and the code falls through: CPU TIME=1.69 seconds
   - a reduction of 42%
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
----------------------------------------------------------------------