On 29/04/2016 10:09 PM, Mike Schwab wrote:
The pipeline is optimized for running many instructions in a row. A
branch is not recognized until it is a good way through the pipeline.
Meanwhile, the data to be skipped is already in the instruction pipeline.
Results meet expectations.
So branching over eyecatchers is expected to be 2x slower on a z13 than
on a z114? I was always led to believe that new hardware always ran old
code faster unless it was doing nasty stuff like storing into the
instruction stream.
On Fri, Apr 29, 2016 at 7:40 AM, David Crayford <[email protected]> wrote:
We're doing some performance work on our assembler code and one of my
colleagues ran the following test, which gave a surprising result:
unconditional branching can add significant overhead. I always believed
that conditional branches were expensive because the branch predictor
needed to do more work, and that unconditional branches were easy to
predict. Does anybody have an explanation for this? Our machine is a z114.
It appears that it's even worse on a z13.
Here's the code.
I wrote a simple program - it runs a tight loop 1 billion times
         L     R4,=A(1*1000*1000*1000)
         LTR   R4,R4
         J     LOOP
*
LOOP     DS    0D                 .LOOP START
         B     NEXT
NEXT     JCT   R4,LOOP
The loop starts with a branch ... I tested it twice - when the CC is matched
(branch happens) and when it is not matched (falls through)
1. When the CC is matched and branching happens, CPU TIME=2.94 seconds
2. When the CC is not matched the code falls through, CPU TIME=1.69 seconds
- a reduction of 42%
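
For a rough sense of scale, the two timings above can be turned into a
per-iteration cost. This is only a back-of-the-envelope sketch in Python,
not part of the original test: it assumes the whole reported CPU time was
spent in the 1 billion loop iterations, and the names used here
(per_iteration_ns, reduction_percent) are made up for illustration.

```python
# Figures quoted in the post; the iteration count is taken from the
# literal =A(1*1000*1000*1000) in the test program.
ITERATIONS = 1_000_000_000
TAKEN_SECONDS = 2.94         # case 1: CC matched, branch taken
FALLTHROUGH_SECONDS = 1.69   # case 2: CC not matched, falls through


def per_iteration_ns(cpu_seconds, iterations=ITERATIONS):
    """Average CPU time per loop iteration, in nanoseconds.

    Assumption: all of the reported CPU time is spent inside the loop.
    """
    return cpu_seconds / iterations * 1e9


def reduction_percent(before, after):
    """Relative reduction going from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


taken_ns = per_iteration_ns(TAKEN_SECONDS)
fallthrough_ns = per_iteration_ns(FALLTHROUGH_SECONDS)

print(f"branch taken:  {taken_ns:.2f} ns per iteration")
print(f"fall through:  {fallthrough_ns:.2f} ns per iteration")
print(f"extra cost:    {taken_ns - fallthrough_ns:.2f} ns per taken branch")
print(f"reduction:     {reduction_percent(TAKEN_SECONDS, FALLTHROUGH_SECONDS):.1f}%")
```

Under that assumption the taken branch adds roughly 1.25 ns to each
iteration, and the reduction works out to about 42.5%, in line with the
42% quoted above.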
----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN