The pipeline is optimized for running many instructions in a row.  A
branch is not recognized until the instruction is a good way through the
pipeline.  Meanwhile, the instructions that the branch skips have already
been fetched into the pipeline and must be thrown away.

So the results are what I would expect: the taken branch costs more than
falling through.
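
For a rough sense of scale, here is a back-of-the-envelope sketch in Python
using only the two CPU times quoted below.  The function and constant names
are mine, and it assumes the whole difference between the runs comes from
the one branch at LOOP being taken, which the test as posted does not prove.

```python
# Figures quoted from the test below; everything else is illustrative.
ITERATIONS = 1_000_000_000      # loop count loaded into R4
TAKEN_SECONDS = 2.94            # CPU time when the branch is taken
FALL_THROUGH_SECONDS = 1.69     # CPU time when the code falls through


def extra_ns_per_iteration(taken, not_taken, iterations):
    """Average extra CPU time per loop pass, in nanoseconds."""
    return (taken - not_taken) / iterations * 1e9


def reduction_percent(taken, not_taken):
    """How much cheaper the fall-through run is, relative to the taken run."""
    return (taken - not_taken) / taken * 100


if __name__ == "__main__":
    ns = extra_ns_per_iteration(TAKEN_SECONDS, FALL_THROUGH_SECONDS, ITERATIONS)
    pct = reduction_percent(TAKEN_SECONDS, FALL_THROUGH_SECONDS)
    print(f"extra cost per taken branch: {ns:.2f} ns")
    print(f"reduction when falling through: {pct:.1f}%")
```

That works out to about 1.25 ns of extra CPU time per pass, which lines up
with the roughly 42% reduction reported.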

On Fri, Apr 29, 2016 at 7:40 AM, David Crayford <[email protected]> wrote:
> We're doing some performance work on our assembler code and one of my
> colleagues ran the following test which was surprising. Unconditional
> branching can add significant overhead. I always believed that conditional
> branches were expensive because the branch predictor needed to do more work
> and unconditional branches were easy to predict. Does anybody have an
> explanation for this? Our machine is a z114. It appears that it's even
> worse on a z13.
>
> Here's the code.
>
> I wrote a simple program that runs a tight loop 1 billion times:
>
>
>          L     R4,=A(1*1000*1000*1000)
>          LTR   R4,R4
>          J     LOOP
> *
> LOOP     DS   0D                  .LOOP START
>          B     NEXT
>
> NEXT     JCT   R4,LOOP
>
> The loop starts with a branch ... I tested it twice - when the CC is matched
> (branch happens) and when it is not matched (falls through)
>
> 1. When the CC is matched and branching happens, CPU TIME=2.94 seconds
> 2. When the CC is not matched the code falls through, CPU TIME=1.69 seconds
> - a reduction of 42%
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO IBM-MAIN



-- 
Mike A Schwab, Springfield IL USA
Where do Forest Rangers go to get away from it all?
