The pipeline is optimized for running many sequential instructions in a row. A branch is not recognized until it has travelled through a good part of the pipeline. Meanwhile, the instructions that will be skipped have already been fetched into the pipeline and have to be discarded.
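As a rough sanity check on that explanation, the timings quoted below can be turned into a per-iteration cost for the taken branch. This is only a minimal sketch, not a measurement: the function name `branch_overhead` is hypothetical, and the 3.8 GHz clock is an assumption for the z114, so treat the cycle figure as approximate.

```python
# Timings and loop count taken from the quoted test below.
ITERATIONS = 1_000_000_000      # loop count loaded into R4
CPU_TAKEN_S = 2.94              # CPU seconds when the branch is taken
CPU_FALLTHROUGH_S = 1.69        # CPU seconds when the code falls through

# ASSUMPTION: nominal z114 clock; not stated anywhere in the thread.
ASSUMED_CLOCK_HZ = 3.8e9


def branch_overhead(taken_s, fallthrough_s, iterations, clock_hz):
    """Estimate the extra cost of one taken branch per loop iteration."""
    extra_s = taken_s - fallthrough_s              # time attributable to taken branches
    extra_per_iter_s = extra_s / iterations        # seconds per taken branch
    return {
        "extra_ns_per_iteration": extra_per_iter_s * 1e9,
        "reduction_pct": extra_s / taken_s * 100,  # saving relative to the slower run
        "extra_cycles_per_iteration": extra_per_iter_s * clock_hz,
    }


result = branch_overhead(CPU_TAKEN_S, CPU_FALLTHROUGH_S, ITERATIONS, ASSUMED_CLOCK_HZ)

if __name__ == "__main__":
    for name, value in result.items():
        print(f"{name}: {value:.2f}")
```

With those inputs the extra cost works out to about 1.25 ns per taken branch, or roughly 42.5% of the slower run, which matches the 42% reduction reported. Under the assumed clock that is between four and five cycles per iteration.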
Results meet expectations.

On Fri, Apr 29, 2016 at 7:40 AM, David Crayford <[email protected]> wrote:
> We're doing some performance work on our assembler code and one of my
> colleagues ran the following test which was surprising. Unconditional
> branching can add significant overhead. I always believed that conditional
> branches were expensive because the branch predictor needed to do more work
> and unconditional branches were easy to predict. Does anybody have an
> explanation for this. Our machine is z114. It appears that it's even worse
> on a z13.
>
> Here's the code.
>
> I wrote a simple program - it tight loops 1 billion times
>
>
>          L     R4,=A(1*1000*1000*1000)
>          LTR   R4,R4
>          J     LOOP
> *
> LOOP     DS    0D                  .LOOP START
>          B     NEXT
>
> NEXT     JCT   R4,LOOP
>
> The loop starts with a branch ... I tested it twice - when the CC is matched
> (branch happens) and when it is not matched (falls through)
>
> 1. When the CC is matched and branching happens, CPU TIME=2.94 seconds
> 2. When the CC is not matched the code falls through, CPU TIME=1.69 seconds
> - a reduction of 42%
>
> ----------------------------------------------------------------------
> For IBM-MAIN subscribe / signoff / archive access instructions,
> send email to [email protected] with the message: INFO IBM-MAIN

--
Mike A Schwab, Springfield IL USA
Where do Forest Rangers go to get away from it all?
