This is really interesting. For those put off by the "C++" note that the issue 
has nothing whatsoever to do with C++. It is a pure branch prediction issue. 
Picture a program that computes an array of pseudo-random 8-bit integers from 0 
to 255. Then it solves the problem "what is the sum of all of the numbers in 
the table greater than or equal to 128?" It does that with the obvious loop, 
which then contains

IC   R0,next_table_byte
CHI  R0,128
JL   *+6
AR   R2,R0

The assertion of the thread -- and I don't doubt it -- is that the above code 
uses 1/6 the CPU time if you first sort the table. (Obviously, a stupid way of 
doing things: you could sort the table high to low and then exit once you got a 
value below 128; but that's not the point.)

It illustrates the value of, or problem with, depending on your point of view, 
branch prediction. If the table is random then the outcome of the CHI/JL is 
unpredictable, and the branch prediction is at best useless and at worst 
counter-productive. But if the table is sorted, then the branch prediction has 
a chance to be right most of the time.

There's a lot more than that in the thread, and it fundamentally has to do with 
modern CPU technology, not C++. 


-----Original Message-----
From: IBM Mainframe Discussion List [mailto:IBM-MAIN@LISTSERV.UA.EDU] On Behalf 
Of David Crayford
Sent: Tuesday, August 13, 2019 11:13 PM
Subject: Re: Instruction speeds

Some interesting information on branch prediction in that paper.

If you've got a z/OS C++ compiler try this snippet out, it's fascinating:

For IBM-MAIN subscribe / signoff / archive access instructions,
send email to with the message: INFO IBM-MAIN

Reply via email to