On 6/17/25 3:24 PM, Andrew Waterman wrote:
> I designed Rocket, so I can confirm Yangyu's comment that the branch
> misprediction penalty is usually 3 cycles. (It's actually 4 if the
> correct-path instruction is 4 bytes long and isn't naturally
> aligned, but that should only happen around a quarter of the time.)
> Since it's a single-issue core, that means a mispredicted branch has
> a cost of around 4 instructions, and a correctly predicted one of
> course has a cost of around 1 instruction. A branch cost of 4 is
> therefore an overestimate, since even programs dominated by
> unpredictable branches will exhibit correct predictions some fraction
> of the time.
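
To make the arithmetic above concrete, here is a minimal sketch of the
expected-cost reasoning. It is an illustrative model only: the function
names are hypothetical, the "quarter of the time" figure is treated as a
simple probability, and none of this reflects how GCC computes or
consumes its branch cost internally.

```python
# Illustrative model of the numbers quoted in the message above.
# All names here are hypothetical; this is not GCC code.

def mispredict_penalty(p_unaligned=0.25, base=3, unaligned=4):
    """Average misprediction penalty in cycles, assuming the 4-cycle
    case (unaligned 4-byte target instruction) occurs with probability
    p_unaligned and the 3-cycle case otherwise."""
    return (1 - p_unaligned) * base + p_unaligned * unaligned


def expected_branch_cost(p_mispredict, cost_hit=1.0, cost_miss=4.0):
    """Expected cost of one branch, in instruction slots, on a
    single-issue core: about 1 when predicted correctly and about 4
    when mispredicted, weighted by the misprediction rate."""
    return (1 - p_mispredict) * cost_hit + p_mispredict * cost_miss


if __name__ == "__main__":
    print(f"average penalty: {mispredict_penalty():.2f} cycles")
    for p in (0.0, 0.1, 0.5, 1.0):
        cost = expected_branch_cost(p)
        print(f"mispredict rate {p:.0%}: expected cost {cost:.2f}")
```

Under these assumptions the expected cost only reaches 4 when every
branch is mispredicted, which is the sense in which 4 is an
overestimate.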
Understood. Based on all that, a cost of 3 or 4 could be appropriate,
since there are secondary effects of eliminating a branch (better
instruction combination, better register allocation, scheduling, etc).
But I don't see a compelling reason to adjust the cost from 3 to 4 for
the rocket uarch.
> I would think the best answer here is to use the right -mtune setting
> for a given core (and of course make sure that core's parameters are
> accurate), but I'll butt out of that decision.
You're absolutely correct. The right answer here is precisely that each
core should have a reasonable default cost and folks should use the
right -mtune option.
Jeff