https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67153

--- Comment #9 from ncm at cantrip dot org ---
I did experiment with -m[no-]bmi[2] a fair bit.  The flags made a significant
difference in the instructions emitted, but exactly zero difference in
runtime.  That's actually not surprising at all; the BMI instructions get
decomposed into micro-ops that exactly match those from the equivalent
non-BMI instruction sequences, the micro-ops are cached, and the loops that
dominate runtime execute out of the micro-op cache.  The only real effect
is maybe slightly shorter object code, which could matter in a program
dominated by bus traffic with loops too big to cache well.  I say "maybe
slightly shorter" because instruction-set extension instructions are
actually huge, mostly prefixes.
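
For anyone who wants to see the difference in emitted instructions for
themselves, here is a minimal sketch, not taken from the test case in this
bug.  The function names are hypothetical.  These are bit idioms that a
compiler may turn into BMI1 instructions (blsr, blsi, andn) under -mbmi and
into plain sub/neg/not/and sequences under -mno-bmi; whether it actually
does so depends on the GCC version and optimization level, so treat the
comments as an assumption to verify, not a guarantee.

```c
#include <assert.h>
#include <stdint.h>

/* Clear the lowest set bit.  May compile to BLSR with -mbmi. */
uint32_t clear_lowest_set(uint32_t x)
{
    return x & (x - 1u);
}

/* Isolate the lowest set bit.  May compile to BLSI with -mbmi. */
uint32_t isolate_lowest_set(uint32_t x)
{
    return x & (0u - x);
}

/* Bits set in b but not in a.  May compile to ANDN with -mbmi. */
uint32_t and_not(uint32_t a, uint32_t b)
{
    return ~a & b;
}

/* Sanity check: the results are the same whichever encoding is chosen. */
void check_bit_idioms(void)
{
    assert(clear_lowest_set(0x18u) == 0x10u);
    assert(isolate_lowest_set(0x18u) == 0x08u);
    assert(and_not(0x0Fu, 0xFFu) == 0xF0u);
}
```

Compiling this file once with "gcc -O2 -mbmi -S" and once with
"gcc -O2 -mno-bmi -S" and diffing the two assembly outputs shows what
changes in the instruction stream; timing the two builds is a separate
measurement.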

I.e. most of the BMI stuff is marketing fluff, added mainly to make the 
competition waste money matching them instead of improving the product.
