https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67153

--- Comment #5 from ncm at cantrip dot org ---
My preliminary conclusion is that a hardware optimization present in Haswell
but not in Westmere fails to recognize, in the unsigned int test case, the
opportunity that it finds in the original bitset version, as compiled by gcc-5.

I have also observed that adding an assertion that the array index is not
negative, before the first array access, slows the program by a further 100%
on Westmere.
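
For illustration only, the kind of assertion described above might look like
the sketch below. It is not taken from the attached test case: count_hits,
table, indices and mask are hypothetical names, and the loop body is only an
assumed stand-in for the real inner loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the inner loop; all names are illustrative.
// Counts how many selected table entries have no bits in common with mask.
inline unsigned count_hits(const std::vector<unsigned>& table,
                           const std::vector<int>& indices,
                           unsigned mask)
{
    unsigned hits = 0;
    for (int i : indices) {
        // The added check: the index must not be negative before the
        // first array access. It is compiled out when NDEBUG is defined.
        assert(i >= 0);
        // Data-dependent branch on the table contents.
        if ((table[static_cast<std::size_t>(i)] & mask) == 0)
            ++hits;
    }
    return hits;
}
```

Building the same source with and without -DNDEBUG is one way to compare the
two variants, since NDEBUG removes the check entirely.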

Note that the entire data set fits in L3 cache on all tested targets, so
memory bandwidth is not a factor.

To my inexperienced eye the effects look like branch mispredictions.
I do not understand why a 3.4 GHz DDR3 Haswell runs as slowly as a
2.4 GHz DDR2 Westmere when branch prediction (or whatever it is)
fails.
