-O3 often produces larger code and unrolls more aggressively, which can raise the miss rate in the instruction cache. -O2 can beat -O3 in some cases where code size matters.
That is generally true. My point is that GCC and Clang make different tradeoffs when told '-O2'; Clang is more aggressive than GCC at -O2. I don't know if that still holds at -O3 (I expect probably not).
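
For anyone who wants to check this on their own toolchain, here is a minimal sketch. The file name `sum.c` and the function `sum_array` are made up for illustration; any small hot loop would do. The idea is just to compile the same file at both levels with each compiler and compare the text-segment sizes. Results will vary with compiler version and target, so treat this as a way to measure, not as evidence for either claim.

```c
/* sum.c: hypothetical example, not taken from any real project.
 *
 * Build object files at each level and compare section sizes:
 *   gcc   -O2 -c sum.c -o sum_gcc_O2.o
 *   gcc   -O3 -c sum.c -o sum_gcc_O3.o
 *   clang -O2 -c sum.c -o sum_clang_O2.o
 *   clang -O3 -c sum.c -o sum_clang_O3.o
 *   size sum_gcc_O2.o sum_gcc_O3.o sum_clang_O2.o sum_clang_O3.o
 */
#include <stddef.h>

/* A simple reduction loop: a typical candidate for unrolling
 * and vectorization, so code size may differ between levels. */
long sum_array(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}
```

The `text` column from `size` is the number to look at for code size. Whether a larger loop body actually hurts depends on the rest of the program's instruction-cache footprint, so a size difference alone doesn't tell you which build is faster; you'd still need to benchmark the real workload.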