https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67153
--- Comment #29 from ncm at cantrip dot org ---
> My reason for thinking this is not a bug is that the fastest choice will
> depend on the contents of the word list. Regardless of layout, there will
> be one option that is slightly faster than the other. I guess it's
> reasonable to ask, though, whether it's better by default to try to save one
> cycle on an already very fast empty loop, or to save one cycle on a more
> expensive loop. But the real gain (if there is one) will be matching the
> layout to the runtime behavior, for which the compiler requires outside
> information.

Saving one cycle on a two-cycle loop can have a much larger effect than saving one cycle on a fifty-cycle loop. Even if the fifty-cycle loop is the norm, an extra cycle there costs only 2%. But if the two-cycle loop is the more common case, as it is here, saving that one cycle cuts its cost in half, which is a big win.