> > So would need much more benchmarking on macro workloads first at least.
> Like what, for example? I believe in this case everything also
> strongly depends on the test usage model (e.g. it is usually compiled
> with -Os, not -O2) and, let's say, the internal test structure -
> whether there are hot loops that are suitable for unrolling.
Normally the compiler doesn't know whether a loop is hot unless you use
profile feedback. So in the worst case, on a big code base, you may end
up with a lot of unnecessary unrolling. In cold code that is merely
wasted bytes, but in code that is already icache-limited the extra
bytes could actively hurt.
How about just a compiler bootstrap on Atom as a "worst case"?
For the benchmark can you use profile feedback?
BTW I know some loops are unrolled at -O3 by default at tree level because
the vectorizer likes it. I actually have an older patch to dial this
down for some common cases.
a...@linux.intel.com -- Speaking for myself only.