On Wed, Apr 11, 2012 at 5:34 PM, Andi Kleen <a...@firstfloor.org> wrote:
> Richard Guenther <richard.guent...@gmail.com> writes:
>>
>> 5% is not moderate.  Your patch does enable unrolling at -O2 but not -O3,
>> why?  Why do you disable register renaming?  check_imull requires a function
>> comment.
>>
>> This completely looks like a hack for EEMBC2.0, so it's definitely not ok.
>>
>> -O2 is not supposed to give best benchmark results.
>
> Besides it is against the Intel Optimization Manual recommendation
> to prefer small code on Atom to avoid falling out of the predecode hints
> in the cache.

Yes, this is a well-known concern for Atom. But at the same time, unrolling
can help a lot on in-order machines because it gives the compiler's scheduler
more opportunities. And experiments showed that unrolling really does help.

>
> So would need much more benchmarking on macro workloads first at least.

Like what, for example? I believe that in this case everything also depends
strongly on the test's usage model (e.g. it is usually compiled with -Os, not
-O2) and, let's say, on the internal test structure: whether there are hot
loops that are suitable for unrolling.

>
> -Andi
>
> --
> a...@linux.intel.com -- Speaking for myself only
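
To illustrate the scheduling argument in the reply above, here is a minimal
sketch of what unrolling exposes to a scheduler. It is not taken from the
patch under discussion. The function names, the unroll factor of 4, and the
use of restrict (which assumes the buffers do not overlap) are all
hypothetical choices made for the example.

```c
#include <stddef.h>

/* Rolled form: each iteration is a load -> multiply -> add -> store chain,
 * so a scheduler working on one loop body has little independent work to
 * place while the multiply is in flight. */
void scale_add(int *restrict dst, const int *restrict a,
               const int *restrict b, int k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] * k + b[i];
}

/* Hand-unrolled by 4 (hypothetical factor): the four multiply-adds in the
 * body are independent of each other, so they can be interleaved. The cost
 * is a larger loop body, which is the code-size concern raised for Atom. */
void scale_add_unrolled4(int *restrict dst, const int *restrict a,
                         const int *restrict b, int k, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        int t0 = a[i]     * k;
        int t1 = a[i + 1] * k;
        int t2 = a[i + 2] * k;
        int t3 = a[i + 3] * k;

        dst[i]     = t0 + b[i];
        dst[i + 1] = t1 + b[i + 1];
        dst[i + 2] = t2 + b[i + 2];
        dst[i + 3] = t3 + b[i + 3];
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++)
        dst[i] = a[i] * k + b[i];
}
```

Both functions compute the same result. Whether the unrolled form is actually
faster depends on the loop being hot and on the larger body still fitting the
front end's limits, so it has to be measured per workload.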