------- Comment #13 from jakub at gcc dot gnu dot org  2009-04-29 09:32 -------
You are benchmarking something completely unrelated.
What really matters is how well the CPU can predict all the branches when code
has 4 branches/calls in one 16-byte block.  Core2, like various AMD CPUs, is
not able to predict them well.

In the #c6 testcase, gcc checks whether the je, call, jne and ret can all be
in one 16-byte block.  They can't: je is 2 bytes, call 5 bytes, leal 4 bytes
(but gcc uses min_insn_size, which is 2 in this case), testl 2, jne 2, and
addq 4 (but again, min_insn_size is 2 in this case).
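To make the arithmetic concrete, here is a small sketch that sums the sizes
listed above, once with the real sizes and once with the min_insn_size
estimates.  It is a simplified model, not gcc's actual pass: it assumes the
estimate equals the real size for je, call, testl and jne (the comment only
calls out leal and addq), and it treats "all four in one 16-byte block" as
"ret's first byte lies less than 16 bytes after je's first byte".

```python
# Sizes quoted above for the #c6 testcase, in program order up to (not
# including) the ret.  Each entry: (mnemonic, real size, assumed estimate).
# Assumption: the estimate equals the real size except for leal and addq.
INSNS = [
    ("je", 2, 2),
    ("call", 5, 5),
    ("leal", 4, 2),
    ("testl", 2, 2),
    ("jne", 2, 2),
    ("addq", 4, 2),
]

WINDOW = 16  # bytes per block considered for branch prediction


def ret_offset(use_estimate: bool) -> int:
    """Byte offset of ret from the start of je: sum of the preceding sizes."""
    column = 2 if use_estimate else 1
    return sum(insn[column] for insn in INSNS)


def fits_in_window(use_estimate: bool) -> bool:
    """Simplified check: does ret start within WINDOW bytes of je's start?"""
    return ret_offset(use_estimate) < WINDOW


print("real offset of ret:     ", ret_offset(False))  # 19
print("estimated offset of ret:", ret_offset(True))   # 15
print("fits (real sizes):      ", fits_in_window(False))  # False
print("fits (estimates):       ", fits_in_window(True))   # True
```

Under this model the real layout cannot hold all four branches in 16 bytes,
while the conservative estimates make it look as if it can.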
min_insn_size seems to be very conservative; I guess teaching it about a bunch
of prefixes couldn't hurt.  At the moment, for non-jump/call insns it estimates
just the displacement size and doesn't consider any prefixes (even those that
really can't change after machine reorg), etc.
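A toy illustration of the prefix idea follows.  It is not gcc's
min_insn_size; the function name `estimate` and its inputs are hypothetical.
It assumes the baseline estimate counts no prefix bytes, so adding the prefix
bytes that are known to be fixed keeps the result a lower bound.  The one
hard fact used is that a 64-bit add such as addq needs a REX prefix with the
W bit set.

```python
# Hypothetical lower-bound length estimate: a baseline that ignores prefixes,
# plus one byte for every prefix known not to change after machine reorg.
def estimate(base_estimate: int, fixed_prefixes: list) -> int:
    """Return base_estimate plus one byte per fixed prefix."""
    return base_estimate + len(fixed_prefixes)


# addq from the #c6 testcase: real size 4 bytes, baseline estimate 2.
# 64-bit operand size requires a REX.W prefix, so one more byte is certain.
addq_estimate = estimate(2, ["REX.W"])
print(addq_estimate)  # 3, closer to (and still not above) the real 4 bytes
```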


-- 

jakub at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39942
