[Bug tree-optimization/54073] [4.7/4.8 Regression] SciMark Monte Carlo test performance has seriously decreased in recent GCC releases

2012-11-16 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073



--- Comment #13 from Jakub Jelinek jakub at gcc dot gnu.org 2012-11-16 11:40:42 UTC ---

Author: jakub
Date: Fri Nov 16 11:40:39 2012
New Revision: 193554

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193554

Log:
    PR target/54073
    * config/i386/i386.md (mov<mode>cc): Use comparison_operator
    instead of ordered_comparison_operator resp.
    ix86_fp_comparison_operator predicates.
    * config/i386/i386.c (ix86_expand_fp_movcc): Reject TImode
    or for -m32 DImode comparisons.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.md
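The ix86_expand_fp_movcc part of the ChangeLog entry can be read as a simple guard on the mode of the comparison. The following is a hypothetical model in plain C for illustration only, not the actual i386.c code: the enum, its values and the function name are invented here, and only the two rejection rules come from the log above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the machine mode of the comparison operands. */
enum cmp_mode { CMP_SImode, CMP_DImode, CMP_TImode, CMP_DFmode };

/* Models the two rejection rules from the ChangeLog entry for
   ix86_expand_fp_movcc.  Returning false means "do not expand a
   conditional move for this comparison". */
bool movcc_accepts_comparison(enum cmp_mode mode, bool target_64bit)
{
    /* TImode (128-bit) comparisons are always rejected. */
    if (mode == CMP_TImode)
        return false;
    /* DImode (64-bit) comparisons are rejected for -m32. */
    if (mode == CMP_DImode && !target_64bit)
        return false;
    return true;
}
```

So a DImode comparison is still accepted on x86_64, while the same comparison under -m32 is refused.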


2012-11-13 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073



--- Comment #8 from Jakub Jelinek jakub at gcc dot gnu.org 2012-11-13 13:04:28 UTC ---

Created attachment 28674
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=28674
gcc48-pr54073.patch



On x86_64-linux on a SandyBridge CPU with -O3 -march=corei7-avx I've tracked it
down to the
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=171341
change, in particular the emit_conditional_move part of the changes.
Before the change, emit_conditional_move would completely ignore the predicate
on the comparison operand (operands[1]); starting with r171341 it honors it.
And movsicc's ordered_comparison_operator would give up on the UNLT
comparison in the MonteCarlo testcase, while ix86_expand_int_movcc expands it
just fine, and at least on that loop it is beneficial to use

        vucomisd  %xmm0, %xmm1
        cmovae    %eax, %ebp

instead of:

.L4:
        addl      $1, %ebx
        ...
        vucomisd  %xmm0, %xmm2
        jb        .L4
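The two instruction sequences above correspond to two ways of executing the same source-level counting loop. A minimal C sketch of both shapes follows. The function names are illustrative, and whether GCC actually emits a jump, a cmov or a setcc for either form depends on the target, the flags and the movcc predicates discussed in this bug, so the comments describe intent rather than guaranteed code generation.

```c
#include <stddef.h>

/* Branchy form, the same shape as the MonteCarlo kernel:
   if (x*x + y*y <= 1.0) under_curve++;
   A compiler may keep this as a conditional jump or if-convert it
   into a conditional move. */
int count_under_curve_branchy(const double *xs, const double *ys, size_t n)
{
    int under_curve = 0;
    for (size_t i = 0; i < n; i++) {
        if (xs[i] * xs[i] + ys[i] * ys[i] <= 1.0)
            under_curve++;
    }
    return under_curve;
}

/* Branch-free form: the comparison yields 0 or 1 and is added directly,
   so there is no data-dependent control flow in the source. */
int count_under_curve_branchless(const double *xs, const double *ys, size_t n)
{
    int under_curve = 0;
    for (size_t i = 0; i < n; i++)
        under_curve += (xs[i] * xs[i] + ys[i] * ys[i] <= 1.0);
    return under_curve;
}
```

Both functions return the same count for any input; only the control-flow shape they suggest to the compiler differs.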



The attached proof of concept patch attempts to just restore the 4.6 and
earlier behavior by allowing in all comparisons.  Of course perhaps it might be
possible it needs better tuning than that, I meant it just as a start for
discussions.



vanilla trunk:

**  **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to p...@nist.gov) **
**  **
Using   2.00 seconds min time per kenel.
Composite Score:          1886.79
FFT             Mflops:   1726.97    (N=1024)
SOR             Mflops:   1239.20    (100 x 100)
MonteCarlo:     Mflops:    374.13
Sparse matmult  Mflops:   1956.30    (N=1000, nz=5000)
LU              Mflops:   4137.37    (M=100, N=100)



patched trunk:

**  **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to p...@nist.gov) **
**  **
Using   2.00 seconds min time per kenel.
Composite Score:          1910.08
FFT             Mflops:   1726.97    (N=1024)
SOR             Mflops:   1239.20    (100 x 100)
MonteCarlo:     Mflops:    528.94
Sparse matmult  Mflops:   1949.03    (N=1000, nz=5000)
LU              Mflops:   4106.27    (M=100, N=100)
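A quick arithmetic check on the two runs above: the reported composite score equals the arithmetic mean of the five kernel scores (9433.97 / 5 = 1886.79 and 9550.41 / 5 = 1910.08). This explains why a roughly 41% MonteCarlo gain moves the composite by only about 1.2%. The helper names below are illustrative.

```c
/* The composite score shown above is the arithmetic mean of the five
   kernel results (FFT, SOR, MonteCarlo, Sparse matmult, LU). */
double composite_score(const double kernel_mflops[5])
{
    double sum = 0.0;
    for (int i = 0; i < 5; i++)
        sum += kernel_mflops[i];
    return sum / 5.0;
}

/* Relative change in percent from `before` to `after`. */
double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With the numbers above, percent_change(374.13, 528.94) is about 41.4 for MonteCarlo, while percent_change(1886.79, 1910.08) is about 1.2 for the composite.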


2012-11-13 Thread t.artem at mailcity dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073



--- Comment #9 from Artem S. Tashkinov t.artem at mailcity dot com 2012-11-13 15:06:25 UTC ---

(In reply to comment #8)
> The attached proof of concept patch attempts to just restore the 4.6 and
> earlier behavior by allowing in all comparisons.  Of course perhaps it might be
> possible it needs better tuning than that, I meant it just as a start for
> discussions.



The results look terrific, I hope this patch will be merged ASAP.


2012-11-13 Thread ubizjak at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073



--- Comment #10 from Uros Bizjak ubizjak at gmail dot com 2012-11-13 15:13:56 UTC ---

(In reply to comment #8)
> The attached proof of concept patch attempts to just restore the 4.6 and
> earlier behavior by allowing in all comparisons.  Of course perhaps it might be
> possible it needs better tuning than that, I meant it just as a start for
> discussions.

Please see PR53346, from comment 14 onwards, especially H.J.'s comment:

-quote-
I was told that cmov wins if branch is mispredicted, otherwise
cmov loses.  We will investigate if we can improve cmov in GCC.
-/quote-


2012-11-13 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073



--- Comment #11 from Jakub Jelinek jakub at gcc dot gnu.org 2012-11-13 15:24:19 UTC ---

(In reply to comment #10)
> Please see PR53346, from comment 14 onwards, especially H.J.'s comment:
>
> -quote-
> I was told that cmov wins if branch is mispredicted, otherwise
> cmov loses.  We will investigate if we can improve cmov in GCC.
> -/quote-



Possibly.  But then movsicc etc. still isn't automatically the right thing
when the comparison is ordered and the wrong thing otherwise; whether it is
desirable depends on whether the compiler can guess if the condition can be
predicted well or not.

I guess in MonteCarlo the x*x + y*y <= 1.0 condition can't be predicted well,
and that is why it helps so much.
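The kernel's structure shows why the condition is hard to predict. The points are uniform in the unit square, so `x*x + y*y <= 1.0` holds with probability pi/4 (about 0.785) in random order. A predictor can then at best guess the majority outcome and mispredict roughly 21% of the time. The minimal sketch below follows the shape of the MonteCarlo kernel but assumes a hypothetical xorshift64 generator in place of SciMark's own Random code, so its estimate will not match the benchmark's bit for bit.

```c
#include <stdint.h>

/* Hypothetical stand-in for SciMark's Random generator: xorshift64
   (shift triple 13, 7, 17).  The state must never be zero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* Uniform double in [0, 1) built from the top 53 bits (divide by 2^53). */
static double next_double(uint64_t *state)
{
    return (double)(xorshift64(state) >> 11) * (1.0 / 9007199254740992.0);
}

/* Estimate pi by counting random points of the unit square that fall
   inside the quarter circle, as the MonteCarlo kernel does. */
double monte_carlo_pi(int num_samples, uint64_t seed)
{
    uint64_t state = seed ? seed : 1;   /* avoid the all-zero state */
    int under_curve = 0;
    for (int count = 0; count < num_samples; count++) {
        double x = next_double(&state);
        double y = next_double(&state);
        /* True about 78.5% of the time, with no pattern a branch
           predictor could learn. */
        if (x * x + y * y <= 1.0)
            under_curve++;
    }
    return ((double)under_curve / num_samples) * 4.0;
}
```

With a million samples the estimate lands close to 3.14. The point of interest for this bug is the `if` inside the loop, whose outcome is essentially a coin flip biased 78.5/21.5.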


2012-11-13 Thread hubicka at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073



Jan Hubicka hubicka at gcc dot gnu.org changed:



            What|Removed                     |Added
              CC|                            |hubicka at gcc dot gnu.org



--- Comment #12 from Jan Hubicka hubicka at gcc dot gnu.org 2012-11-13 15:54:22 UTC ---

The decision on whether to use cmov or jmp was always tricky on x86
architectures. Cmov increases dependency chains and register pressure (both
values need to be loaded in) and has a long opcode. So a jump sequence, if well
predicted, flows better through the out-of-order core. If badly predicted it
is, of course, a disaster. I think more modern CPUs solved the problem of the
long latency of cmov, but the dependency chains are still there.
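The trade-off described above, together with H.J.'s remark that cmov wins when the branch is mispredicted and loses otherwise, can be expressed as a toy cost model. This is a sketch under loudly stated assumptions: all costs are hypothetical per-iteration cycle counts chosen by the caller, not measured values for any CPU. A correctly predicted branch is modelled as cheap, each misprediction adds a fixed penalty, and a cmov is modelled as a fixed, data-independent cost.

```c
#include <stdbool.h>

/* Expected cost of a branch: base cost plus the (hypothetical)
   misprediction penalty weighted by how often it is mispredicted. */
double expected_branch_cost(double branch_cost, double mispredict_penalty,
                            double mispredict_rate)
{
    return branch_cost + mispredict_rate * mispredict_penalty;
}

/* Prefer cmov only when its fixed cost beats the branch's expected cost. */
bool prefer_cmov(double branch_cost, double cmov_cost,
                 double mispredict_penalty, double mispredict_rate)
{
    return cmov_cost < expected_branch_cost(branch_cost, mispredict_penalty,
                                            mispredict_rate);
}

/* For a condition that is true with probability p, independently on each
   iteration, the best any predictor can do is always guess the majority
   outcome, so its misprediction rate is at least min(p, 1 - p). */
double min_mispredict_rate(double p)
{
    return p < 1.0 - p ? p : 1.0 - p;
}
```

With an illustrative 15-cycle penalty, a MonteCarlo-like condition (p = 0.785, at least 21.5% mispredicted) favours cmov. A branch mispredicted 1% of the time favours the jump.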



This patch fixes a bug in a pattern rather than tweaking the heuristic on
predictability. As such I think it is OK for mainline.



We should do something about rnflow. I will look into that.
The usual wisdom is that, lacking profile feedback, one should handle non-loop
branches as unpredictable and loop branches as predictable, so everything
handled by ifconvert falls into the first category. With profile feedback one
can see the branch probability, and if it is close to 0 or REG_BR_PROB_BASE,
treat the branch as predictable. We handle this with the predictable_edge_p
parameter passed to BRANCH_COST (that by itself is gross, but for years we were
not able to come up with something saner).



Honza
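The heuristic Honza describes can be modelled roughly as follows. This is a hypothetical sketch, not GCC's predictable_edge_p. The function name and the 2% cutoff are assumptions made for illustration, and REG_BR_PROB_BASE is taken to be 10000 (the scale GCC used for branch probabilities at the time).

```c
#include <stdbool.h>

/* Branch probabilities are integers scaled by REG_BR_PROB_BASE
   (assumed here to be 10000). */
#define REG_BR_PROB_BASE 10000

/* Hypothetical cutoff, in percent: an edge counts as predictable when it
   is taken almost never or almost always. */
#define PREDICTABLE_OUTCOME_PERCENT 2

bool edge_is_predictable(int probability, bool have_profile_feedback,
                         bool is_loop_branch)
{
    /* Without profile feedback: loop branches are assumed predictable and
       non-loop branches (the ones if-conversion handles) unpredictable. */
    if (!have_profile_feedback)
        return is_loop_branch;

    /* With profile feedback: predictable if the probability is close to
       0 or close to REG_BR_PROB_BASE. */
    int limit = PREDICTABLE_OUTCOME_PERCENT * REG_BR_PROB_BASE / 100;
    return probability <= limit || REG_BR_PROB_BASE - probability <= limit;
}
```

Under this model, a MonteCarlo-like branch taken about 78.5% of the time (probability 7854) is unpredictable with or without profile feedback.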


2012-09-20 Thread jakub at gcc dot gnu.org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073



Jakub Jelinek jakub at gcc dot gnu.org changed:



            What|Removed                     |Added
Target Milestone|4.7.2                       |4.7.3



--- Comment #7 from Jakub Jelinek jakub at gcc dot gnu.org 2012-09-20 10:18:32 UTC ---

GCC 4.7.2 has been released.


2012-09-07 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

Richard Guenther rguenth at gcc dot gnu.org changed:

            What|Removed                     |Added
        Keywords|                            |missed-optimization
        Priority|P3                          |P2


2012-08-16 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

Richard Guenther rguenth at gcc dot gnu.org changed:

            What|Removed                     |Added
Target Milestone|---                         |4.7.2


2012-07-26 Thread venkataramanan.kumar at amd dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

Venkataramanan venkataramanan.kumar at amd dot com changed:

            What|Removed                     |Added
              CC|                            |venkataramanan.kumar at amd dot com

--- Comment #5 from Venkataramanan venkataramanan.kumar at amd dot com 2012-07-26 15:40:43 UTC ---
Is this the same as http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53397 ?


2012-07-26 Thread markus at trippelsdorf dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

--- Comment #6 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-07-26 16:13:33 UTC ---
(In reply to comment #5)
> is this same as http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53397

No. Monte Carlo is independent of FFT.
I can confirm the huge drop of the FFT score with -march=amdfam10.
(-flto doesn't help in this case.)


2012-07-24 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

Richard Guenther rguenth at gcc dot gnu.org changed:

            What|Removed                     |Added
          Status|UNCONFIRMED                 |NEW
Last reconfirmed|                            |2012-07-24
              CC|                            |rguenth at gcc dot gnu.org
   Known to work|                            |4.6.4
         Summary|SciMark Monte Carlo test    |[4.7/4.8 Regression]
                |performance has seriously   |SciMark Monte Carlo test
                |decreased in recent GCC     |performance has seriously
                |releases                    |decreased in recent GCC
                |                            |releases
  Ever Confirmed|0                           |1
   Known to fail|                            |4.7.0, 4.8.0

--- Comment #2 from Richard Guenther rguenth at gcc dot gnu.org 2012-07-24 09:22:45 UTC ---
Our autotesters show a jump of this magnitude between

K8:   good r171332, bad r171367
K10:  good r171399, bad r171360
IA64: good r182218, bad r182265

This needs further bisection; there are a few candidates within the
171399:171360 range.  IA64 is supposedly something else (the fix for PR21617
pops up here).

Confirmed at least.


2012-07-24 Thread markus at trippelsdorf dot de
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

Markus Trippelsdorf markus at trippelsdorf dot de changed:

            What|Removed                     |Added
              CC|                            |markus at trippelsdorf dot de

--- Comment #3 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-07-24 11:29:15 UTC ---
-flto helps a lot in this case (CPU=K10):

-O3:
 MonteCarlo: Mflops:   319.57
-O3 -flto:
 MonteCarlo: Mflops:   921.67


2012-07-24 Thread rguenth at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

--- Comment #4 from Richard Guenther rguenth at gcc dot gnu.org 2012-07-24 13:21:25 UTC ---
If they are single-file benchmarks, a simple -fwhole-program would do, too.
(I wonder if we can auto-detect -fwhole-program from within the gcc driver:
if one performs non-partial linking on a single source input, that should be
safe. Quite a benchmark thing, though.)