[Bug tree-optimization/54073] [4.7/4.8 Regression] SciMark Monte Carlo test performance has seriously decreased in recent GCC releases
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

--- Comment #13 from Jakub Jelinek jakub at gcc dot gnu.org 2012-11-16 11:40:42 UTC ---
Author: jakub
Date: Fri Nov 16 11:40:39 2012
New Revision: 193554

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=193554
Log:
PR target/54073
* config/i386/i386.md (mov<mode>cc): Use comparison_operator instead of ordered_comparison_operator resp. ix86_fp_comparison_operator predicates.
* config/i386/i386.c (ix86_expand_fp_movcc): Reject TImode or for -m32 DImode comparisons.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/i386/i386.c
    trunk/gcc/config/i386/i386.md
[Bug tree-optimization/54073] [4.7/4.8 Regression] SciMark Monte Carlo test performance has seriously decreased in recent GCC releases
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

--- Comment #8 from Jakub Jelinek jakub at gcc dot gnu.org 2012-11-13 13:04:28 UTC ---
Created attachment 28674
  -- http://gcc.gnu.org/bugzilla/attachment.cgi?id=28674
gcc48-pr54073.patch

On x86_64-linux on a SandyBridge CPU with -O3 -march=corei7-avx I've tracked it down to the http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=171341 change, in particular the emit_conditional_move part of the changes. Before the change, emit_conditional_move would completely ignore the predicate on the comparison operand (operands[1]); starting with r171341 it honors it. As a result, movsicc's ordered_comparison_operator gives up on the UNLT comparison in the MonteCarlo testcase, while ix86_expand_int_movcc expands it just fine, and at least on that loop it is beneficial to use

    vucomisd    %xmm0, %xmm1
    cmovae      %eax, %ebp

instead of:

.L4:
    addl        $1, %ebx
    ...
    vucomisd    %xmm0, %xmm2
    jb          .L4

The attached proof of concept patch attempts to just restore the 4.6 and earlier behavior by allowing all comparisons. Of course it may need better tuning than that; I meant it just as a start for discussions.

vanilla trunk:

**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to p...@nist.gov)     **
**                                                              **
Using 2.00 seconds min time per kenel.
Composite Score:         1886.79
FFT             Mflops:  1726.97    (N=1024)
SOR             Mflops:  1239.20    (100 x 100)
MonteCarlo:     Mflops:   374.13
Sparse matmult  Mflops:  1956.30    (N=1000, nz=5000)
LU              Mflops:  4137.37    (M=100, N=100)

patched trunk:

**                                                              **
** SciMark2 Numeric Benchmark, see http://math.nist.gov/scimark **
** for details. (Results can be submitted to p...@nist.gov)     **
**                                                              **
Using 2.00 seconds min time per kenel.
Composite Score:         1910.08
FFT             Mflops:  1726.97    (N=1024)
SOR             Mflops:  1239.20    (100 x 100)
MonteCarlo:     Mflops:   528.94
Sparse matmult  Mflops:  1949.03    (N=1000, nz=5000)
LU              Mflops:  4106.27    (M=100, N=100)
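
For readers less familiar with the comparison codes mentioned in this comment: UNLT means "unordered or less than", so it is also true when either operand is NaN, and it is the logical inverse of an ordered >=. The following is only a minimal C sketch of that distinction. The function names are made up for illustration and are not GCC identifiers.

```c
#include <math.h>
#include <stdbool.h>

/* Ordered "less than": false whenever either operand is NaN. */
bool ordered_lt(double a, double b)
{
    return isless(a, b);
}

/* UNLT, "unordered or less than": true when a < b or when either
   operand is NaN.  Equivalent to !(a >= b) under IEEE semantics. */
bool unordered_or_lt(double a, double b)
{
    return isunordered(a, b) || isless(a, b);
}
```

The two functions agree on ordinary numbers and differ only when a NaN is involved, which is why a predicate that accepts only ordered comparisons rejects UNLT.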
[Bug tree-optimization/54073] [4.7/4.8 Regression] SciMark Monte Carlo test performance has seriously decreased in recent GCC releases
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

--- Comment #9 from Artem S. Tashkinov t.artem at mailcity dot com 2012-11-13 15:06:25 UTC ---
(In reply to comment #8)
> The attached proof of concept patch attempts to just restore the 4.6 and
> earlier behavior by allowing all comparisons. Of course it may need better
> tuning than that; I meant it just as a start for discussions.

The results look terrific; I hope this patch will be merged ASAP.
[Bug tree-optimization/54073] [4.7/4.8 Regression] SciMark Monte Carlo test performance has seriously decreased in recent GCC releases
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

--- Comment #10 from Uros Bizjak ubizjak at gmail dot com 2012-11-13 15:13:56 UTC ---
(In reply to comment #8)
> The attached proof of concept patch attempts to just restore the 4.6 and
> earlier behavior by allowing all comparisons. Of course it may need better
> tuning than that; I meant it just as a start for discussions.

Please see PR53346, from comment 14 onwards, especially H.J.'s comment:

-quote-
I was told that cmov wins if branch is mispredicted, otherwise cmov loses. We will investigate if we can improve cmov in GCC.
-/quote-
[Bug tree-optimization/54073] [4.7/4.8 Regression] SciMark Monte Carlo test performance has seriously decreased in recent GCC releases
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

--- Comment #11 from Jakub Jelinek jakub at gcc dot gnu.org 2012-11-13 15:24:19 UTC ---
(In reply to comment #10)
> Please see PR53346, from comment 14 onwards, especially H.J.'s comment:
>
> -quote-
> I was told that cmov wins if branch is mispredicted, otherwise cmov loses.
> We will investigate if we can improve cmov in GCC.
> -/quote-

Possibly. But even then, movsicc etc. isn't automatically the right thing when the comparison is ordered and the wrong thing otherwise; it is desirable or undesirable depending on whether the compiler can guess if the condition can be predicted well or not. I guess in MonteCarlo the x*x + y*y <= 1.0 condition can't be predicted well, and that is why it helps so much.
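
To make the x*x + y*y <= 1.0 condition concrete, here is a rough C sketch of a Monte Carlo kernel of this shape, written once with a guarded increment and once with the comparison result added directly. It is not the SciMark source: next_double is a hypothetical stand-in for the benchmark's random number generator, and all function names are illustrative. Whether a compiler emits a jump or a cmov/setcc sequence for either form depends on the GCC version, flags and target.

```c
#include <stdint.h>

/* Hypothetical stand-in for the benchmark's RNG: a 64-bit LCG whose
   top 53 bits are scaled to a double in [0, 1). */
double next_double(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(*state >> 11) * (1.0 / 9007199254740992.0);
}

/* Branch form: the increment is guarded by the comparison.  With
   random x and y the outcome is hard for a branch predictor to guess. */
long count_hits_branch(uint64_t seed, long samples)
{
    long hits = 0;
    for (long i = 0; i < samples; i++) {
        double x = next_double(&seed);
        double y = next_double(&seed);
        if (x * x + y * y <= 1.0)
            hits++;
    }
    return hits;
}

/* Select form: the 0/1 comparison result is added unconditionally,
   which computes the same count without a data-dependent jump in
   the source. */
long count_hits_select(uint64_t seed, long samples)
{
    long hits = 0;
    for (long i = 0; i < samples; i++) {
        double x = next_double(&seed);
        double y = next_double(&seed);
        hits += (x * x + y * y <= 1.0);
    }
    return hits;
}

/* The fraction of points inside the quarter circle approximates pi/4. */
double estimate_pi(long hits, long samples)
{
    return 4.0 * (double)hits / (double)samples;
}
```

Both counting functions return the same value for the same seed; only the shape of the generated code may differ, which is the trade-off under discussion here.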
[Bug tree-optimization/54073] [4.7/4.8 Regression] SciMark Monte Carlo test performance has seriously decreased in recent GCC releases
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

Jan Hubicka hubicka at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hubicka at gcc dot gnu.org

--- Comment #12 from Jan Hubicka hubicka at gcc dot gnu.org 2012-11-13 15:54:22 UTC ---
The decision on whether to use cmov or jmp was always tricky on x86 architectures. Cmov increases dependency chains and register pressure (both values need to be loaded in) and has a long opcode. So a jump sequence, if well predicted, flows better through the out-of-order core. If badly predicted it is, of course, a disaster. I think more modern CPUs solved the problems with the long latency of cmov, but the dependency chains are still there.

This patch fixes a bug in a pattern rather than tweaking the heuristic on predictability. As such I think it is OK for mainline. We should do something about rnflow. I will look into that.

The usual wisdom is that, lacking profile feedback, one should handle non-loop branches as unpredictable and loop branches as predictable, so everything handled by ifconvert falls into the first category. With profile feedback one can see the branch probability, and if it is close to 0 or REG_BR_PROB_BASE, treat the branch as predictable. We handle this with the predictable_edge_p parameter passed to BRANCH_COST (that by itself is gross, but for years we were not able to come up with something saner).

Honza
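
The rule of thumb described above can be sketched in C as follows. This is an illustration only, not GCC's implementation: the function name, its parameters and the threshold_percent cutoff are hypothetical, and the value 10000 for REG_BR_PROB_BASE is stated here as an assumption about GCC's fixed-point probability scale.

```c
#include <stdbool.h>

/* Assumption: probabilities are fixed-point values in
   [0, REG_BR_PROB_BASE], with the base taken to be 10000. */
#define REG_BR_PROB_BASE 10000

/* Hypothetical helper (not a GCC function).  Without profile
   feedback, only loop branches are treated as predictable.  With
   profile feedback, a branch is treated as predictable when its
   probability lies within threshold_percent of 0 or of
   REG_BR_PROB_BASE. */
bool branch_looks_predictable(bool have_profile, bool is_loop_branch,
                              int probability, int threshold_percent)
{
    if (!have_profile)
        return is_loop_branch;

    int margin = REG_BR_PROB_BASE * threshold_percent / 100;
    return probability <= margin
           || probability >= REG_BR_PROB_BASE - margin;
}
```

With a cutoff of 2 percent, for instance, a branch taken 99% of the time counts as predictable, while a 50/50 branch such as the MonteCarlo test does not, so the latter would be a candidate for a conditional move.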
[Bug tree-optimization/54073] [4.7/4.8 Regression] SciMark Monte Carlo test performance has seriously decreased in recent GCC releases
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

Jakub Jelinek jakub at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.7.2                       |4.7.3

--- Comment #7 from Jakub Jelinek jakub at gcc dot gnu.org 2012-09-20 10:18:32 UTC ---
GCC 4.7.2 has been released.
[Bug tree-optimization/54073] [4.7/4.8 Regression] SciMark Monte Carlo test performance has seriously decreased in recent GCC releases
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

Richard Guenther rguenth at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization
           Priority|P3                          |P2
[Bug tree-optimization/54073] [4.7/4.8 Regression] SciMark Monte Carlo test performance has seriously decreased in recent GCC releases
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

Richard Guenther rguenth at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |4.7.2
[Bug tree-optimization/54073] [4.7/4.8 Regression] SciMark Monte Carlo test performance has seriously decreased in recent GCC releases
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

Venkataramanan venkataramanan.kumar at amd dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |venkataramanan.kumar at amd dot com

--- Comment #5 from Venkataramanan venkataramanan.kumar at amd dot com 2012-07-26 15:40:43 UTC ---
Is this the same as http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53397 ?
[Bug tree-optimization/54073] [4.7/4.8 Regression] SciMark Monte Carlo test performance has seriously decreased in recent GCC releases
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

--- Comment #6 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-07-26 16:13:33 UTC ---
(In reply to comment #5)
> is this same as http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53397

No. Monte Carlo is independent of FFT.

I can confirm the huge drop of the FFT score with -march=amdfam10. (-flto doesn't help in this case.)
[Bug tree-optimization/54073] [4.7/4.8 Regression] SciMark Monte Carlo test performance has seriously decreased in recent GCC releases
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

Richard Guenther rguenth at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-07-24
                 CC|                            |rguenth at gcc dot gnu.org
      Known to work|                            |4.6.4
            Summary|SciMark Monte Carlo test    |[4.7/4.8 Regression]
                   |performance has seriously   |SciMark Monte Carlo test
                   |decreased in recent GCC     |performance has seriously
                   |releases                    |decreased in recent GCC
                   |                            |releases
     Ever Confirmed|0                           |1
      Known to fail|                            |4.7.0, 4.8.0

--- Comment #2 from Richard Guenther rguenth at gcc dot gnu.org 2012-07-24 09:22:45 UTC ---
Our autotesters have a jump of this magnitude between

K8: good r171332, bad r171367
K10: good r171399, bad r171360
IA64: good r182218, bad r182265

needs further bisection, there are a few candidates within the 171399:171360 range. IA64 is supposedly sth else (the fix for PR21617 pops up here).

Confirmed at least.
[Bug tree-optimization/54073] [4.7/4.8 Regression] SciMark Monte Carlo test performance has seriously decreased in recent GCC releases
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

Markus Trippelsdorf markus at trippelsdorf dot de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |markus at trippelsdorf dot de

--- Comment #3 from Markus Trippelsdorf markus at trippelsdorf dot de 2012-07-24 11:29:15 UTC ---
-flto helps a lot in this case (CPU=K10):

-O3:        MonteCarlo: Mflops: 319.57
-O3 -flto:  MonteCarlo: Mflops: 921.67
[Bug tree-optimization/54073] [4.7/4.8 Regression] SciMark Monte Carlo test performance has seriously decreased in recent GCC releases
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54073

--- Comment #4 from Richard Guenther rguenth at gcc dot gnu.org 2012-07-24 13:21:25 UTC ---
If they are single-file benchmarks, a simple -fwhole-program would do, too. (I wonder if we can auto-detect -fwhole-program from within the gcc driver; if one performs non-partial linking on a single source input, that should be safe. Quite a benchmark thing, though.)