[Bug tree-optimization/109491] [13 Regression] Segfault in tree-ssa-sccvn.cc:expressions_equal_p()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109491

--- Comment #5 from pthaugen at gcc dot gnu.org ---
(In reply to Peter Bergner from comment #4)
>
> Can you git bisect this to find the offending commit?

Yes, I was going to start that.
[Bug tree-optimization/109491] [13 Regression] Segfault in tree-ssa-sccvn.cc:expressions_equal_p()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109491 --- Comment #1 from pthaugen at gcc dot gnu.org --- Note this only happens on a BE system, compiles fine on LE.
[Bug tree-optimization/109491] New: Segfault in tree-ssa-sccvn.cc:expressions_equal_p()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109491

Bug ID: 109491
Summary: Segfault in tree-ssa-sccvn.cc:expressions_equal_p()
Product: gcc
Version: 13.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: pthaugen at gcc dot gnu.org
CC: bergner at gcc dot gnu.org, segher at kernel dot crashing.org
Target Milestone: ---
Host: powerpc64
Target: powerpc64
Build: powerpc64

Created attachment 54845
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54845=edit
Reduced testcase

Hitting the following segfault on the attached testcase (sorry for size, but it is about 1% of original size). Appears to only happen with GCC 13, compiles fine with GCC 12.

~/install/gcc/trunk/bin/g++ -mcpu=power8 -std=c++14 -S -O2 partial.ii
(...misc warnings...)
during GIMPLE pass: fre
partial.ii: In function ‘void gemm_complex(const DataMapper&, const complex*, const complex*, long int, long int, long int, complex, long int, long int, long int, long int) [with = complex; = complex; = complex; = float; Packet = __vector(4) float; Packetc = Packet2cf; = __vector(4) float; DataMapper = blas_data_mapper; int accRows = 4; int accCols = 4; int ConjugateLhs = 0; int ConjugateRhs = 0; int LhsIsReal = 0; int RhsIsReal = 0]’:
partial.ii:1096:6: internal compiler error: Segmentation fault
 1096 | void gemm_complex(const DataMapper , const complex *blockAc,
      |      ^~~~
0x10f6fadb crash_signal
        /home/pthaugen/src/gcc/trunk/gcc/gcc/toplev.cc:314
0x11222818 expressions_equal_p(tree_node*, tree_node*, bool)
        /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:6411
0x112229a7 vn_reference_op_eq
        /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:216
0x11222bfb vn_reference_eq(vn_reference_s const*, vn_reference_s const*)
        /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:858
0x11243837 vn_reference_hasher::equal(vn_reference_s const*, vn_reference_s const*)
        /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:250
0x11243837 hash_table::find_slot_with_hash(vn_reference_s* const&, unsigned int, insert_option)
        /home/pthaugen/src/gcc/trunk/gcc/gcc/hash-table.h:1059
0x1122f43b vn_reference_lookup_2
        /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:2336
0x11101b8f walk_non_aliased_vuses(ao_ref*, tree_node*, bool, void* (*)(ao_ref*, tree_node*, void*), void* (*)(ao_ref*, tree_node*, void*, translate_flags*), tree_node* (*)(tree_node*), unsigned int&, void*)
        /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-alias.cc:3847
0x11233447 vn_reference_lookup(tree_node*, tree_node*, vn_lookup_kind, vn_reference_s**, bool, tree_node**, tree_node*, bool)
        /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:3967
0x11238cc7 visit_reference_op_load
        /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:5683
0x11238cc7 visit_stmt
        /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:6187
0x1123986f process_bb
        /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:7918
0x1123bcdb do_rpo_vn_1
        /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:8518
0x1123db83 execute
        /home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:8676
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
[Bug testsuite/99685] gcc.target/powerpc/divkc3-1.c and mulkc3-1.c fail for 32 bits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99685

pthaugen at gcc dot gnu.org changed:

           What    |Removed                     |Added
             Status|RESOLVED                    |CLOSED

--- Comment #7 from pthaugen at gcc dot gnu.org ---
Backports complete.
[Bug testsuite/99685] gcc.target/powerpc/divkc3-1.c and mulkc3-1.c fail for 32 bits
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99685

pthaugen at gcc dot gnu.org changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from pthaugen at gcc dot gnu.org ---
Fixed.
[Bug target/105485] New: ICE: Segmentation fault in pcrel-opt.md:get_insn_name()
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105485

Bug ID: 105485
Summary: ICE: Segmentation fault in pcrel-opt.md:get_insn_name()
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pthaugen at gcc dot gnu.org
Target Milestone: ---
Host: powerpc64le-linux-gnu
Target: powerpc64le-linux-gnu
Build: powerpc64le-linux-gnu

pthaugen@pike:~/temp$ cat err.c
template void __builtin_vec_vslv();
typedef __attribute__((altivec(vector__))) char T;
void b() {
  T c, d;
  __builtin_vec_vslv(c, d);
}

pthaugen@pike:~/temp$ ~/install/gcc/trunk/bin/g++ -mcpu=power9 -S -O2 err.c
during GIMPLE pass: lower
err.c: In function ‘void b()’:
err.c:3:6: internal compiler error: Segmentation fault
    3 | void b() {
      |      ^
0x10fca4e3 crash_signal
        /home/pthaugen/src/gcc/trunk/gcc/gcc/toplev.cc:322
0x11cd9a10 get_insn_name(int)
        /home/pthaugen/src/gcc/trunk/gcc/gcc/config/rs6000/pcrel-opt.md:134798
0x11635f87 rs6000_gimple_fold_builtin(gimple_stmt_iterator*)
        /home/pthaugen/src/gcc/trunk/gcc/gcc/config/rs6000/rs6000-builtin.cc:1301
0x10b3c017 gimple_fold_call
        /home/pthaugen/src/gcc/trunk/gcc/gcc/gimple-fold.cc:5559
0x10b3e00b fold_stmt_1
        /home/pthaugen/src/gcc/trunk/gcc/gcc/gimple-fold.cc:6298
0x11e5ce17 lower_stmt
        /home/pthaugen/src/gcc/trunk/gcc/gcc/gimple-low.cc:390
0x11e5ce17 lower_sequence
        /home/pthaugen/src/gcc/trunk/gcc/gcc/gimple-low.cc:217
0x11e5bf03 lower_gimple_bind
        /home/pthaugen/src/gcc/trunk/gcc/gcc/gimple-low.cc:475
0x11e5dc33 lower_function_body
        /home/pthaugen/src/gcc/trunk/gcc/gcc/gimple-low.cc:110
0x11e5dc33 execute
        /home/pthaugen/src/gcc/trunk/gcc/gcc/gimple-low.cc:195
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

pthaugen@pike:~/temp$ ~/install/gcc/trunk/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/home/pthaugen/install/gcc/trunk/bin/g++
COLLECT_LTO_WRAPPER=/home/pthaugen/install/gcc/trunk/libexec/gcc/powerpc64le-unknown-linux-gnu/13.0.0/lto-wrapper
Target: powerpc64le-unknown-linux-gnu
Configured with: /home/pthaugen/src/gcc/trunk/gcc/configure --prefix=/home/pthaugen/install/gcc/trunk --enable-decimal-float --enable-lto --with-as=/usr/bin/as --with-ld=/usr/bin/ld --enable-languages=c,fortran,c++ --disable-multilib --disable-libsanitizer --with-cpu=power8 --disable-bootstrap
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.0.0 20220504 (experimental) [master r13-118-g4a206161072] (GCC)
[Bug testsuite/100407] New test cases attr-retain-*.c fail after their introduction in r11-7284
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100407

pthaugen at gcc dot gnu.org changed:

           What    |Removed                     |Added
                 CC|                            |pthaugen at gcc dot gnu.org
             Status|WAITING                     |RESOLVED
         Resolution|---                         |FIXED

--- Comment #15 from pthaugen at gcc dot gnu.org ---
Fixed.
[Bug rtl-optimization/68212] Loop unroller breaks basic block frequencies
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68212

--- Comment #9 from pthaugen at gcc dot gnu.org ---
The problem can be seen in the loop2_unroll dump:

pthaugen@pike:~/temp/pr68212$ grep "Invalid sum of" simple.c.272r.loop2_unroll
;; Invalid sum of incoming counts 285685646 (estimated locally), should be 212627725 (estimated locally)
;; Invalid sum of incoming counts 32061393 (estimated locally), should be 105119324 (estimated locally)
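
The two dump lines come from a consistency check on the profile: the counts on a basic block's incoming edges are expected to add up to the block's own count. A minimal C++ sketch of that invariant follows. The `Block` type and both function names are hypothetical and only illustrate the idea; GCC's own check works on its CFG and profile types and is not reproduced here. The dump reports only the sums, so any split into individual edges is an assumption.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical, simplified model of a basic block's profile data.
// These are not GCC's data structures.
struct Block {
    std::int64_t count;                  // estimated execution count of the block
    std::vector<std::int64_t> incoming;  // estimated counts on its incoming edges
};

// Sum of the counts carried by all incoming edges.
std::int64_t incoming_sum(const Block& b) {
    return std::accumulate(b.incoming.begin(), b.incoming.end(),
                           std::int64_t{0});
}

// The invariant the dump complains about: incoming edge counts should add
// up to the block's own count.
bool profile_consistent(const Block& b) {
    return incoming_sum(b) == b.count;
}
```

Taking the first dump line as an example, a block with count 212627725 whose incoming edges sum to 285685646 fails this check.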
[Bug rtl-optimization/68212] Loop unroller breaks basic block frequencies
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68212

pthaugen at gcc dot gnu.org changed:

           What    |Removed                     |Added
                 CC|                            |guojiufu at gcc dot gnu.org,
                   |                            |pthaugen at gcc dot gnu.org

--- Comment #8 from pthaugen at gcc dot gnu.org ---
(In reply to Peter Bergner from comment #7)
> (In reply to Pat Haugen from comment #4)
> > Author: pthaugen
> > Date: Fri Oct 14 17:10:18 2016
> > New Revision: 241170
> >
> > URL: https://gcc.gnu.org/viewcvs?rev=241170=gcc=rev
> > Log:
> > PR rtl-optimization/68212
> > * cfgloopmanip.c (duplicate_loop_to_header_edge): Use preheader edge
> > frequency when computing scale factor for peeled copies.
> > * loop-unroll.c (unroll_loop_runtime_iterations): Fix freq/count
> > values for switch/peel blocks/edges.
>
> Repeating Martin's question. Pat, is this PR fixed with your patch or is
> there more to do?

No, there are still problems. The patch noted fixed the count/probability for the peeled switch/case blocks created before entering the unrolled loop. But the counts for the loop header/exit blocks are still incorrect. The last activity I know of concerning that problem was the patch by Jiufu Guo here: https://gcc.gnu.org/pipermail/gcc-patches/2020-February/539594.html. Not sure if he has any more input here.
[Bug target/65010] ppc backend generates unnecessary signed extension
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65010

pthaugen at gcc dot gnu.org changed:

           What    |Removed                     |Added
                 CC|                            |pthaugen at gcc dot gnu.org

--- Comment #11 from pthaugen at gcc dot gnu.org ---
Another example to clean up. The back to back constant load/sign extend sequence of rtl insns is created in each block by the block reordering pass (.bbo) duplicating the common return block.

int foo(int in)
{
  if (in == 1)
    return 123;
  return 0;
}

foo:
.LFB0:
        .cfi_startproc
        cmpwi 0,3,1
        beq 0,.L5
        li 3,0
        extsw 3,3
        blr
        .p2align 4,,15
.L5:
        li 3,123
        extsw 3,3
        blr
[Bug rtl-optimization/98782] [11/12 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

--- Comment #34 from pthaugen at gcc dot gnu.org ---
(In reply to pthaugen from comment #33)
>
> I tried the patch on a Power9 system. Execution time went from 371 seconds
> to 291.

I should have added that this is in line with, or even slightly better than, the 2 patches posted by Tamar.
[Bug rtl-optimization/98782] [11/12 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

--- Comment #33 from pthaugen at gcc dot gnu.org ---
(In reply to rsand...@gcc.gnu.org from comment #32)
> Created attachment 52102 [details]
> Alternative patch
>
> This patch is a squash of several ira tweaks that together recover the
> pre-GCC11 exchange2 performance on aarch64. It isn't ready for trunk
> yet (hence lack of comments and changelog). It would be great to hear
> whether/how it works on other targets though.

I tried the patch on a Power9 system. Execution time went from 371 seconds to 291.
[Bug ipa/103734] IPA-CP opportunity for imagick in SPECCPU 2017
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103734

pthaugen at gcc dot gnu.org changed:

           What    |Removed                     |Added
                 CC|                            |pthaugen at gcc dot gnu.org

--- Comment #3 from pthaugen at gcc dot gnu.org ---
(In reply to Hongtao.liu from comment #2)
> (In reply to Tamar Christina from comment #0)
> > When using --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=20 on
> > imagick the hot functions MorphologyApply and GetVirtualPixelsFromNexus get
> > replaced by specialized versions that are much smaller and faster.
> >
> > Some other benchmarks like leela also get very small uplifts but the imagick
> > one is worth 14%. Both flags seem to be needed.
>
> Observe similar thing on ICX with -param=inline-min-speedup=3

I tested on a Power9 system and see the following improvements:

--param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=20 : +5%
-param=inline-min-speedup=3 : +30%
[Bug target/103743] PPC: Inefficient equality compare for large 64-bit constants having only 16-bit relevant bits in high part
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103743

--- Comment #2 from pthaugen at gcc dot gnu.org ---
(In reply to Peter Bergner from comment #1)
> Pat, does the patch from Alan you're working to get committed help with this
> test case?

No, it just loads the constant slightly differently:

        li 9,1
        rotldi 9,9,63
        cmpd 0,3,9
[Bug rtl-optimization/98782] [11/12 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782 --- Comment #21 from pthaugen at gcc dot gnu.org --- (In reply to Jan Hubicka from comment #20) > With g:r12-5872-gf157c5362b4844f7676cae2aba81a4cf75bd68d5 we should no > longer need -fno-inline-functions-called-once Yes, I see that now with an updated trunk.
[Bug rtl-optimization/98782] [11/12 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

--- Comment #19 from pthaugen at gcc dot gnu.org ---
I tried -fno-inline-functions-called-once and the patches on a Power9 system. Following are the run times and spill counts (grep -c Spilling exchange2.fppized.f90.298r.ira). Interesting that the spill counts increase when -fno-inline-functions-called-once is added, but obviously that additional spill is on cold paths since execution time improves. Compiler options used are "-O3 -mcpu=power9 -fpeel-loops -funroll-loops -ffast-math".

                                                   time(sec)   Spill
base                                                  473       3284
no-inline-functions-called-once                       370       5410
patches 1 & 2                                         397        461
patches 1 & 2 + no-inline-functions-called-once       299        870
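
To put the run times in relative terms, the reduction can be computed as a percentage of the base time. The helper below is a hypothetical illustration in C++, not part of any benchmark tooling; it only restates the arithmetic.

```cpp
// Percentage reduction in run time going from `before` to `after` seconds.
// Hypothetical helper for illustration only.
double percent_reduction(double before, double after) {
    return (before - after) / before * 100.0;
}
```

For the base run (473 seconds) against -fno-inline-functions-called-once (370 seconds), `percent_reduction(473, 370)` comes out at roughly 21.8.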
[Bug target/102783] [powerpc] FPSCR manipulations cannot be relied upon
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783

pthaugen at gcc dot gnu.org changed:

           What    |Removed                     |Added
                 CC|                            |pthaugen at gcc dot gnu.org

--- Comment #2 from pthaugen at gcc dot gnu.org ---
I’ll note that an inline asm stmt appears to be a barrier for the scheduler, but apparently not for other parts of the compiler. For example on the following code:

double d;
void foo(double *dp, double c)
{
  double e;
  e = c + d;
  asm volatile ("");
  *dp = e + d;
  return;
}

The scheduling dumps show that the asm volatile has dependencies on all insns before and after it. But that doesn’t really help because the first addition stmt gets moved past the asm volatile at expand time.
[Bug ipa/96825] [11 Regression] Commit r11-2645 degrades CPU2017 548.exchange2_r by 35%
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96825 --- Comment #6 from pthaugen at gcc dot gnu.org --- (In reply to Richard Biener from comment #4) > I believe there have been improvements recently - can you re-assess the > magnitude of the problem? The corresponding ARM PR got re-targeted to GCC > 12 (for a RA fix), I think Martin has improved the IPA CP parts, maybe not > fully though. There has been no improvement seen on Power since the degradation appeared.
[Bug target/99133] Power10 xxspltiw, xxspltidp, xxsplti32dx instructions need to be marked as prefixed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99133

pthaugen at gcc dot gnu.org changed:

           What    |Removed                     |Added
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #4 from pthaugen at gcc dot gnu.org ---
Fixed.
[Bug target/99133] Power10 xxspltiw, xxspltidp, xxsplti32dx instructions need to be marked as prefixed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99133 pthaugen at gcc dot gnu.org changed: What|Removed |Added CC||pthaugen at gcc dot gnu.org --- Comment #2 from pthaugen at gcc dot gnu.org --- I submitted a prefix cleanup patch back in Dec. that also took care of this https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561591.html. It's still waiting review.
[Bug other/96825] New: Commit r11-2645 degrades CPU2017 548.exchange2_r by 35%
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96825

Bug ID: 96825
Summary: Commit r11-2645 degrades CPU2017 548.exchange2_r by 35%
Product: gcc
Version: 11.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: other
Assignee: unassigned at gcc dot gnu.org
Reporter: pthaugen at gcc dot gnu.org
CC: bergner at gcc dot gnu.org, hubicka at gcc dot gnu.org, segher at gcc dot gnu.org, seurer at gcc dot gnu.org, wschmidt at gcc dot gnu.org
Target Milestone: ---
Host: powerpc64-unknown-linux-gnu
Target: powerpc64-unknown-linux-gnu
Build: powerpc64-unknown-linux-gnu

The given commit (1118a3ff9d3ad6a64bba25dc01e7703325e23d92) causes a 35% degradation for exchange2_r on Power9 built with the options "-O2 -mcpu=power9". Switching to -O3 results in a 44% degradation. The degradation occurs in __brute_force_MOD_digits_2().
[Bug tree-optimization/50439] gfortran infinite loop with -floop-interchange
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50439 pthaugen at gcc dot gnu.org changed: What|Removed |Added CC||pthaugen at gcc dot gnu.org --- Comment #12 from pthaugen at gcc dot gnu.org --- I can no longer produce the condition either, with the reduced testcase or 416.gamess. So if you think the correct thing to do is close this bug I'm fine with that.
[Bug lto/92600] New: ICE: lto1: internal compiler error: symtab_node::verify failed, building 523.xalancbmk_r with -flto -fno-inline
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92600 Bug ID: 92600 Summary: ICE: lto1: internal compiler error: symtab_node::verify failed, building 523.xalancbmk_r with -flto -fno-inline Product: gcc Version: 10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: lto Assignee: unassigned at gcc dot gnu.org Reporter: pthaugen at gcc dot gnu.org CC: dje at gcc dot gnu.org, marxin at gcc dot gnu.org, segher at kernel dot crashing.org, wschmidt at gcc dot gnu.org Target Milestone: --- Host: powerpc64le-unknown-linux-gnu Target: powerpc64le-unknown-linux-gnu Build: powerpc64le-unknown-linux-gnu I'm seeing the following ICE when building CPU2017 523.xalancbmk_r with the options "-O2 -mcpu=power8 -flto -fno-inline". The errors are emitted during the link step. AttributeListImpl.cpp:246:8: warning: type 'struct NameCompareFunctor' violates the C++ One Definition Rule [-Wodr] 246 | struct NameCompareFunctor |^ AttributesImpl.cpp:266:8: note: a different type is defined in another translation unit 266 | struct NameCompareFunctor |^ AttributeListImpl.cpp:261:21: note: the first difference of corresponding definitions is field 'm_name' 261 | const XMLCh* const m_name; | ^ AttributesImpl.cpp:281:21: note: a field with different name is defined in another translation unit 281 | const XMLCh* const m_qname; | ^ AttributeListImpl.cpp:246:8: note: type 'struct NameCompareFunctor' itself violates the C++ One Definition Rule 246 | struct NameCompareFunctor |^ AttributesImpl.cpp:266:8: note: the incompatible type is defined here 266 | struct NameCompareFunctor |^ lto1: error: Two symbols with same comdat_group are not linked by the same_comdat_group list. 
_ZN11xercesc_2_715XercesDOMParser13resetEntitiesEv/871705 (resetEntities) @0x71d34b022ec0 Type: function definition analyzed Visibility: externally_visible undef public weak comdat comdat_group:_ZN11xercesc_2_715XercesDOMParser13resetEntitiesEv one_only section:.text._ZN11xercesc_2_715XercesDOMParser13resetEntitiesEv (implicit_section) virtual Address is taken. References: Referring: *.LTHUNK8/459137 (alias)_ZTVN11xercesc_2_715XercesDOMParserE/1291254 (addr)_ZTVN11xercesc_2_712XSDDOMParserE/872502 (addr) Read from file: XSDDOMParser.o Function flags: count:1073741824 (estimated locally) merged_comdat Called by: Calls: _ZThn16_N11xercesc_2_715XercesDOMParser13resetEntitiesEv/459138 (_ZThn16_N11xercesc_2_715XercesDOMParser13resetEntitiesEv) @0x71d349e9ef40 Type: function definition analyzed Visibility: externally_visible prevailing_def_ironly public weak comdat comdat_group:_ZN11xercesc_2_715XercesDOMParser13resetEntitiesEv one_only section:.text._ZN11xercesc_2_715XercesDOMParser13resetEntitiesEv (implicit_section) virtual artificial Same comdat group as: *.LTHUNK8/459137 Address is taken. 
References: Referring: _ZTVN11xercesc_2_712XSDDOMParserE/872502 (addr)_ZTVN11xercesc_2_715XercesDOMParserE/1291254 (addr) Read from file: IGXMLScanner2.o Function flags: calls_comdat_local merged_comdat indirect_call_target Thunk fixed offset -16 virtual value 0 indirect_offset 0 has virtual offset 0 Called by: Calls: *.LTHUNK8/459137 (can throw external) during IPA pass: pure-const lto1: internal compiler error: symtab_node::verify failed 0x102ab7df symtab_node::verify_symtab_nodes() /home/pthaugen/src/gcc/trunk/gcc/gcc/symtab.c:1310 0x10654a23 symtab_node::checking_verify_symtab_nodes() /home/pthaugen/src/gcc/trunk/gcc/gcc/cgraph.h:648 0x10654a23 symbol_table::remove_unreachable_nodes(_IO_FILE*) /home/pthaugen/src/gcc/trunk/gcc/gcc/ipa.c:667 0x101f26f7 read_cgraph_and_symbols(unsigned int, char const**) /home/pthaugen/src/gcc/trunk/gcc/gcc/lto/lto-common.c:2910 0x101c3bcb lto_main() /home/pthaugen/src/gcc/trunk/gcc/gcc/lto/lto.c:629 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report. See <https://gcc.gnu.org/bugs/> for instructions. lto-wrapper: fatal error: /home/pthaugen/install/gcc/trunk/bin/g++ returned 1 exit status compilation terminated. /usr/bin/ld: error: lto-wrapper failed collect2: error: ld returned 1 exit status I thought this might be a possible dup of PR91241 or PR89605 but was unable to reproduce with the GCC 8 or 9 compilers. The build succeeds with trunk if I change the optimization level to -O1 or remove -fno-inline.
[Bug rtl-optimization/90813] [10 regression] gfortran.dg/proc_ptr_51.f90 fails (SIGSEGV) after 272084
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90813 pthaugen at gcc dot gnu.org changed: What|Removed |Added CC||pthaugen at gcc dot gnu.org --- Comment #20 from pthaugen at gcc dot gnu.org --- (In reply to Segher Boessenkool from comment #17) > sched2 swaps the two insns (37 and 40 for me -- use -dp to see the numbers > in your .s file, use -da if you want lots of dumps, -dap together). > > So why did sched2 decide it can swap these? They are in the same aliasing > set, so it shouldn't do this. Hrm. I'm looking into this...
[Bug target/84369] test case gcc.dg/sms-10.c fails on power9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84369

--- Comment #7 from pthaugen at gcc dot gnu.org ---
Author: pthaugen
Date: Fri Apr 19 17:14:57 2019
New Revision: 270461

URL: https://gcc.gnu.org/viewcvs?rev=270461=gcc=rev
Log:
Backport from mainline:
2019-04-16 Pat Haugen

        PR target/84369
        * config/rs6000/power9.md: Add store forwarding bypass.

Modified:
    branches/gcc-8-branch/gcc/ChangeLog
    branches/gcc-8-branch/gcc/config/rs6000/power9.md
[Bug target/84369] test case gcc.dg/sms-10.c fails on power9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84369

--- Comment #5 from pthaugen at gcc dot gnu.org ---
Author: pthaugen
Date: Tue Apr 16 15:58:02 2019
New Revision: 270394

URL: https://gcc.gnu.org/viewcvs?rev=270394=gcc=rev
Log:
        PR target/84369
        * config/rs6000/power9.md: Add store forwarding bypass.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/rs6000/power9.md
[Bug rtl-optimization/89154] 5% degradation of CPU2006 473.astar starting with r266305
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89154 --- Comment #3 from Pat Haugen --- (In reply to Segher Boessenkool from comment #1) > The new version needs to save r4 because it reuses the reg for storing r7+r8. > And we still don't wrap CR separately, sigh. Yes, and similar for r3 since it's reused in the block. Another thing that could be moved is the r1 adjustment, is that also a component that isn't handled separately?
[Bug tree-optimization/89154] New: 5% degradation of CPU2006 473.astar starting with r266305
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89154

Bug ID: 89154
Summary: 5% degradation of CPU2006 473.astar starting with r266305
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: pthaugen at gcc dot gnu.org
CC: dje at gcc dot gnu.org, rguenth at gcc dot gnu.org, segher at gcc dot gnu.org, wschmidt at gcc dot gnu.org
Target Milestone: ---
Host: powerpc64le-unknown-linux-gnu
Target: powerpc64le-unknown-linux-gnu
Build: powerpc64le-unknown-linux-gnu

Not sure if this is really a tree-optimization issue, just picked as initial component since the fix dealt with that. Could possibly be an rtl-optimization/shrink-wrap issue brought about by additional register pressure due to CSE'ing/hoisting some additional code.

Function way2obj::releasepoint() degrades 20% starting with r266305. Looking at perf output, the main difference seems to be that we're no longer shrink-wrapping the early exit test at the start of the function. Following is the annotated assembly of the start of the function.
r266304:
10006a40 <_ZN7way2obj12releasepointEii>: /* way2obj::releasepoint(int, int) total: 2032811 22.9279 */
                  :10006a40: lis r2,4098
                  :10006a44: addi r2,r2,32512
 95384  1.0758    :10006a48: lwz r9,4424(r3)
                  :10006a4c: ld r8,8(r3)
119001  1.3422    :10006a50: lhz r7,16(r3)
     1  1.1e-05   :10006a54: mullw r9,r9,r5
                  :10006a58: add r9,r9,r4
                  :10006a5c: extsw r9,r9
169526  1.9121    :10006a60: rldicr r9,r9,2,61
                  :10006a64: lhzx r10,r8,r9
 21865  0.2466    :10006a68: cmpw r10,r7
                  :10006a6c: beqlr

r266305:
10006a40 <_ZN7way2obj12releasepointEii>: /* way2obj::releasepoint(int, int) total: 2440798 26.2354 */
                  :10006a40: lis r2,4098
                  :10006a44: addi r2,r2,32512
 35498  0.3816    :10006a48: lwa r6,4424(r3)
                  :10006a4c: ld r7,8(r3)
 26361  0.2833    :10006a50: std r30,-16(r1)
                  :10006a54: mr r30,r3
157660  1.6946    :10006a58: mfcr r12
162000  1.7413    :10006a5c: lhz r3,16(r3)
    17  1.8e-04   :10006a60: std r23,-72(r1)
   139  0.0015    :10006a64: mr r23,r4
     2  2.1e-05   :10006a68: mullw r9,r6,r5
    59  6.3e-04   :10006a6c: stw r12,8(r1)
244832  2.6316    :10006a70: stdu r1,-112(r1)
     4  4.3e-05   :10006a74: add r9,r9,r4
     5  5.4e-05   :10006a78: extsw r9,r9
   201  0.0022    :10006a7c: rldicr r8,r9,2,61
   343  0.0037    :10006a80: add r4,r7,r8
     9  9.7e-05   :10006a84: lhzx r10,r7,r8
151595  1.6294    :10006a88: cmpw r10,r3
                  :10006a8c: beq 10006c64 <_ZN7way2obj12releasepointEii+0x224>

The target of the conditional branch in the slow version is just the epilogue code to restore R1, R23, R30 and CR3/CR4 and return.
[Bug ipa/85103] [8/9 Regression] Performance regressions on SPEC with r257582
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103

--- Comment #21 from Pat Haugen ---
> Knowing what inline decision matters for VPR, I can try to fix it too.

Gathering some perf data, the hot functions for various revisions are as follows. All other functions report < 0.5% of execution time.

r257581
---
samples  %        image name            symbol name
577871   57.8700  vpr_base.temp_32      try_route
402207   40.2784  vpr_base.temp_32      get_heap_head

r257582
---
samples  %        image name            symbol name
428249   40.9911  vpr_base.pat_test_32  try_route
402768   38.5521  vpr_base.pat_test_32  get_heap_head
189358   18.1249  vpr_base.pat_test_32  node_to_heap.part.0

r267727 (after patches that fixed bzip2 went in)
---
samples  %        image name            symbol name
493998   45.9797  vpr_base.pat_base_32  try_route
416389   38.7561  vpr_base.pat_base_32  get_heap_head
140727   13.0984  vpr_base.pat_base_32  add_to_heap

So from the above we can see that r257582 stopped inlining node_to_heap() into try_route(). In r267727, node_to_heap() is again being inlined into try_route(), but add_to_heap() is no longer inlined into node_to_heap(), which is the only caller of add_to_heap(). So it appears that what is needed is for the whole chain node_to_heap()->add_to_heap() to be inlined into try_route() again.
[Bug ipa/85103] [8/9 Regression] Performance regressions on SPEC with r257582
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103 --- Comment #19 from Pat Haugen --- (In reply to Jan Hubicka from comment #18) > which makes it to be inlined. Does it solve the perofmrance problem for both > benchmarks? Looking at our nightly spec runs, the bzip2 degradation has indeed been cleaned up. But it looks like 175.vpr degraded another 2% or so over the last couple days.
[Bug ipa/85103] [8/9 Regression] Performance regressions on SPEC with r257582
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103

--- Comment #16 from Pat Haugen ---
>
> Do you observe the same slowdown if you restore either of the params to
> the value before the r257582 change?
>

--param max-inline-insns-auto=40 results in the same degradation.
--param inline-min-speedup=8 results in even more degradation (an additional 12% over r257582).
[Bug gcov-profile/77698] Unrolled loop not considered hot after profiling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77698

--- Comment #7 from Pat Haugen ---
I also see the loop now being aligned when I apply your patch.

        srdi 10,10,2
        mtctr 10
        .p2align 4,,15
.L6:
        ld 9,0(11)
        ld 8,0(4)
[Bug gcov-profile/77698] Unrolled loop not considered hot after profiling
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77698

Pat Haugen changed:

           What    |Removed                     |Added
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #5 from Pat Haugen ---
It's still not fixed in current trunk. After unrolling, maybe_hot_bb_p() returns false (via maybe_hot_count_p()), which prevents aligning the label in final.c:compute_alignments(). Here is the tail of the debug session and a partial backtrace to show this.

maybe_hot_count_p (fun=0x759f, count=...) at /home/pthaugen/src/gcc/trunk_work/gcc/gcc/predict.c:185
185 return (count.to_gcov_type () >= get_hot_bb_threshold ());
(gdb) p count.to_gcov_type ()
$3 = 25
(gdb) p get_hot_bb_threshold ()
$4 = 100
(gdb) bt
#0 maybe_hot_count_p (fun=0x759f, count=...) at /home/pthaugen/src/gcc/trunk_work/gcc/gcc/predict.c:185
#1 0x10d8fdb0 in maybe_hot_bb_p (fun=0x759f, bb=0x759801a0) at /home/pthaugen/src/gcc/trunk_work/gcc/gcc/predict.c:195
#2 0x10d9045c in optimize_bb_for_size_p (bb=0x759801a0) at /home/pthaugen/src/gcc/trunk_work/gcc/gcc/predict.c:301
#3 0x108c7234 in compute_alignments () at /home/pthaugen/src/gcc/trunk_work/gcc/gcc/final.c:674
#4 0x108c7d3c in (anonymous namespace)::pass_compute_alignments::execute (this=0x12886200) at /home/pthaugen/src/gcc/trunk_work/gcc/gcc/final.c:823
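
The comparison at predict.c:185 in the session above can be restated as a small standalone C++ sketch. This is a simplified stand-in: the real function takes GCC's function and profile count types, and using plain integers here is an assumption made only to keep the example self-contained.

```cpp
#include <cstdint>

// Simplified stand-in for the comparison shown at predict.c:185.
// Plain integers replace GCC's own types; that substitution is an
// assumption for illustration.
bool maybe_hot_count_sketch(std::int64_t count, std::int64_t hot_bb_threshold) {
    return count >= hot_bb_threshold;
}
```

With the values printed by gdb (count 25, threshold 100), the check is false, so the block is not treated as hot and the label is not aligned.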
[Bug rtl-optimization/86892] New: RTL CSE commoning trivial constants across call and/or too early
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86892

Bug ID: 86892
Summary: RTL CSE commoning trivial constants across call and/or too early
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: pthaugen at gcc dot gnu.org
CC: dje at gcc dot gnu.org, jakub at gcc dot gnu.org, rsandifo at gcc dot gnu.org, segher at gcc dot gnu.org, wschmidt at gcc dot gnu.org
Target Milestone: ---
Host: powerpc64le-unknown-linux-gnu
Target: powerpc64le-unknown-linux-gnu
Build: powerpc64le-unknown-linux-gnu

For the following testcase cse.c will common the constant 0 across the call which then requires the use of a non-volatile register (and prologue/epilogue save/restore).

void bar();
int a, b;
void foo()
{
  a = 0;
  bar();
  b = 0;
}

I have also observed a situation where early cse of a constant prevented some combine transformations from occurring because the register's lifetime had been extended. The feeling is that cse of trivial constants should not be done so early in the pass schedule and should not be done across calls at all.
[Bug target/86612] __strdup problem on power 9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86612

Pat Haugen changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
                 CC|                            |pthaugen at gcc dot gnu.org
         Resolution|---                         |FIXED

--- Comment #3 from Pat Haugen ---
Was really a library difference, with newer glibc no longer declaring __strdup. Fixed.
[Bug target/86612] __strdup problem on power 9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86612

--- Comment #2 from Pat Haugen ---
Author: pthaugen
Date: Thu Jul 26 20:47:37 2018
New Revision: 263021

URL: https://gcc.gnu.org/viewcvs?rev=263021=gcc=rev
Log:
        PR target/86612
        * gcc.target/powerpc/pr58673-2.c: Call strdup.

Modified:
    branches/gcc-8-branch/gcc/testsuite/ChangeLog
    branches/gcc-8-branch/gcc/testsuite/gcc.target/powerpc/pr58673-2.c
[Bug target/86612] __strdup problem on power 9
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86612 --- Comment #1 from Pat Haugen --- Author: pthaugen Date: Thu Jul 26 20:41:25 2018 New Revision: 263020 URL: https://gcc.gnu.org/viewcvs?rev=263020=gcc=rev Log: PR target/86612 * gcc.target/powerpc/pr58673-2.c: Call strdup. Modified: trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.target/powerpc/pr58673-2.c
[Bug tree-optimization/86489] ICE in gimple_phi_arg starting with r261682 when building 531.deepsjeng_r with FDO + LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86489 Pat Haugen changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |FIXED --- Comment #8 from Pat Haugen --- Fixed, thanks.
[Bug tree-optimization/86489] ICE in gimple_phi_arg starting with r261682 when building 531.deepsjeng_r with FDO + LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86489 --- Comment #5 from Pat Haugen --- (In reply to kugan from comment #3) > index f6fa2f7..fbdf838 100644 > --- a/gcc/tree-ssa-loop-niter.c > +++ b/gcc/tree-ssa-loop-niter.c > @@ -2555,6 +2555,7 @@ number_of_iterations_popcount (loop_p loop, edge exit, > ... = PHI . */ >gimple *phi = SSA_NAME_DEF_STMT (b_11); >if (gimple_code (phi) != GIMPLE_PHI > + || (gimple_bb (phi) != loop_latch_edge (loop)->dest) >|| (gimple_assign_lhs (and_stmt) > != gimple_phi_arg_def (phi, loop_latch_edge (loop)->dest_idx))) > return false; This fixes the problem for me.
[Bug tree-optimization/86489] New: ICE in gimple_phi_arg starting with r261682 when building 531.deepsjeng_r with FDO + LTO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86489 Bug ID: 86489 Summary: ICE in gimple_phi_arg starting with r261682 when building 531.deepsjeng_r with FDO + LTO Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: pthaugen at gcc dot gnu.org CC: dje at gcc dot gnu.org, kugan at gcc dot gnu.org, segher at gcc dot gnu.org, wschmidt at gcc dot gnu.org Target Milestone: --- Host: powerpc64le-unknown-linux-gnu Target: powerpc64le-unknown-linux-gnu Build: powerpc64le-unknown-linux-gnu The patch for pr82479 causes an ICE while building CPU2017 531.deepsjeng_r with FDO and LTO. The ICE occurs during the link step of the -fprofile-use build. /home/pthaugen/install/gcc/gcc_hunt/bin/g++ -m64 -O3 -mcpu=power9 -fpeel-loops -funroll-loops -ffast-math -mpopcntd -mrecip -flto -DSPEC_LP64 -m64 -Wl,-q -Wl,-rpath=/home/pthaugen/install/gcc/gcc_hunt/lib64 attacks.o bitboard.o bits.o board.o draw.o endgame.o epd.o generate.o initp.o make.o moves.o neval.o pawn.o preproc.o search.o see.o sjeng.o state.o ttable.o utils.o -o deepsjeng_r during GIMPLE pass: cunroll generate.cpp: In function 'gen.constprop': generate.cpp:159:5: internal compiler error: in gimple_phi_arg, at gimple.h:4345 int gen(state_t *s, move_s *moves) { ^ 0x1013c597 gimple_phi_arg /home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/gimple.h:4345 0x1013c5f3 gimple_phi_arg /home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/gimple.h:4345 0x1013c5f3 gimple_phi_arg /home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/gimple.h:4353 0x10a37607 gimple_phi_arg_def /home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/gimple.h:4396 0x10a37607 number_of_iterations_popcount /home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/tree-ssa-loop-niter.c:2559 0x10a37607 number_of_iterations_exit_assumptions(loop*, edge_def*, tree_niter_desc*, gcond**, bool) /home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/tree-ssa-loop-niter.c:2364 0x10a392eb number_of_iterations_exit_assumptions(loop*, edge_def*, 
tree_niter_desc*, gcond**, bool) /home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/tree-ssa-loop-niter.c:2611 0x10a392eb number_of_iterations_exit(loop*, edge_def*, tree_niter_desc*, bool, bool) /home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/tree-ssa-loop-niter.c:2616 0x10a3985f number_of_iterations_exit(loop*, edge_def*, tree_niter_desc*, bool, bool) /home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/vec.h:884 0x10a3985f estimate_numbers_of_iterations(loop*) /home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/tree-ssa-loop-niter.c:4100 0x10a3ce73 estimate_numbers_of_iterations(function*) /home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/tree-ssa-loop-niter.c:4329 0x10a07ec7 tree_unroll_loops_completely /home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/tree-ssa-loop-ivcanon.c:1452 0x10a08603 execute /home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/tree-ssa-loop-ivcanon.c:1612 Please submit a full bug report, with preprocessed source if appropriate. Please include the complete backtrace with any bug report.
[Bug tree-optimization/85698] [8/9 Regression] CPU2017 525.x264_r fails starting with r257581
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698 --- Comment #13 from Pat Haugen --- Author: pthaugen Date: Mon May 21 16:41:09 2018 New Revision: 260477 URL: https://gcc.gnu.org/viewcvs?rev=260477=gcc=rev Log: PR target/85698 * gcc.target/powerpc/vec-setup-be-long.c: Remove XFAIL. Modified: branches/gcc-8-branch/gcc/testsuite/ChangeLog branches/gcc-8-branch/gcc/testsuite/gcc.target/powerpc/vec-setup-be-long.c
[Bug tree-optimization/85698] [8/9 Regression] CPU2017 525.x264_r fails starting with r257581
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698 --- Comment #12 from Pat Haugen --- Author: pthaugen Date: Mon May 21 16:34:44 2018 New Revision: 260476 URL: https://gcc.gnu.org/viewcvs?rev=260476=gcc=rev Log: PR target/85698 * config/rs6000/rs6000.c (rs6000_output_move_128bit): Check dest operand. * gcc.target/powerpc/pr85698.c: New test. Added: branches/gcc-7-branch/gcc/testsuite/gcc.target/powerpc/pr85698.c Modified: branches/gcc-7-branch/gcc/ChangeLog branches/gcc-7-branch/gcc/config/rs6000/rs6000.c branches/gcc-7-branch/gcc/testsuite/ChangeLog
[Bug tree-optimization/85698] [8/9 Regression] CPU2017 525.x264_r fails starting with r257581
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698 --- Comment #11 from Pat Haugen --- Author: pthaugen Date: Mon May 21 16:23:20 2018 New Revision: 260475 URL: https://gcc.gnu.org/viewcvs?rev=260475=gcc=rev Log: PR target/85698 * config/rs6000/rs6000.c (rs6000_output_move_128bit): Check dest operand. * gcc.target/powerpc/pr85698.c: New test. Added: branches/gcc-8-branch/gcc/testsuite/gcc.target/powerpc/pr85698.c Modified: branches/gcc-8-branch/gcc/ChangeLog branches/gcc-8-branch/gcc/config/rs6000/rs6000.c branches/gcc-8-branch/gcc/testsuite/ChangeLog
[Bug tree-optimization/85698] [8/9 Regression] CPU2017 525.x264_r fails starting with r257581
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698 Pat Haugen changed: What|Removed |Added Status|NEW |RESOLVED Resolution|--- |FIXED --- Comment #10 from Pat Haugen --- Fixed.
[Bug tree-optimization/85698] [8/9 Regression] CPU2017 525.x264_r fails starting with r257581
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698 --- Comment #9 from Pat Haugen --- Author: pthaugen Date: Thu May 17 16:19:16 2018 New Revision: 260329 URL: https://gcc.gnu.org/viewcvs?rev=260329=gcc=rev Log: PR target/85698 * config/rs6000/rs6000.c (rs6000_output_move_128bit): Check dest operand. * gcc.target/powerpc/pr85698.c: New test. Added: trunk/gcc/testsuite/gcc.target/powerpc/pr85698.c Modified: trunk/gcc/ChangeLog trunk/gcc/config/rs6000/rs6000.c trunk/gcc/testsuite/ChangeLog
[Bug tree-optimization/85698] [8/9 Regression] CPU2017 525.x264_r fails starting with r257581
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698 --- Comment #7 from Pat Haugen ---
So the problem is that we're generating a stxvw4x insn to write to memory, which corrupts the contents due to both endian behavior and element size (since we're dealing with halfword/uint16_t elements).

Value in vector reg = 0x0002fffc0002fff5000e

stvx/good
(gdb) x/8hx $r1+$r8
0x7fffe490: 0x000e 0xfff5 0x0002 0x 0xfffc 0x0002 0x 0x

stxvw4x/bad
(gdb) x/8hx $r7+$r8
0x7fffe470: 0x 0x 0xfffc 0x0002 0x0002 0x 0x000e 0xfff5
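The two dumps above are consistent with the halfword pairs landing in memory in reversed word order. The following is a portable simulation of that effect under that assumption; it does not model the real stvx or stxvw4x instructions, and both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Portable simulation, not real instruction semantics: copy eight
   halfword elements to memory in order, as the "good" store does.  */
void store_halfwords(uint16_t dst[8], const uint16_t src[8])
{
    memcpy(dst, src, 8 * sizeof(uint16_t));
}

/* Simulate a store that treats the same 16 bytes as four 32-bit word
   elements and writes them in the opposite element order.  Each pair
   of halfwords stays together, but the pairs land in reversed
   positions.  */
void store_words_reversed(uint16_t dst[8], const uint16_t src[8])
{
    for (int w = 0; w < 4; w++) {
        dst[2 * (3 - w)]     = src[2 * w];
        dst[2 * (3 - w) + 1] = src[2 * w + 1];
    }
}
```

With arbitrary sample input {1, 2, 3, 4, 5, 6, 7, 8}, the second function produces {7, 8, 5, 6, 3, 4, 1, 2}, which is the same pairwise pattern that relates the "good" and "bad" dumps.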
[Bug tree-optimization/85698] [8/9 Regression] CPU2017 525.x264_r fails starting with r257581
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698 --- Comment #6 from Pat Haugen --- (In reply to Richard Biener from comment #4) > I can see what the patch does to this testcase on x86_64 - it enables BB > vectorization of the first two loops after runrolling. I don't see anything > suspicious here on x86_64 and 525.x264_r works fine for me. > > Can you claify whether test, ref or train inputs fail for you? I tried > AVX256, AVX128 and plain old SSE sofar without any issue but ref takes some > time... > > Can you check whether the following reduced file produces the same assembly > for add4x4_idct as in the complete benchmark? If so it should be possible to > generate a runtime testcase from it. Please attach preprocessed source if > that doesn't work out. > > Sofar I do suspect we are hitting a latent target issue? > > #include > static uint8_t x264_clip_uint8( int x ) > { > return x&(~255) ? (-x)>>31 : x; > } > void add4x4_idct( uint8_t *p_dst, int16_t dct[16]) > { > int16_t d[16]; > int16_t tmp[16]; > for( int i = 0; i < 4; i++ ) > { > int s02 = dct[0*4+i] + dct[2*4+i]; > int d02 = dct[0*4+i] - dct[2*4+i]; > int s13 = dct[1*4+i] + (dct[3*4+i]>>1); > int d13 = (dct[1*4+i]>>1) - dct[3*4+i]; > tmp[i*4+0] = s02 + s13; > tmp[i*4+1] = d02 + d13; > tmp[i*4+2] = d02 - d13; > tmp[i*4+3] = s02 - s13; > } > for( int i = 0; i < 4; i++ ) > { > int s02 = tmp[0*4+i] + tmp[2*4+i]; > int d02 = tmp[0*4+i] - tmp[2*4+i]; > int s13 = tmp[1*4+i] + (tmp[3*4+i]>>1); > int d13 = (tmp[1*4+i]>>1) - tmp[3*4+i]; > d[0*4+i] = ( s02 + s13 + 32 ) >> 6; > d[1*4+i] = ( d02 + d13 + 32 ) >> 6; > d[2*4+i] = ( d02 - d13 + 32 ) >> 6; > d[3*4+i] = ( s02 - s13 + 32 ) >> 6; > } > for( int y = 0; y < 4; y++ ) > { > for( int x = 0; x < 4; x++ ) > p_dst[x] = x264_clip_uint8( p_dst[x] + d[y*4+x] ); > p_dst += 32; > } > } Yes, that produces similar code, and adding the following to it produces an executable test that fails at -O3. 
#include <stdlib.h>  /* for abort() */

int main(void)
{
  uint8_t dst[128];
  int16_t dct[16];
  int i;

  for (i = 0; i < 16; i++)
    dct[i] = i*10 + i;
  for (i = 0; i < 128; i++)
    dst[i] = i;

  add4x4_idct(dst, dct);

  if (dst[0] != 14 || dst[1] != 0 || dst[2] != 4 || dst[3] != 2
      || dst[32] != 28 || dst[33] != 35 || dst[34] != 33 || dst[35] != 35)
    abort();
  return 0;
}

Continuing to debug further...
[Bug tree-optimization/85698] [8/9 Regression] CPU2017 525.x264_r fails starting with r257581
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698 --- Comment #5 from Pat Haugen --- (In reply to Richard Biener from comment #4) > > Can you clarify whether test, ref or train inputs fail for you? I tried > AVX256, AVX128 and plain old SSE so far without any issue but ref takes some > time... > I see the error for ref and test inputs. The train input appears to pass, but then the FDO optimized version fails with the ref input also. I will keep looking at the other stuff you requested.
[Bug tree-optimization/85698] [8/9 Regression] CPU2017 525.x264_r fails starting with r257581
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698 --- Comment #3 from Pat Haugen --- (In reply to Richard Biener from comment #2) > > Can you help me with isolating this to a single function inside that file? > Maybe try sticking __attribute__((optimize("no-tree-vectorize"))) on some > functions. Oh, there's also the vect_loop debug counter > (-fdbg-cnt=vect_loop:N). add4x4_idct() looks like the function, adding the attribute (or "no-tree-slp-vectorize") to it resulted in a successful run. > Otherwise I'll have to find a power8 machine where I can set up CPU 2017 > myself (unlikely this week due to public holidays). Note that it also fails with -mcpu=power7, so a power8 machine is not needed.
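A minimal sketch of the isolation technique discussed above: the attribute string is the one quoted in the comment, while the NO_VECTORIZE macro and the add_rows function are hypothetical and serve only to show where the attribute goes.

```c
#include <stdint.h>

/* NO_VECTORIZE is a hypothetical helper macro; the attribute string is
   the one quoted in the comment above.  Compilers other than GCC may
   not support it, so it expands to nothing there.  */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_VECTORIZE __attribute__((optimize("no-tree-vectorize")))
#else
#define NO_VECTORIZE
#endif

/* Suspect function: built without tree vectorization while the rest
   of the file keeps the command-line options.  */
NO_VECTORIZE
void add_rows(int16_t *dst, const int16_t *a, const int16_t *b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (int16_t)(a[i] + b[i]);
}
```

Moving the attribute from function to function, and rerunning after each move, narrows a miscompare to a single function without changing the options for the whole file. The comment reports the same outcome with "no-tree-slp-vectorize".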
[Bug tree-optimization/85698] CPU2017 525.x264_r fails starting with r257581
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698 --- Comment #1 from Pat Haugen --- Looks like benchmark fails when x264_src/common/dct.c is compiled with r257581.
[Bug tree-optimization/85698] New: CPU2017 525.x264_r fails starting with r257581
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698 Bug ID: 85698 Summary: CPU2017 525.x264_r fails starting with r257581 Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: pthaugen at gcc dot gnu.org CC: rguenth at gcc dot gnu.org, segher at kernel dot crashing.org, wschmidt at gcc dot gnu.org Target Milestone: --- Host: powerpc64le-unknown-linux-gnu Target: powerpc64le-unknown-linux-gnu Build: powerpc64le-unknown-linux-gnu Benchmark miscompares starting with given revision. Options used for building the benchmark are "-O3 -mcpu=power8". I did discover that adding -funroll-loops changes behavior such that the benchmark passes. Continuing to see if I can narrow down to a specific file that's miscompiled...
[Bug c++/85600] [9 Regression] CPU2006 471.omnetpp fails starting with r259771
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85600 Pat Haugen changed: What|Removed |Added Known to work||8.0 Summary|CPU2006 471.omnetpp fails |[9 Regression] CPU2006 |starting with r259771 |471.omnetpp fails starting ||with r259771 Known to fail||9.0 --- Comment #3 from Pat Haugen --- Benchmark fails same way with no optimization.
[Bug c++/85600] CPU2006 471.omnetpp fails starting with r259771
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85600 --- Comment #2 from Pat Haugen --- (In reply to Andrew Pinski from comment #1) > Does adding -fno-lifetime-dse help? This could be a bug in the omnetpp > sources ... Nope, still fails. 471.omnetpp: copy 0 non-zero return code (exit code=1, signal=0)
[Bug c++/85600] New: CPU2006 471.omnetpp fails starting with r259771
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85600 Bug ID: 85600 Summary: CPU2006 471.omnetpp fails starting with r259771 Product: gcc Version: 9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: pthaugen at gcc dot gnu.org Target Milestone: --- Benchmark is failing at runtime, emitting following message at the end before exiting with rc=1. ** Event #0 T=0.000 ( 0.00s) Messages: created: 77472 ** Event #500 T=0.0868274600 ( 86ms) Messages: created: 3949482 ** Event #1000 T=0.1605411650 (160ms) Messages: created: 7854099 Error in module largeNet.llanBB[48].bhost[3].mac: (cQueue)largeNet.llanBB[48].bhost[3].mac.class-members.outputBuffer: pop(): queue empty. End run of OMNeT++
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #20 from Pat Haugen --- (In reply to Richard Biener from comment #18) > Fixed (hopefully). Yes, mgrid performance is back. Thanks.
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 Pat Haugen changed: What|Removed |Added CC||rguenth at gcc dot gnu.org --- Comment #15 from Pat Haugen --- Richard, concerning my prior comment, any thoughts if this is a similar issue to what you fixed in pr55334?
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #14 from Pat Haugen --- Created attachment 43928 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43928=edit r256888 pcom dump So the difference appears to be occurring in predictive commoning. In the ipa-cp clone, resid.constprop, pcom is failing to hoist some loads/expressions from the vectorized loop. This results in an additional 9 vector loads and 5 vector adds being executed each iteration of the loop. I've attached a pcom dump of the original resid() and the clone resid.constprop(). You can see that in the original resid(), pcom is moving some loads/adds, but not in resid.constprop(). BB 6 is the vectorized loop in resid(), BB 5 is the same loop in resid.constprop(). Not sure if this is a similar issue to pr55334 wrt losing restrict.
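For readers unfamiliar with predictive commoning, here is a hand-written sketch of the kind of load reuse it performs. This is not the mgrid code and not GCC's implementation; both function names are hypothetical.

```c
/* Naive form: every iteration reloads a[i] and a[i + 1], although
   a[i + 1] becomes a[i] of the next iteration.  */
void pair_sums_naive(double *out, const double *a, int n)
{
    for (int i = 0; i + 1 < n; i++)
        out[i] = a[i] + a[i + 1];
}

/* Hand-applied commoning: the value loaded for a[i + 1] is carried in
   a scalar and reused as a[i] in the next iteration, so the loop body
   performs one load instead of two.  */
void pair_sums_commoned(double *out, const double *a, int n)
{
    if (n < 2)
        return;
    double prev = a[0];
    for (int i = 0; i + 1 < n; i++) {
        double next = a[i + 1];
        out[i] = prev + next;
        prev = next;
    }
}
```

The two versions agree only when the stores to out cannot change elements of a that are read later. That may be why lost aliasing information (such as restrict) could keep the pass from hoisting loads, which is the possibility raised in the comment.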
[Bug ipa/85103] [8 Regression] Performance regressions on SPEC with r257582
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103 --- Comment #7 from Pat Haugen --- Created attachment 43901 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43901=edit inline dump Prior attachment was r257581 dump. This is r257582 dump.
[Bug ipa/85103] [8 Regression] Performance regressions on SPEC with r257582
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103 --- Comment #6 from Pat Haugen --- Created attachment 43900 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43900=edit inline dump
[Bug ipa/85103] [8 Regression] Performance regressions on SPEC with r257582
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103 --- Comment #5 from Pat Haugen ---
A little more detail. 48t.fnsplit splits mainGtU() into 2 functions:

mainGtU(): which contains a few early exit tests and then a call to mainGtU.part.0()
mainGtU.part.0(): contains the remainder of mainGtU(), including the loop

Following is then the behavior in 79i.inline:

r257581: The 3 mainGtU() calls are inlined into their caller mainSimpleSort(), and the mainGtU.part.0() calls remain.
r257582: mainGtU.part.0() is inlined back into mainGtU(), the first mainGtU() call in mainSimpleSort() is inlined but the remaining 2 mainGtU() calls remain.
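The split described above can be pictured with a hand-written analogue. This is not the bzip2 source; cmp_gt and cmp_gt_part0 are hypothetical names for a small header with early exits and an outlined remainder containing the loop.

```c
#include <stddef.h>

/* Hypothetical stand-in for the part that fnsplit outlines: the loop
   over the rest of the two buffers.  */
static int cmp_gt_part0(const unsigned char *x, const unsigned char *y,
                        size_t n)
{
    for (size_t i = 2; i < n; i++)
        if (x[i] != y[i])
            return x[i] > y[i];
    return 0;
}

/* Header that stays behind: a few early-exit tests, then a call to the
   split-off part.  Being small, it is a cheap inline candidate.  */
int cmp_gt(const unsigned char *x, const unsigned char *y, size_t n)
{
    if (n > 0 && x[0] != y[0])
        return x[0] > y[0];
    if (n > 1 && x[1] != y[1])
        return x[1] > y[1];
    return cmp_gt_part0(x, y, n);
}
```

In these terms, the r257581 behavior corresponds to inlining only cmp_gt into its callers, while the r257582 behavior corresponds to first merging cmp_gt_part0 back into cmp_gt, which makes the combined function a larger inline candidate.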
[Bug ipa/85103] [8 Regression] Performance regressions on SPEC with r257582
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103 --- Comment #3 from Pat Haugen --- (In reply to Jan Hubicka from comment #1) > Pat, can you try to figure out what value of min-speedup is needed to recover > from this regression? Using r257582, either of the following options restores the behavior of not inlining the mainGtU call and eliminates the performance regression. --param inline-min-speedup=18 --param max-inline-insns-auto=24
[Bug ipa/85103] [8 Regression] Performance regressions on SPEC with r257582
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103 --- Comment #2 from Pat Haugen --- (In reply to Pat Haugen from comment #0) > > Very initial look at profile of bzip2 shows degradation is contained to > mainSort(), which showed a 54% increase in run cycles. Appears one of the > calls to mainGtU() is inlined into mainSort in the slow version, but the > drop in cycle counts on mainGtU is nowhere close to the increase on > mainSort. Appears the inlined copy of mainGtU() creates additional register pressure, which results in register spills being generated in the loop of the inlined copy. The non-inlined copy of the loop is approx. 125 generated insns, whereas the inlined copy is about 215 insns (90 spill references).
[Bug middle-end/83665] [8 regression] Big code size regression and some code quality improvement at Jan 2 2018
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83665 --- Comment #18 from Pat Haugen --- (In reply to Richard Biener from comment #17) > Pat, please open a new bug for the regression caused by the fix. Done, pr85103.
[Bug ipa/85103] New: Performance regressions on SPEC with r257582
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103 Bug ID: 85103 Summary: Performance regressions on SPEC with r257582 Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: ipa Assignee: unassigned at gcc dot gnu.org Reporter: pthaugen at gcc dot gnu.org CC: dje at gcc dot gnu.org, hubicka at gcc dot gnu.org, marxin at gcc dot gnu.org, segher at kernel dot crashing.org, wschmidt at gcc dot gnu.org Target Milestone: --- Host: powerpc64-unknown-linux-gnu Target: powerpc64-unknown-linux-gnu Build: powerpc64-unknown-linux-gnu r257582 is responsible for a 6% degradation in CPU2000 175.vpr and a 12% degradation in CPU2006 401.bzip2. Both run on a Power7 box. Very initial look at profile of bzip2 shows degradation is contained to mainSort(), which showed a 54% increase in run cycles. Appears one of the calls to mainGtU() is inlined into mainSort in the slow version, but the drop in cycle counts on mainGtU is nowhere close to the increase on mainSort.
[Bug middle-end/83665] [8 regression] Big code size regression and some code quality improvement at Jan 2 2018
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83665 Pat Haugen changed: What|Removed |Added CC||pthaugen at gcc dot gnu.org --- Comment #16 from Pat Haugen --- (In reply to Jan Hubicka from comment #14) > Author: hubicka > Date: Mon Feb 12 09:48:06 2018 > New Revision: 257582 > > URL: https://gcc.gnu.org/viewcvs?rev=257582=gcc=rev > Log: > > PR middle-end/83665 > * params.def (inline-min-speedup): Increase from 8 to 15. > (max-inline-insns-auto): Decrease from 40 to 30. > * ipa-split.c (consider_split): Add some buffer for function to > be considered inlining candidate. > * invoke.texi (max-inline-insns-auto, inline-min-speedup): UPdate > default values. > > Modified: > trunk/gcc/ChangeLog > trunk/gcc/doc/invoke.texi > trunk/gcc/ipa-split.c > trunk/gcc/params.def This change is responsible for a 6% degradation in CPU2000 175.vpr and a 12% degradation in CPU2006 401.bzip2. Both run on a Power7 box.
[Bug target/83497] [8 Regression] CPU2000 172.mgrid starts failing with r254730
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83497 Pat Haugen changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |INVALID --- Comment #5 from Pat Haugen --- I have confirmed that this is indeed just a precision difference due to a different mix and order of instructions for the computation in the RESID loop, valid reassociation with -ffast-math. The difference is then compounded as the benchmark iterates over the values. The specdiff command for mgrid specifies an absolute tolerance of "-a 1e-12" and uses the absolute difference when seeing if two values are within the specified tolerance. In this case they were not.
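A small sketch of the two ideas in this comment, under stated assumptions: within_abs_tol is an assumed model of an absolute-tolerance check, not specdiff's actual code (whether the real tool uses <= or < is not stated here), and sum_left/sum_right are hypothetical helpers showing that mathematically equal summation orders can round differently.

```c
/* Returns 1 when a and b agree within an absolute tolerance.  This is
   an assumed model of the check described above, not specdiff's
   code.  */
int within_abs_tol(double a, double b, double tol)
{
    double d = a - b;
    if (d < 0)
        d = -d;
    return d <= tol;
}

/* Two mathematically equal summation orders.  With -ffast-math the
   compiler is allowed to reassociate, so either order may be used and
   the rounded results can differ.  */
double sum_left(double x, double y, double z)
{
    return (x + y) + z;
}

double sum_right(double x, double y, double z)
{
    return x + (y + z);
}
```

For instance, the first miscompared pair listed in the original report for this bug, -0.839533E-12 versus 0.182462E-12, differs by about 1.02e-12, which is just outside an absolute tolerance of 1e-12 even though both values are tiny.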
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #10 from Pat Haugen --- (In reply to Pat Haugen from comment #9) > (pr83497, which I'm still digging on). Ignoring output miscompare and just > timing the two versions built with -fno-tree-vectorize, I see that the > performance is similar. So possibly a powerpc vector cost issue. > And then again, maybe not. Running with -fno-tree-vectorize and removing -ffast-math (which eliminates the output miscompare), I still see the degradation.
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #9 from Pat Haugen --- (In reply to Martin Jambor from comment #7) > Do I understand it correctly that you suspect that the new IPA-CP > clone that is created from r256888 on is harmful? In that case, you > want to test that by trying higher values of ipa-cp-eval-threshold, > something like --param ipa-cp-eval-threshold 610 (i.e. bigger than > 606). Of course, if there are other clones with evaluations between > 500 and 610, it would affect them too. > Building with --param ipa-cp-eval-threshold=610 prevented the creation of the .resid_.constprop.1 clone and eliminated the performance degradation. Looking at the profile more in depth, I saw that most of the time in resid_.constprop was spent in the main vectorized loop. I tried both revisions with -fno-tree-vectorize to see if vectorization in the clone is the real problem on powerpc, but ran into issues with output miscompare (pr83497, which I'm still digging on). Ignoring output miscompare and just timing the two versions built with -fno-tree-vectorize, I see that the performance is similar. So possibly a powerpc vector cost issue. > You may also want to try both fast and slow revisions with > -fno-ipa-cp-clone as the first step, actually. Doing this showed r256888 about 4% slower, so not nearly as bad.
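As background for the clone discussed above, the following is a hand-written analogue of what a constprop clone represents. It is not the mgrid code; strided_sum and strided_sum_constprop_1 are hypothetical names, and the choice of which argument is constant is an assumption made for illustration.

```c
/* General version: the stride is only known at run time.  */
double strided_sum(const double *v, int n, int stride)
{
    double s = 0.0;
    for (int i = 0; i < n; i += stride)
        s += v[i];
    return s;
}

/* Hand-written analogue of a constprop clone: the same body with
   stride fixed to 1.  Later passes see a different loop here and may
   optimize it differently (for example, vectorize it) than the
   general version.  */
double strided_sum_constprop_1(const double *v, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}
```

Callers that pass the constant are redirected to the clone, so any difference in how the clone's loop is optimized shows up in the profile under the clone's name, as with .resid_.constprop.1 in this bug.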
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #5 from Pat Haugen --- Created attachment 43601 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43601=edit ipa-cp dump (r256887) (In reply to Martin Liška from comment #4) > Thank you, may I please ask you for the IPA CP dump file for not affected > revision (r256887). Do I understand the numbers right that version with > .resid_.constprop.1 is slower? Dump attached. And yes, the version with resid_.constprop.1 is slower. Also, I tried the patch from https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149#c5 and didn't see any difference in execution time.
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #3 from Pat Haugen --- (In reply to Martin Liška from comment #1) > Isn't that dup of 84149? Can you please tweak --param ipa-cp-eval-threshold > to value to 200, 300, 400? Can you please attach -fdump-ipa-cp-details file? I tried the param with the 3 different values and none made any difference to execution time.
[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 --- Comment #2 from Pat Haugen --- Created attachment 43589 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43589=edit ipa-cp dump
[Bug ipa/84737] New: 20% degradation in CPU2000 172.mgrid starting with r256888
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737 Bug ID: 84737 Summary: 20% degradation in CPU2000 172.mgrid starting with r256888 Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: ipa Assignee: unassigned at gcc dot gnu.org Reporter: pthaugen at gcc dot gnu.org CC: dje at gcc dot gnu.org, marxin at gcc dot gnu.org, segher at gcc dot gnu.org, wschmidt at gcc dot gnu.org Target Milestone: --- Host: powerpc64-unknown-linux-gnu Target: powerpc64-unknown-linux-gnu Build: powerpc64-unknown-linux-gnu
I'm seeing a 20% degradation on 172.mgrid with r256888. Benchmark was built with "-O3 -mcpu=power7 -ffast-math". Profiling shows the difference comes from function resid() and its clone.

r256887
---
Counted PM_RUN_CYC events (Run_cycles.) with a unit mask of 0x00 (No unit mask) count 10
samples  %        image name              symbol name
658215   48.2563  mgrid_base.pat_test_64  .resid_
367381   26.9341  mgrid_base.pat_test_64  .psinv_
153587   11.2601  mgrid_base.pat_test_64  .interp_
109785    8.0488  mgrid_base.pat_test_64  .rprj3_
52642     3.8594  mgrid_base.pat_test_64  .comm3_
7912      0.5801  mgrid_base.pat_test_64  .MAIN__
3796      0.2783  libc-2.17.so            .__memset_power8

r256888
---
Counted PM_RUN_CYC events (Run_cycles.) with a unit mask of 0x00 (No unit mask) count 10
samples  %        image name              symbol name
1109100  59.2023  mgrid_base.gcc_hunt_64  .resid_.constprop.1
368930   19.6930  mgrid_base.gcc_hunt_64  .psinv_
160102    8.5460  mgrid_base.gcc_hunt_64  .interp_
114954    6.1361  mgrid_base.gcc_hunt_64  .MAIN__
55253     2.9493  mgrid_base.gcc_hunt_64  .comm3_
46903     2.5036  mgrid_base.gcc_hunt_64  .resid_
5103      0.2724  libc-2.17.so            .__memset_power8
[Bug rtl-optimization/83530] [7/8 Regression] ICE in reset_sched_cycles_in_current_ebb, at sel-sched.c:7150
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83530 Pat Haugen changed: What|Removed |Added Summary|[8 Regression] ICE in |[7/8 Regression] ICE in |reset_sched_cycles_in_curre |reset_sched_cycles_in_curre |nt_ebb, at sel-sched.c:7150 |nt_ebb, at sel-sched.c:7150 --- Comment #10 from Pat Haugen --- Marking as 7 regression also as that is when the change to use -fsched-pressure --param sched-pressure-algorithm=2 as the default for PowerPC happened. But as I mentioned in Comment 7, the failure can be reproduced on prior versions by adding those two options.
[Bug rtl-optimization/83530] [8 Regression] ICE in reset_sched_cycles_in_current_ebb, at sel-sched.c:7150
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83530 --- Comment #9 from Pat Haugen --- (In reply to Andrey Belevantsev from comment #8) > I will take a look. The ICE is within the code that models the scheduling > loop in order to get the proper insn ticks and everything for later MD > processing (it is equivalent to always scheduling the next insn). Either > there is an issue in that loop that wasn't uncovered anywhere but powerpc or > there is some subtlety in the powerpc cpu model that is triggered there. It > is not very pleasant to find out and fix usually so it will take time. Thanks, appreciate that. I did find out that the issue is not very pleasant to track down, as you state.
[Bug rtl-optimization/83530] [8 Regression] ICE in reset_sched_cycles_in_current_ebb, at sel-sched.c:7150
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83530 --- Comment #7 from Pat Haugen --- Assuming this is a latent selective scheduling bug since I can reproduce with r243865 by adding -fsched-pressure --param sched-pressure-algorithm=2. Looking...
[Bug other/83497] CPU2000 172.mgrid starts failing with r254730
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83497 --- Comment #4 from Pat Haugen --- (In reply to Pat Haugen from comment #0) > mgrid started failing (output miscompare) with r254730. The following > options demonstrate the failure "-O3 -mcpu=power6 -ffast-math". Incomplete option set, -m32 is also required.
[Bug other/83497] CPU2000 172.mgrid starts failing with r254730
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83497 --- Comment #3 from Pat Haugen --- (In reply to Richard Biener from comment #2) > > As far as I see the miscompare is -0.8 vs. 0.18 so it doesn't look like a > precision issue to me. Does it only happen for power6 / bigendian? > Yes, the failure is only for -mcpu=power6. I don't have a copy of CPU2000 that runs on powerpc64le, so can't say for sure if it's a big endian issue only. I will do some further digging on the failure.
[Bug other/83497] New: CPU2000 172.mgrid starts failing with r254730
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83497 Bug ID: 83497 Summary: CPU2000 172.mgrid starts failing with r254730 Product: gcc Version: 8.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: other Assignee: unassigned at gcc dot gnu.org Reporter: pthaugen at gcc dot gnu.org CC: dje at gcc dot gnu.org, hubicka at gcc dot gnu.org, rguenth at gcc dot gnu.org, segher at gcc dot gnu.org, wschmidt at gcc dot gnu.org Target Milestone: --- Host: powerpc64-unknown-linux-gnu Target: powerpc64-unknown-linux-gnu Build: powerpc64-unknown-linux-gnu
mgrid started failing (output miscompare) with r254730. The following options demonstrate the failure: "-O3 -mcpu=power6 -ffast-math". The miscompared output is...

29748: -0.839533E-12 0.182462E-12 ^
29749: -0.839533E-12 0.182462E-12 ^
29750: -0.849589E-12 0.184648E-12 ^
29751: -0.849589E-12 0.184648E-12 ^
29752: -0.852151E-12 0.185205E-12 ^
29753: -0.852151E-12 0.185205E-12 ^
29754: -0.852839E-12 0.185354E-12 ^

A brief history of this, since it has come and gone a couple of times. All revisions deal with CFG/inlining issues.

r254730 - initial failure
r254937 - started working, only because this inadvertently disabled some inlining
r254946 - fixed inlining from 254937, benchmark started failing again
r255103 - started working

So even though it's currently working on trunk, I think there's an issue in r255103, which I've emailed Honza about separately. If I apply the following (which hopefully Honza will confirm is the desired behavior) to current trunk, the benchmark fails again.

Index: gcc/ipa-inline.c
===
--- gcc/ipa-inline.c (revision 255838)
+++ gcc/ipa-inline.c (working copy)
@@ -691,7 +691,7 @@
   sreal time = compute_uninlined_call_time (e, unspec_time);
   sreal inlined_time = compute_inlined_call_time (e, spec_time);

-  if (time - inlined_time * 100
+  if ((time - inlined_time) * 100
       > (sreal) (time * PARAM_VALUE (PARAM_INLINE_MIN_SPEEDUP)))
     return true;
   return false;
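The one-line patch above is an operator-precedence change. A simplified model of its effect follows, assuming plain double in place of GCC's sreal type and an integer min_speedup parameter in place of PARAM_VALUE (PARAM_INLINE_MIN_SPEEDUP); both function names are hypothetical.

```c
/* Simplified model of the speedup test in the patch above.  Plain
   double stands in for GCC's sreal type and min_speedup stands in for
   PARAM_VALUE (PARAM_INLINE_MIN_SPEEDUP); both are assumptions made to
   keep the sketch self-contained.  */

/* Without the added parentheses: multiplication binds tighter, so
   this computes time - (inlined_time * 100).  */
int big_speedup_unpatched(double time, double inlined_time, int min_speedup)
{
    return time - inlined_time * 100 > time * min_speedup;
}

/* With the parentheses from the patch: the time saved, scaled by 100,
   is compared against time * min_speedup.  */
int big_speedup_patched(double time, double inlined_time, int min_speedup)
{
    return (time - inlined_time) * 100 > time * min_speedup;
}
```

With illustrative inputs time = 100, inlined_time = 80 and min_speedup = 8, the unpatched form compares -7900 against 800 and reports no big speedup, while the patched form compares 2000 against 800 and reports one. That difference in inlining decisions fits the report that the benchmark's behavior changes once the expression is parenthesized.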
[Bug lto/83201] [7/8 Regression] SPEC CPU2017 505.mcf_r produces incorrect output when built with -flto and FDO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201 --- Comment #18 from Pat Haugen --- (In reply to Martin Liška from comment #16) > (In reply to Richard Biener from comment #15) > > SWAPINIT should end up with swaptype_long == 1 I think and swaptype_int == 1 > > for the cases in question. Enforcing swaptype_int = swaptype_long = 2 > > should make it work (scratch SWAPINIT calls). > > I can confirm that. Yes, that fixes the problem for me on PowerPC also. I can pass along the info to our SPEC rep. Richi, I'm curious whether the alias violations were apparent in a dump file, or did you just happen to spot them looking through the source?
[Bug lto/83201] [7/8 Regression] SPEC CPU2017 505.mcf_r produces incorrect output when built with -flto and FDO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201

--- Comment #6 from Pat Haugen ---
So I did a bisect of trunk during the GCC 7 development timeframe (r235035-r247017) and it pointed to r236878 as the point where the failure started.

+++ gcc/ChangeLog (revision 236878)
@@ -1,3 +1,9 @@
+2016-05-30  Jan Hubicka
+
+	* tree-ssa-loop-ivcanon.c (try_peel_loop): Correctly set wont_exit
+	for peeled copies; avoid underflow when updating estimates; correctly
+	scale loop profile.
+
[Bug lto/83201] [7/8 Regression] SPEC CPU2017 505.mcf_r produces incorrect output when built with -flto and FDO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201

--- Comment #5 from Pat Haugen ---
Current FSF 6 branch works fine, so I have some bisect points. Will comment further as I find out.
[Bug tree-optimization/81303] [8 Regression] 410.bwaves regression caused by r249919
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81303

--- Comment #15 from Pat Haugen ---
Just confirming that the changes have eliminated the bwaves degradation on PowerPC that started with r249919.
[Bug lto/83201] SPEC CPU2017 505.mcf_r produces incorrect output when built with -flto and FDO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201

--- Comment #2 from Pat Haugen ---
(In reply to Pat Haugen from comment #0)
>
> It appears to work fine with r254943. I'll start a bisect and post results.

My bisect showed that r254946 was where it started failing on trunk. And yes, it fails with current GCC 7 branch too.
[Bug lto/83201] New: SPEC CPU2017 505.mcf_r produces incorrect output when built with -flto and FDO
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201

            Bug ID: 83201
           Summary: SPEC CPU2017 505.mcf_r produces incorrect output when
                    built with -flto and FDO
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: lto
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pthaugen at gcc dot gnu.org
                CC: dje at gcc dot gnu.org, hubicka at gcc dot gnu.org,
                    marxin at gcc dot gnu.org, segher at gcc dot gnu.org,
                    wschmidt at gcc dot gnu.org
  Target Milestone: ---
              Host: powerpc64le-unknown-linux-gnu
            Target: powerpc64le-unknown-linux-gnu
             Build: powerpc64le-unknown-linux-gnu

505.mcf_r produces incorrect output when built with both LTO/FDO. Using either option separately is fine. GCC trunk r255207 was used. Following are options used.

OPTIMIZE      = -O3 -mcpu=power8 -flto
PASS1_FLAGS   = -fprofile-generate
PASS1_LDFLAGS = -fprofile-generate
PASS2_FLAGS   = -fprofile-use
PASS2_LDFLAGS = -fprofile-use

Contents of inp.out.mis (miscompares).

0010: simplex iterations : 107102 simplex iterations : 107598 ^
0014: simplex iterations : 152479 simplex iterations : 149876 ^
0016: erased arcs: 995716 erased arcs: 995702 ^
0017: new implicit arcs : 2995716 new implicit arcs : 2995702 ^
0019: simplex iterations : 253145 simplex iterations : 248008 ^
0020: objective value: 12161789395 objective value: 12171761765 ^
0021: erased arcs: 2991635 erased arcs: 2991537 ^
0022: new implicit arcs : 2991635 new implicit arcs : 2991537 ^
0024: simplex iterations : 398127 simplex iterations : 385785 ^
0025: objective value: 11729854482 objective value: 11769820561 ^

It appears to work fine with r254943. I'll start a bisect and post results.
[Bug tree-optimization/81953] Code sinking increases register pressure
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81953

--- Comment #4 from Pat Haugen ---
(In reply to Richard Biener from comment #3)
> The interesting part is also why RTL scheduling doesn't rectify things
> here?

If you're referring to -fsched-pressure, I believe the answer is that those algorithms are concerned about the case where pressure is more than available hard regs, which is not the case here.
[Bug tree-optimization/81953] New: Code sinking results in increased use of callee saved registers
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81953

            Bug ID: 81953
           Summary: Code sinking results in increased use of callee saved
                    registers
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pthaugen at gcc dot gnu.org
                CC: dje.gcc at gmail dot com, wschmidt at gcc dot gnu.org
  Target Milestone: ---
              Host: powerpc64le-unknown-linux-gnu
            Target: powerpc64le-unknown-linux-gnu
             Build: powerpc64le-unknown-linux-gnu

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;
  l = a + b + c + d + e + f;
  if (a != 5)
    {
      bar();
      j = l;
    }
}

The whole expression "l = a + b + c..." is moved past the call to bar(), which means we now need to use 6 callee-saved regs to hold the parm values across the call.
[Bug rtl-optimization/81340] ICE in compute_bb_dataflow, at var-tracking.c:6877
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81340

Pat Haugen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mliska at suse dot cz

--- Comment #1 from Pat Haugen ---
Started with r249960.
[Bug rtl-optimization/81340] New: ICE in compute_bb_dataflow, at var-tracking.c:6877
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81340

            Bug ID: 81340
           Summary: ICE in compute_bb_dataflow, at var-tracking.c:6877
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pthaugen at gcc dot gnu.org
                CC: dje at gcc dot gnu.org, wschmidt at gcc dot gnu.org
  Target Milestone: ---
              Host: powerpc64le-unknown-linux-gnu
            Target: powerpc64le-unknown-linux-gnu
             Build: powerpc64le-unknown-linux-gnu

$ cat table_cache.ii
class a {
  struct b {
    b(int, int);
  } c;

public:
  int d;
  a(char *) : c(0, d) {}
};
class e {
  int f(const int &, const int &, const int &, bool, bool, bool, int, bool);
};
class g {
public:
  static g *h();
  void i(a, void *);
};
int e::f(const int &, const int &, const int &, bool j, bool, bool, int, bool) {
  g::h()->i("", );
}

$ ~/install/gcc/trunk/bin/g++ -S -O2 -g -fsanitize=address table_cache.ii
table_cache.ii: In member function ‘int e::f(const int&, const int&, const int&, bool, bool, bool, int, bool)’:
table_cache.ii:19:19: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
   g::h()->i("", );
                   ^
during RTL pass: vartrack
table_cache.ii:20:1: internal compiler error: in compute_bb_dataflow, at var-tracking.c:6877
 }
 ^
0x10f691af compute_bb_dataflow
	/home/pthaugen/src/gcc/trunk/gcc/gcc/var-tracking.c:6877
0x10f696d3 vt_find_locations
	/home/pthaugen/src/gcc/trunk/gcc/gcc/var-tracking.c:7118
0x10f6a3a3 variable_tracking_main_1
	/home/pthaugen/src/gcc/trunk/gcc/gcc/var-tracking.c:10332
0x10f6a3a3 variable_tracking_main()
	/home/pthaugen/src/gcc/trunk/gcc/gcc/var-tracking.c:10378
0x10f6a3a3 execute
	/home/pthaugen/src/gcc/trunk/gcc/gcc/var-tracking.c:10415
Please submit a full bug report, with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
[Bug ipa/80597] [8 Regression] internal compiler error: in compute_inline_parameters, at ipa-inline-analysis.c:3126
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80597

--- Comment #16 from Pat Haugen ---
(In reply to Dmitry Babokin from comment #14)
> Original test case still fails with compiler switches that I've originally
> reported (-fsanitize=undefined).

Is your failure fixed with r248325?
[Bug rtl-optimization/79801] Disable ira.c:add_store_equivs for some targets?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79801

Pat Haugen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pthaugen at gcc dot gnu.org

--- Comment #1 from Pat Haugen ---
I ran a comparison on CPU2006. The only benchmark possibly outside the noise range was 470.lbm with a 1.9% degradation.
[Bug ipa/80597] [8 Regression] internal compiler error: in compute_inline_parameters, at ipa-inline-analysis.c:3126
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80597

--- Comment #12 from Pat Haugen ---
(In reply to Martin Liška from comment #11)
> Created attachment 41375 [details]
> Patch candidate v2
>
> Can you please test this version? It moves e from 10^6 to 10^5.

That patch works for both the benchmarks that were affected.
[Bug ipa/80597] [8 Regression] internal compiler error: in compute_inline_parameters, at ipa-inline-analysis.c:3126
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80597

--- Comment #9 from Pat Haugen ---
(In reply to Martin Liška from comment #8)
>
> Can you please provide a test-case? Or can you dump the sreal values via
> .to_double() ? That can be also hint for us to fix that properly.

I'm trying to reduce the source, but it's proprietary so will see what it reduces to before I can think about posting anything. In the meantime, here's what things look like when the assert fails.

(gdb) p info->self_time
$7 = {m_sig = 1347786301, m_exp = -13}
(gdb) p info->self_time.to_double()
$8 = 164524.69494628906
(gdb) p info->time
$9 = {m_sig = 1347789465, m_exp = -13}
(gdb) p info->time.to_double()
$10 = 164525.08117675781
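The gdb values above are consistent with reading an sreal as m_sig * 2^m_exp. The sketch below reproduces the two to_double() results on that assumption. The helper name sreal_like_to_double is hypothetical and this is not GCC's sreal implementation, only arithmetic that matches the printed numbers.

```cpp
#include <cmath>
#include <cstdint>

// Assumed model of the fields printed by gdb: value = m_sig * 2^m_exp.
// Hypothetical helper, not part of GCC.
double sreal_like_to_double(std::int64_t m_sig, int m_exp) {
    return std::ldexp(static_cast<double>(m_sig), m_exp);
}

// Difference between two values that share the same exponent.
double sreal_like_diff(std::int64_t sig_a, std::int64_t sig_b, int m_exp) {
    return sreal_like_to_double(sig_a - sig_b, m_exp);
}
```

Under this model, info->time and info->self_time differ by 3164 units of 2^-13, roughly 0.386 on a magnitude of about 164525, which is the scale of discrepancy the round-off discussion in this bug is about.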
[Bug libfortran/80602] Reduce stack usage for blocked matmul
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80602

--- Comment #7 from Pat Haugen ---
(In reply to Thomas Koenig from comment #6)
> I just committed r248074 which I suspect is the same problem
> (the fix for PR 80765).
>
> If you could just upgrade to the most recent trunk (only
> need to upgrade libgfortran, really) and check if the fix
> also works for you, that would be great.

Yes, both are fixed with r248704. Thanks.
[Bug libfortran/80602] Reduce stack usage for blocked matmul
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80602

Pat Haugen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pthaugen at gcc dot gnu.org

--- Comment #5 from Pat Haugen ---
(In reply to Thomas Koenig from comment #3)
> Author: tkoenig
> Date: Mon May 8 17:56:13 2017
> New Revision: 247753

This revision introduced a couple problems with SPEC on PowerPC. Both failures happen for -m32 only.

1) CPU2000 178.galgel now fails with a verification error (i.e. incorrect output).
2) CPU2006 465.tonto segfaults when running.

I'll add more detail as I continue digging...
[Bug ipa/80597] [8 Regression] internal compiler error: in compute_inline_parameters, at ipa-inline-analysis.c:3126
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80597

--- Comment #7 from Pat Haugen ---
(In reply to Pat Haugen from comment #6)
>
> I just ran into the same ICE and the proposed patch fixes the problem.

Unfortunately the patch introduces the same ICE on another benchmark that used to build just fine.
[Bug ipa/80597] [8 Regression] internal compiler error: in compute_inline_parameters, at ipa-inline-analysis.c:3126
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80597

Pat Haugen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pthaugen at gcc dot gnu.org

--- Comment #6 from Pat Haugen ---
(In reply to Martin Liška from comment #5)
> Created attachment 41349 [details]
> Patch candidate
>
> Yep, it's Honza Hubicka's PR. I'm suggesting a new function that will handle
> round off errors in sreal.
>
> Can you please Honza take a look? Can you Dmitry test it?

I just ran into the same ICE and the proposed patch fixes the problem.
[Bug tree-optimization/80705] Incorrect code generated for profile counter updates due to SLP+LIM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80705

--- Comment #1 from Pat Haugen ---
I should have noted that the dumps I was looking at were slp1 and lim4.
[Bug tree-optimization/80705] New: Incorrect code generated for profile counter updates due to SLP+LIM
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80705

            Bug ID: 80705
           Summary: Incorrect code generated for profile counter updates
                    due to SLP+LIM
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pthaugen at gcc dot gnu.org
                CC: dje at gcc dot gnu.org, wschmidt at gcc dot gnu.org
  Target Milestone: ---
              Host: powerpc64le-unknown-linux-gnu
            Target: powerpc64le-unknown-linux-gnu
             Build: powerpc64le-unknown-linux-gnu

Created attachment 41338
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41338&action=edit
reduced testcase

The attached testcase shows a problem where profile counter updates are incorrectly generated, which then leads to invalid profile info when the original source is rebuilt with -fprofile-use.

Compile options used : -Ofast -mcpu=power8 -fprofile-generate

The problem occurs on the edge counter updates for the following inner loop:

  while (*s && *s!='\r' && *s!='\n' && *s!='"')

SLP vectorization combines adjacent counter writes on the exit paths from the loop into vector store operations. LIM then comes along and hoists the initial counter read outside the outer loop. This causes the problem because when the inner loop is entered again the edge counters are initialized to the values originally read from memory (i.e. values when the function was originally entered) NOT the updated counter values that were written to memory when exiting the inner loop.

Aliasing problem?
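The effect described above can be modeled by hand without any vectorization. The following sketch is only an illustration of a counter read being hoisted out of the outer loop; it is not the attached testcase or the generated code, and both function names are hypothetical.

```cpp
#include <cstdint>

// Expected behavior: the counter is reloaded from memory each time the
// inner loop is (re)entered, so every exit is accumulated.
std::int64_t count_exits_reloaded(int outer_iterations) {
    std::int64_t counter_in_memory = 0;
    for (int i = 0; i < outer_iterations; ++i) {
        std::int64_t local = counter_in_memory;  // read on loop entry
        local += 1;                              // exit edge taken once
        counter_in_memory = local;               // write back on exit
    }
    return counter_in_memory;
}

// Model of the reported problem: the read is hoisted above the outer
// loop, so each re-entry starts from the value seen at function entry
// and the write-back overwrites earlier updates.
std::int64_t count_exits_hoisted(int outer_iterations) {
    std::int64_t counter_in_memory = 0;
    const std::int64_t stale = counter_in_memory;  // hoisted read
    for (int i = 0; i < outer_iterations; ++i) {
        std::int64_t local = stale;                // stale starting value
        local += 1;
        counter_in_memory = local;
    }
    return counter_in_memory;
}
```

For five outer iterations the reloaded version ends with a count of 5, while the hoisted version ends with 1, which is the kind of lost update that would leave invalid data for a later -fprofile-use build.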
[Bug rtl-optimization/80357] [7 Regression] ICE in model_update_limit_points_in_group, at haifa-sched.c:1982 on powerpc64le-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80357

--- Comment #7 from Pat Haugen ---
(In reply to Bill Schmidt from comment #6)
> That revision enabled -fsched-pressure by default, so it may have been
> latent with -fsched-pressure before then.

Yes, this is a latent bug in the "model" sched-pressure algorithm code. I can reproduce it with r243865 (revision before I turned on -fsched-pressure for powerpc) by adding -fsched-pressure --param sched-pressure-algorithm=2. I'll do some digging.
[Bug target/80107] ICE in final_scan_insn, at final.c:2964
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80107

Pat Haugen changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Pat Haugen ---
Fixed.