[Bug tree-optimization/109491] [13 Regression] Segfault in tree-ssa-sccvn.cc:expressions_equal_p()

2023-04-12 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109491

--- Comment #5 from pthaugen at gcc dot gnu.org ---
(In reply to Peter Bergner from comment #4)
> 
> Can you git bisect this to find the offending commit?

Yes, I was going to start that.

[Bug tree-optimization/109491] [13 Regression] Segfault in tree-ssa-sccvn.cc:expressions_equal_p()

2023-04-12 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109491

--- Comment #1 from pthaugen at gcc dot gnu.org ---
Note this only happens on a BE system; it compiles fine on LE.

[Bug tree-optimization/109491] New: Segfault in tree-ssa-sccvn.cc:expressions_equal_p()

2023-04-12 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109491

Bug ID: 109491
   Summary: Segfault in tree-ssa-sccvn.cc:expressions_equal_p()
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pthaugen at gcc dot gnu.org
CC: bergner at gcc dot gnu.org, segher at kernel dot crashing.org
  Target Milestone: ---
  Host: powerpc64
Target: powerpc64
 Build: powerpc64

Created attachment 54845
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54845=edit
Reduced testcase

Hitting the following segfault on the attached testcase (sorry for the size, but it
is about 1% of the original). It appears to only happen with GCC 13; it compiles
fine with GCC 12.

~/install/gcc/trunk/bin/g++ -mcpu=power8 -std=c++14 -S -O2 partial.ii 
(...misc warnings...)
during GIMPLE pass: fre
partial.ii: In function ‘void gemm_complex(const DataMapper&, const
complex*, const complex*, long int, long int, long int,
complex, long int, long int, long int, long int) [with
 = complex;  =
complex;  = complex;
 = float; Packet = __vector(4) float; Packetc =
Packet2cf;  = __vector(4) float; DataMapper =
blas_data_mapper; int accRows = 4; int accCols = 4; int ConjugateLhs = 0; int
ConjugateRhs = 0; int LhsIsReal = 0; int RhsIsReal = 0]’:
partial.ii:1096:6: internal compiler error: Segmentation fault
 1096 | void gemm_complex(const DataMapper , const complex *blockAc,
  |  ^~~~
0x10f6fadb crash_signal
/home/pthaugen/src/gcc/trunk/gcc/gcc/toplev.cc:314
0x11222818 expressions_equal_p(tree_node*, tree_node*, bool)
/home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:6411
0x112229a7 vn_reference_op_eq
/home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:216
0x11222bfb vn_reference_eq(vn_reference_s const*, vn_reference_s const*)
/home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:858
0x11243837 vn_reference_hasher::equal(vn_reference_s const*, vn_reference_s
const*)
/home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:250
0x11243837 hash_table::find_slot_with_hash(vn_reference_s* const&, unsigned int,
insert_option)
/home/pthaugen/src/gcc/trunk/gcc/gcc/hash-table.h:1059
0x1122f43b vn_reference_lookup_2
/home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:2336
0x11101b8f walk_non_aliased_vuses(ao_ref*, tree_node*, bool, void* (*)(ao_ref*,
tree_node*, void*), void* (*)(ao_ref*, tree_node*, void*, translate_flags*),
tree_node* (*)(tree_node*), unsigned int&, void*)
/home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-alias.cc:3847
0x11233447 vn_reference_lookup(tree_node*, tree_node*, vn_lookup_kind,
vn_reference_s**, bool, tree_node**, tree_node*, bool)
/home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:3967
0x11238cc7 visit_reference_op_load
/home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:5683
0x11238cc7 visit_stmt
/home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:6187
0x1123986f process_bb
/home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:7918
0x1123bcdb do_rpo_vn_1
/home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:8518
0x1123db83 execute
/home/pthaugen/src/gcc/trunk/gcc/gcc/tree-ssa-sccvn.cc:8676
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug testsuite/99685] gcc.target/powerpc/divkc3-1.c and mulkc3-1.c fail for 32 bits

2022-10-17 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99685

pthaugen at gcc dot gnu.org changed:

   What|Removed |Added

 Status|RESOLVED|CLOSED

--- Comment #7 from pthaugen at gcc dot gnu.org ---
Backports complete.

[Bug testsuite/99685] gcc.target/powerpc/divkc3-1.c and mulkc3-1.c fail for 32 bits

2022-05-17 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99685

pthaugen at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from pthaugen at gcc dot gnu.org ---
Fixed.

[Bug target/105485] New: ICE: Segmentation fault in pcrel-opt.md:get_insn_name()

2022-05-04 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105485

Bug ID: 105485
   Summary: ICE: Segmentation fault in
pcrel-opt.md:get_insn_name()
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pthaugen at gcc dot gnu.org
  Target Milestone: ---
  Host: powerpc64le-linux-gnu
Target: powerpc64le-linux-gnu
 Build: powerpc64le-linux-gnu

pthaugen@pike:~/temp$ cat err.c
template  void __builtin_vec_vslv();
typedef  __attribute__((altivec(vector__))) char T;
void b() {
  T c, d;
  __builtin_vec_vslv(c, d);
}


pthaugen@pike:~/temp$ ~/install/gcc/trunk/bin/g++ -mcpu=power9 -S -O2 err.c
during GIMPLE pass: lower
err.c: In function ‘void b()’:
err.c:3:6: internal compiler error: Segmentation fault
3 | void b() {
  |  ^
0x10fca4e3 crash_signal
/home/pthaugen/src/gcc/trunk/gcc/gcc/toplev.cc:322
0x11cd9a10 get_insn_name(int)
/home/pthaugen/src/gcc/trunk/gcc/gcc/config/rs6000/pcrel-opt.md:134798
0x11635f87 rs6000_gimple_fold_builtin(gimple_stmt_iterator*)
/home/pthaugen/src/gcc/trunk/gcc/gcc/config/rs6000/rs6000-builtin.cc:1301
0x10b3c017 gimple_fold_call
/home/pthaugen/src/gcc/trunk/gcc/gcc/gimple-fold.cc:5559
0x10b3e00b fold_stmt_1
/home/pthaugen/src/gcc/trunk/gcc/gcc/gimple-fold.cc:6298
0x11e5ce17 lower_stmt
/home/pthaugen/src/gcc/trunk/gcc/gcc/gimple-low.cc:390
0x11e5ce17 lower_sequence
/home/pthaugen/src/gcc/trunk/gcc/gcc/gimple-low.cc:217
0x11e5bf03 lower_gimple_bind
/home/pthaugen/src/gcc/trunk/gcc/gcc/gimple-low.cc:475
0x11e5dc33 lower_function_body
/home/pthaugen/src/gcc/trunk/gcc/gcc/gimple-low.cc:110
0x11e5dc33 execute
/home/pthaugen/src/gcc/trunk/gcc/gcc/gimple-low.cc:195
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.


pthaugen@pike:~/temp$ ~/install/gcc/trunk/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/home/pthaugen/install/gcc/trunk/bin/g++
COLLECT_LTO_WRAPPER=/home/pthaugen/install/gcc/trunk/libexec/gcc/powerpc64le-unknown-linux-gnu/13.0.0/lto-wrapper
Target: powerpc64le-unknown-linux-gnu
Configured with: /home/pthaugen/src/gcc/trunk/gcc/configure
--prefix=/home/pthaugen/install/gcc/trunk --enable-decimal-float --enable-lto
--with-as=/usr/bin/as --with-ld=/usr/bin/ld --enable-languages=c,fortran,c++
--disable-multilib --disable-libsanitizer --with-cpu=power8 --disable-bootstrap
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 13.0.0 20220504 (experimental) [master r13-118-g4a206161072] (GCC)

[Bug testsuite/100407] New test cases attr-retain-*.c fail after their introduction in r11-7284

2022-02-24 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100407

pthaugen at gcc dot gnu.org changed:

   What|Removed |Added

 CC||pthaugen at gcc dot gnu.org
 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

--- Comment #15 from pthaugen at gcc dot gnu.org ---
Fixed.

[Bug rtl-optimization/68212] Loop unroller breaks basic block frequencies

2022-02-02 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68212

--- Comment #9 from pthaugen at gcc dot gnu.org ---
The problem can be seen in the loop2_unroll dump:

pthaugen@pike:~/temp/pr68212$ grep "Invalid sum of" simple.c.272r.loop2_unroll 
;; Invalid sum of incoming counts 285685646 (estimated locally), should be
212627725 (estimated locally)
;; Invalid sum of incoming counts 32061393 (estimated locally), should be
105119324 (estimated locally)

[Bug rtl-optimization/68212] Loop unroller breaks basic block frequencies

2022-02-02 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68212

pthaugen at gcc dot gnu.org changed:

   What|Removed |Added

 CC||guojiufu at gcc dot gnu.org,
   ||pthaugen at gcc dot gnu.org

--- Comment #8 from pthaugen at gcc dot gnu.org ---
(In reply to Peter Bergner from comment #7)
> (In reply to Pat Haugen from comment #4)
> > Author: pthaugen
> > Date: Fri Oct 14 17:10:18 2016
> > New Revision: 241170
> > 
> > URL: https://gcc.gnu.org/viewcvs?rev=241170=gcc=rev
> > Log:
> > PR rtl-optimization/68212
> > * cfgloopmanip.c (duplicate_loop_to_header_edge): Use preheader edge
> > frequency when computing scale factor for peeled copies.
> > * loop-unroll.c (unroll_loop_runtime_iterations): Fix freq/count
> > values for switch/peel blocks/edges.
> 
> Repeating Martin's question.  Pat, is this PR fixed with your patch or is
> there more to do?

No, there are still problems. The patch noted fixed the count/probability for
the peeled switch/case blocks created before entering the unrolled loop. But
the counts for the loop header/exit blocks are still incorrect. The last
activity I know of concerning that problem was the patch by Jiufu Guo here:
https://gcc.gnu.org/pipermail/gcc-patches/2020-February/539594.html. Not sure
if he has any more input here.

[Bug target/65010] ppc backend generates unnecessary signed extension

2022-01-20 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65010

pthaugen at gcc dot gnu.org changed:

   What|Removed |Added

 CC||pthaugen at gcc dot gnu.org

--- Comment #11 from pthaugen at gcc dot gnu.org ---
Another example to clean up. The back-to-back constant load/sign-extend
sequence of RTL insns is created in each block by the basic block reordering
pass (.bbro), which duplicates the common return block.

int foo(int in)
{
   if (in == 1)
 return 123;
   return 0;
}


foo:
.LFB0:
.cfi_startproc
cmpwi 0,3,1
beq 0,.L5
li 3,0
extsw 3,3
blr
.p2align 4,,15
.L5:
li 3,123
extsw 3,3
blr

[Bug rtl-optimization/98782] [11/12 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies

2022-01-04 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

--- Comment #34 from pthaugen at gcc dot gnu.org ---
(In reply to pthaugen from comment #33)
> 
> I tried the patch on a Power9 system. Execution time went from 371 seconds
> to 291.

Which, I should have noted, is in line with, or even slightly better than, the 2
patches posted by Tamar.

[Bug rtl-optimization/98782] [11/12 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies

2022-01-04 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

--- Comment #33 from pthaugen at gcc dot gnu.org ---
(In reply to rsand...@gcc.gnu.org from comment #32)
> Created attachment 52102 [details]
> Alternative patch
> 
> This patch is a squash of several ira tweaks that together recover the
> pre-GCC11 exchange2 performance on aarch64.  It isn't ready for trunk
> yet (hence lack of comments and changelog).  It would be great to hear
> whether/how it works on other targets though.

I tried the patch on a Power9 system. Execution time went from 371 seconds to
291.

[Bug ipa/103734] IPA-CP opportunity for imagick in SPECCPU 2017

2021-12-17 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103734

pthaugen at gcc dot gnu.org changed:

   What|Removed |Added

 CC||pthaugen at gcc dot gnu.org

--- Comment #3 from pthaugen at gcc dot gnu.org ---
(In reply to Hongtao.liu from comment #2)
> (In reply to Tamar Christina from comment #0)
> > When using --param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=20 on
> > imagick the hot functions MorphologyApply and GetVirtualPixelsFromNexus get
> > replaced by specialized versions that are much smaller and faster.
> > 
> > Some other benchmarks like leela also get very small uplifts but the imagick
> > one is worth 14%.  Both flags seem to be needed.
> 
> Observe similar thing on ICX with -param=inline-min-speedup=3

I tested on a Power9 system and see the following improvements:

--param ipa-cp-eval-threshold=1 --param ipa-cp-unit-growth=20 : +5%

-param=inline-min-speedup=3 : +30%

[Bug target/103743] PPC: Inefficient equality compare for large 64-bit constants having only 16-bit relevant bits in high part

2021-12-16 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103743

--- Comment #2 from pthaugen at gcc dot gnu.org ---
(In reply to Peter Bergner from comment #1)
> Pat, does the patch from Alan you're working to get committed help with this
> test case?

No, it just loads the constant slightly differently:

li 9,1
rotldi 9,9,63
cmpd 0,3,9

[Bug rtl-optimization/98782] [11/12 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies

2021-12-09 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

--- Comment #21 from pthaugen at gcc dot gnu.org ---
(In reply to Jan Hubicka from comment #20)
> With g:r12-5872-gf157c5362b4844f7676cae2aba81a4cf75bd68d5 we should no
> longer need -fno-inline-functions-called-once

Yes, I see that now with an updated trunk.

[Bug rtl-optimization/98782] [11/12 Regression] Bad interaction between IPA frequences and IRA resulting in spills due to changes in BB frequencies

2021-12-09 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98782

--- Comment #19 from pthaugen at gcc dot gnu.org ---
I tried -fno-inline-functions-called-once and the patches on a Power9 system.
Following are the run times and spill counts (grep -c Spilling
exchange2.fppized.f90.298r.ira). Interestingly, the spill counts increase
when -fno-inline-functions-called-once is added, but that additional spill
code must be on cold paths since execution time improves. Compiler options
used are "-O3 -mcpu=power9 -fpeel-loops -funroll-loops -ffast-math".

                                      time(sec)  Spill
base                                        473   3284
no-inline-functions-called-once             370   5410
patches 1 & 2                               397    461
patches 1 & 2
  + no-inline-functions-called-once         299    870

[Bug target/102783] [powerpc] FPSCR manipulations cannot be relied upon

2021-10-15 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783

pthaugen at gcc dot gnu.org changed:

   What|Removed |Added

 CC||pthaugen at gcc dot gnu.org

--- Comment #2 from pthaugen at gcc dot gnu.org ---
I’ll note that an inline asm stmt appears to be a barrier for the scheduler,
but apparently not for other parts of the compiler. For example on the
following code:

double d;
void foo(double *dp, double c)
{
  double e;

  e = c + d;
  asm volatile ("");
  *dp = e + d;
  return;
} 

The scheduling dumps show that the asm volatile has dependencies on all insns
before and after it. But that doesn’t really help because the first addition
stmt gets moved past the asm volatile at expand time.

[Bug ipa/96825] [11 Regression] Commit r11-2645 degrades CPU2017 548.exchange2_r by 35%

2021-04-09 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96825

--- Comment #6 from pthaugen at gcc dot gnu.org ---
(In reply to Richard Biener from comment #4)
> I believe there have been improvements recently - can you re-assess the
> magnitude of the problem?  The corresponding ARM PR got re-targeted to GCC
> 12 (for a RA fix), I think Martin has improved the IPA CP parts, maybe not
> fully though.

There has been no improvement seen on Power since the degradation appeared.

[Bug target/99133] Power10 xxspltiw, xxspltidp, xxsplti32dx instructions need to be marked as prefixed

2021-03-31 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99133

pthaugen at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #4 from pthaugen at gcc dot gnu.org ---
Fixed.

[Bug target/99133] Power10 xxspltiw, xxspltidp, xxsplti32dx instructions need to be marked as prefixed

2021-02-18 Thread pthaugen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99133

pthaugen at gcc dot gnu.org changed:

   What|Removed |Added

 CC||pthaugen at gcc dot gnu.org

--- Comment #2 from pthaugen at gcc dot gnu.org ---
I submitted a prefix cleanup patch back in December that also took care of this:
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561591.html. It's still
awaiting review.

[Bug other/96825] New: Commit r11-2645 degrades CPU2017 548.exchange2_r by 35%

2020-08-27 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96825

Bug ID: 96825
   Summary: Commit r11-2645 degrades CPU2017 548.exchange2_r by
35%
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pthaugen at gcc dot gnu.org
CC: bergner at gcc dot gnu.org, hubicka at gcc dot gnu.org,
segher at gcc dot gnu.org, seurer at gcc dot gnu.org,
wschmidt at gcc dot gnu.org
  Target Milestone: ---
  Host: powerpc64-unknown-linux-gnu
Target: powerpc64-unknown-linux-gnu
 Build: powerpc64-unknown-linux-gnu

The given commit (1118a3ff9d3ad6a64bba25dc01e7703325e23d92) causes a 35%
degradation for exchange2_r on Power9 built with the options "-O2
-mcpu=power9". Switching to -O3 results in a 44% degradation. The degradation
occurs in __brute_force_MOD_digits_2().

[Bug tree-optimization/50439] gfortran infinite loop with -floop-interchange

2020-06-04 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50439

pthaugen at gcc dot gnu.org changed:

   What|Removed |Added

 CC||pthaugen at gcc dot gnu.org

--- Comment #12 from pthaugen at gcc dot gnu.org ---
I can no longer reproduce the condition either, with the reduced testcase or
416.gamess. So if you think the correct thing to do is close this bug, I'm
fine with that.

[Bug lto/92600] New: ICE: lto1: internal compiler error: symtab_node::verify failed, building 523.xalancbmk_r with -flto -fno-inline

2019-11-20 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92600

Bug ID: 92600
   Summary: ICE: lto1: internal compiler error:
symtab_node::verify failed, building 523.xalancbmk_r
with -flto -fno-inline
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: lto
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pthaugen at gcc dot gnu.org
CC: dje at gcc dot gnu.org, marxin at gcc dot gnu.org,
segher at kernel dot crashing.org, wschmidt at gcc dot gnu.org
  Target Milestone: ---
  Host: powerpc64le-unknown-linux-gnu
Target: powerpc64le-unknown-linux-gnu
 Build: powerpc64le-unknown-linux-gnu

I'm seeing the following ICE when building CPU2017 523.xalancbmk_r with the
options "-O2 -mcpu=power8 -flto -fno-inline". The errors are emitted during the
link step.


AttributeListImpl.cpp:246:8: warning: type 'struct NameCompareFunctor' violates
the C++ One Definition Rule [-Wodr]
  246 | struct NameCompareFunctor
  |^
AttributesImpl.cpp:266:8: note: a different type is defined in another
translation unit
  266 | struct NameCompareFunctor
  |^
AttributeListImpl.cpp:261:21: note: the first difference of corresponding
definitions is field 'm_name'
  261 |  const XMLCh* const m_name;
  | ^
AttributesImpl.cpp:281:21: note: a field with different name is defined in
another translation unit
  281 |  const XMLCh* const m_qname;
  | ^
AttributeListImpl.cpp:246:8: note: type 'struct NameCompareFunctor' itself
violates the C++ One Definition Rule
  246 | struct NameCompareFunctor
  |^
AttributesImpl.cpp:266:8: note: the incompatible type is defined here
  266 | struct NameCompareFunctor
  |^
lto1: error: Two symbols with same comdat_group are not linked by the
same_comdat_group list.
_ZN11xercesc_2_715XercesDOMParser13resetEntitiesEv/871705 (resetEntities)
@0x71d34b022ec0
  Type: function definition analyzed
  Visibility: externally_visible undef public weak comdat
comdat_group:_ZN11xercesc_2_715XercesDOMParser13resetEntitiesEv one_only
section:.text._ZN11xercesc_2_715XercesDOMParser13resetEntitiesEv
(implicit_section) virtual
  Address is taken.
  References: 
  Referring: *.LTHUNK8/459137
(alias)_ZTVN11xercesc_2_715XercesDOMParserE/1291254
(addr)_ZTVN11xercesc_2_712XSDDOMParserE/872502 (addr)
  Read from file: XSDDOMParser.o
  Function flags: count:1073741824 (estimated locally) merged_comdat
  Called by: 
  Calls: 
_ZThn16_N11xercesc_2_715XercesDOMParser13resetEntitiesEv/459138
(_ZThn16_N11xercesc_2_715XercesDOMParser13resetEntitiesEv) @0x71d349e9ef40
  Type: function definition analyzed
  Visibility: externally_visible prevailing_def_ironly public weak comdat
comdat_group:_ZN11xercesc_2_715XercesDOMParser13resetEntitiesEv one_only
section:.text._ZN11xercesc_2_715XercesDOMParser13resetEntitiesEv
(implicit_section) virtual artificial
  Same comdat group as: *.LTHUNK8/459137
  Address is taken.
  References: 
  Referring: _ZTVN11xercesc_2_712XSDDOMParserE/872502
(addr)_ZTVN11xercesc_2_715XercesDOMParserE/1291254 (addr)
  Read from file: IGXMLScanner2.o
  Function flags: calls_comdat_local merged_comdat indirect_call_target
  Thunk fixed offset -16 virtual value 0 indirect_offset 0 has virtual offset 0
  Called by: 
  Calls: *.LTHUNK8/459137 (can throw external) 
during IPA pass: pure-const
lto1: internal compiler error: symtab_node::verify failed
0x102ab7df symtab_node::verify_symtab_nodes()
/home/pthaugen/src/gcc/trunk/gcc/gcc/symtab.c:1310
0x10654a23 symtab_node::checking_verify_symtab_nodes()
/home/pthaugen/src/gcc/trunk/gcc/gcc/cgraph.h:648
0x10654a23 symbol_table::remove_unreachable_nodes(_IO_FILE*)
/home/pthaugen/src/gcc/trunk/gcc/gcc/ipa.c:667
0x101f26f7 read_cgraph_and_symbols(unsigned int, char const**)
/home/pthaugen/src/gcc/trunk/gcc/gcc/lto/lto-common.c:2910
0x101c3bcb lto_main()
/home/pthaugen/src/gcc/trunk/gcc/gcc/lto/lto.c:629
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
lto-wrapper: fatal error: /home/pthaugen/install/gcc/trunk/bin/g++ returned 1
exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status


I thought this might be a possible dup of PR91241 or PR89605 but was unable to
reproduce with the GCC 8 or 9 compilers. The build succeeds with trunk if I
change the optimization level to -O1 or remove -fno-inline.

[Bug rtl-optimization/90813] [10 regression] gfortran.dg/proc_ptr_51.f90 fails (SIGSEGV) after 272084

2019-06-25 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90813

pthaugen at gcc dot gnu.org changed:

   What|Removed |Added

 CC||pthaugen at gcc dot gnu.org

--- Comment #20 from pthaugen at gcc dot gnu.org ---
(In reply to Segher Boessenkool from comment #17)
> sched2 swaps the two insns (37 and 40 for me -- use -dp to see the numbers
> in your .s file, use -da if you want lots of dumps, -dap together).
> 
> So why did sched2 decide it can swap these?  They are in the same aliasing
> set, so it shouldn't do this.  Hrm.

I'm looking into this...

[Bug target/84369] test case gcc.dg/sms-10.c fails on power9

2019-04-19 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84369

--- Comment #7 from pthaugen at gcc dot gnu.org ---
Author: pthaugen
Date: Fri Apr 19 17:14:57 2019
New Revision: 270461

URL: https://gcc.gnu.org/viewcvs?rev=270461=gcc=rev
Log:
Backport from mainline:
2019-04-16  Pat Haugen  

PR target/84369
* config/rs6000/power9.md: Add store forwarding bypass.


Modified:
branches/gcc-8-branch/gcc/ChangeLog
branches/gcc-8-branch/gcc/config/rs6000/power9.md

[Bug target/84369] test case gcc.dg/sms-10.c fails on power9

2019-04-16 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84369

--- Comment #5 from pthaugen at gcc dot gnu.org ---
Author: pthaugen
Date: Tue Apr 16 15:58:02 2019
New Revision: 270394

URL: https://gcc.gnu.org/viewcvs?rev=270394=gcc=rev
Log:
PR target/84369
* config/rs6000/power9.md: Add store forwarding bypass.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/rs6000/power9.md

[Bug rtl-optimization/89154] 5% degradation of CPU2006 473.astar starting with r266305

2019-02-05 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89154

--- Comment #3 from Pat Haugen  ---
(In reply to Segher Boessenkool from comment #1)
> The new version needs to save r4 because it reuses the reg for storing r7+r8.
> And we still don't wrap CR separately, sigh.

Yes, and similarly for r3 since it's reused in the block. Another thing that
could be moved is the r1 adjustment; is that also a component that isn't
handled separately?

[Bug tree-optimization/89154] New: 5% degradation of CPU2006 473.astar starting with r266305

2019-02-01 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89154

Bug ID: 89154
   Summary: 5% degradation of CPU2006 473.astar starting with
r266305
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pthaugen at gcc dot gnu.org
CC: dje at gcc dot gnu.org, rguenth at gcc dot gnu.org,
segher at gcc dot gnu.org, wschmidt at gcc dot gnu.org
  Target Milestone: ---
  Host: powerpc64le-unknown-linux-gnu
Target: powerpc64le-unknown-linux-gnu
 Build: powerpc64le-unknown-linux-gnu

Not sure if this is really a tree-optimization issue; I just picked that as the
initial component since the fix dealt with it. It could possibly be an
rtl-optimization/shrink-wrap issue brought about by additional register
pressure due to CSE'ing/hoisting some additional code.

Function way2obj::releasepoint() degrades 20% starting with r266305. Looking at
perf output, the main difference seems to be that we're no longer
shrink-wrapping the early exit test at the start of the function.

Following is the annotated assembly of the start of the function.

r266304:

10006a40 <_ZN7way2obj12releasepointEii>: /* way2obj::releasepoint(int,
int) total: 2032811 22.9279 */
   :10006a40:   lis r2,4098
   :10006a44:   addi    r2,r2,32512
 95384  1.0758 :10006a48:   lwz r9,4424(r3)
   :10006a4c:   ld  r8,8(r3)
119001  1.3422 :10006a50:   lhz r7,16(r3)
 1 1.1e-05 :10006a54:   mullw   r9,r9,r5
   :10006a58:   add r9,r9,r4
   :10006a5c:   extsw   r9,r9
169526  1.9121 :10006a60:   rldicr  r9,r9,2,61
   :10006a64:   lhzx    r10,r8,r9
 21865  0.2466 :10006a68:   cmpw    r10,r7
   :10006a6c:   beqlr



r266305:

10006a40 <_ZN7way2obj12releasepointEii>: /* way2obj::releasepoint(int,
int) total: 2440798 26.2354 */
   :10006a40:   lis r2,4098
   :10006a44:   addi    r2,r2,32512
 35498  0.3816 :10006a48:   lwa r6,4424(r3)
   :10006a4c:   ld  r7,8(r3)
 26361  0.2833 :10006a50:   std r30,-16(r1)
   :10006a54:   mr  r30,r3
157660  1.6946 :10006a58:   mfcr    r12
162000  1.7413 :10006a5c:   lhz r3,16(r3)
17 1.8e-04 :10006a60:   std r23,-72(r1)
   139  0.0015 :10006a64:   mr  r23,r4
 2 2.1e-05 :10006a68:   mullw   r9,r6,r5
59 6.3e-04 :10006a6c:   stw r12,8(r1)
244832  2.6316 :10006a70:   stdu    r1,-112(r1)
 4 4.3e-05 :10006a74:   add r9,r9,r4
 5 5.4e-05 :10006a78:   extsw   r9,r9
   201  0.0022 :10006a7c:   rldicr  r8,r9,2,61
   343  0.0037 :10006a80:   add r4,r7,r8
 9 9.7e-05 :10006a84:   lhzx    r10,r7,r8
151595  1.6294 :10006a88:   cmpw    r10,r3
   :10006a8c:   beq 10006c64
<_ZN7way2obj12releasepointEii+0x224>

The target of the conditional branch in the slow version is just the epilogue
code to restore R1, R23, R30 and CR3/CR4 and return.

[Bug ipa/85103] [8/9 Regression] Performance regressions on SPEC with r257582

2019-01-17 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103

--- Comment #21 from Pat Haugen  ---
> Knowing what inline decision matters for VPR, I can try to fix it too.

Gathering some perf data, the hot functions for various revisions are as
follows. All other functions report < 0.5% of execution time.

r257581
---
samples  %image name   symbol name
577871   57.8700  vpr_base.temp_32 try_route
402207   40.2784  vpr_base.temp_32 get_heap_head



r257582
---
samples  %image name   symbol name
428249   40.9911  vpr_base.pat_test_32 try_route
402768   38.5521  vpr_base.pat_test_32 get_heap_head
189358   18.1249  vpr_base.pat_test_32 node_to_heap.part.0



r267727 (after patches that fixed bzip2 went in)
---
samples  %image name   symbol name
493998   45.9797  vpr_base.pat_base_32 try_route
416389   38.7561  vpr_base.pat_base_32 get_heap_head
140727   13.0984  vpr_base.pat_base_32 add_to_heap

So from the above we can see that r257582 stopped inlining node_to_heap() into
try_route(). In r267727, node_to_heap() is again being inlined into
try_route(), but add_to_heap() is no longer inlined into node_to_heap(), which
is the only caller of add_to_heap(). So it appears what is needed is to get the
whole chain, node_to_heap()->add_to_heap(), inlined into try_route() again.

[Bug ipa/85103] [8/9 Regression] Performance regressions on SPEC with r257582

2019-01-08 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103

--- Comment #19 from Pat Haugen  ---
(In reply to Jan Hubicka from comment #18)
> which makes it to be inlined. Does it solve the perofmrance problem for both
> benchmarks?

Looking at our nightly spec runs, the bzip2 degradation has indeed been cleaned
up. But it looks like 175.vpr degraded another 2% or so over the last couple
days.

[Bug ipa/85103] [8/9 Regression] Performance regressions on SPEC with r257582

2018-12-07 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103

--- Comment #16 from Pat Haugen  ---
> 
> Do you observe the same slowdown if you restore either of the params to
> the value before the r257582 change?
> 

--param max-inline-insns-auto=40 results in the same degradation.

--param inline-min-speedup=8 results in even more degradation (an additional
12% over r257582).

[Bug gcov-profile/77698] Unrolled loop not considered hot after profiling

2018-10-09 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77698

--- Comment #7 from Pat Haugen  ---
I also see the loop now being aligned when I apply your patch.

srdi 10,10,2
mtctr 10
.p2align 4,,15
.L6:
ld 9,0(11)
ld 8,0(4)

[Bug gcov-profile/77698] Unrolled loop not considered hot after profiling

2018-10-05 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77698

Pat Haugen  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
 Resolution|FIXED   |---

--- Comment #5 from Pat Haugen  ---
It's still not fixed in current trunk. After unrolling, maybe_hot_bb_p() returns
false (via maybe_hot_count_p()), which prevents aligning the label in
final.c:compute_alignments(). Here's the tail end of the debug session and a
partial backtrace showing this.

maybe_hot_count_p (fun=0x759f, count=...) at
/home/pthaugen/src/gcc/trunk_work/gcc/gcc/predict.c:185
185   return (count.to_gcov_type () >= get_hot_bb_threshold ());
(gdb) p count.to_gcov_type ()
$3 = 25
(gdb) p get_hot_bb_threshold ()
$4 = 100
(gdb) bt
#0  maybe_hot_count_p (fun=0x759f, count=...) at
/home/pthaugen/src/gcc/trunk_work/gcc/gcc/predict.c:185
#1  0x10d8fdb0 in maybe_hot_bb_p (fun=0x759f,
bb=0x759801a0)
at /home/pthaugen/src/gcc/trunk_work/gcc/gcc/predict.c:195
#2  0x10d9045c in optimize_bb_for_size_p (bb=0x759801a0)
at /home/pthaugen/src/gcc/trunk_work/gcc/gcc/predict.c:301
#3  0x108c7234 in compute_alignments () at
/home/pthaugen/src/gcc/trunk_work/gcc/gcc/final.c:674
#4  0x108c7d3c in (anonymous
namespace)::pass_compute_alignments::execute (this=0x12886200)
at /home/pthaugen/src/gcc/trunk_work/gcc/gcc/final.c:823
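The decision traced above can be modelled in a few lines. This is a simplified sketch, not GCC's code: the function names are hypothetical, profile counts are plain integers instead of profile_count, and it assumes that, when the function as a whole is not optimized for size, optimize_bb_for_size_p() reduces to "not maybe-hot". With the values from the gdb session (count 25, threshold 100) the block counts as cold, so compute_alignments() skips it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the test at predict.c:185: a block's profile count
   must reach the hot-BB threshold for the block to be "maybe hot". */
static bool model_maybe_hot_count_p(int64_t count, int64_t hot_bb_threshold)
{
  return count >= hot_bb_threshold;
}

/* Simplified model of optimize_bb_for_size_p() when the function itself is
   not optimized for size: a block that is not maybe-hot is optimized for size. */
static bool model_optimize_bb_for_size_p(int64_t count, int64_t threshold)
{
  return !model_maybe_hot_count_p(count, threshold);
}

/* compute_alignments() skips blocks optimized for size; only the remaining
   blocks are even considered for label alignment (other heuristics ignored). */
static bool model_label_alignment_considered(int64_t count, int64_t threshold)
{
  return !model_optimize_bb_for_size_p(count, threshold);
}
```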

[Bug rtl-optimization/86892] New: RTL CSE commoning trivial constants across call and/or too early

2018-08-08 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86892

Bug ID: 86892
   Summary: RTL CSE  commoning trivial constants across call
and/or too early
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pthaugen at gcc dot gnu.org
CC: dje at gcc dot gnu.org, jakub at gcc dot gnu.org,
rsandifo at gcc dot gnu.org, segher at gcc dot gnu.org,
wschmidt at gcc dot gnu.org
  Target Milestone: ---
  Host: powerpc64le-unknown-linux-gnu
Target: powerpc64le-unknown-linux-gnu
 Build: powerpc64le-unknown-linux-gnu

For the following testcase, cse.c will common the constant 0 across the call,
which then requires the use of a non-volatile register (and a prologue/epilogue
save/restore).

void bar();
int a, b;
void foo()
{
  a = 0;
  bar();
  b = 0;
}
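At the source level, the commoning described above is roughly equivalent to hand-writing the first function below. This is only an illustration: the function names and the call counter are hypothetical, and a compiler may compile either version either way. In foo_after_cse() a single zero value must survive the call to bar(), so it needs a non-volatile register; foo_rematerialized() simply materializes 0 again after the call, so nothing is live across it.

```c
#include <assert.h>

int a, b;
static int bar_calls;   /* hypothetical instrumentation for this sketch */

void bar(void) { bar_calls++; }

/* Hand-written equivalent of the commoned code: one zero value is kept live
   across bar(), which on PowerPC means a callee-saved register plus a
   prologue save and an epilogue restore. */
void foo_after_cse(void)
{
  int zero = 0;
  a = zero;
  bar();
  b = zero;
}

/* Preferred shape: the trivial constant is rematerialized after the call. */
void foo_rematerialized(void)
{
  a = 0;
  bar();
  b = 0;
}
```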


I have also observed a situation where early cse of a constant prevented some
combine transformations from occurring because the register's lifetime had been
extended.

The feeling is that cse of trivial constants should not be done so early in the
pass schedule and should not be done across calls at all.

[Bug target/86612] __strdup problem on power 9

2018-07-26 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86612

Pat Haugen  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||pthaugen at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #3 from Pat Haugen  ---
This was really a library difference: newer glibc no longer declares __strdup.

Fixed.
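As an illustration only (the contents of pr58673-2.c are not shown here, and copy_name() is hypothetical), code that needs a string copy can call the public POSIX strdup(), declared in <string.h>, instead of relying on the glibc-internal __strdup being declared:

```c
#define _POSIX_C_SOURCE 200809L  /* make strdup() visible under strict -std= modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of s that the caller must free(). */
static char *copy_name(const char *s)
{
  return strdup(s);   /* public POSIX interface, not the internal __strdup */
}
```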

[Bug target/86612] __strdup problem on power 9

2018-07-26 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86612

--- Comment #2 from Pat Haugen  ---
Author: pthaugen
Date: Thu Jul 26 20:47:37 2018
New Revision: 263021

URL: https://gcc.gnu.org/viewcvs?rev=263021=gcc=rev
Log:
PR target/86612
* gcc.target/powerpc/pr58673-2.c: Call strdup.


Modified:
branches/gcc-8-branch/gcc/testsuite/ChangeLog
branches/gcc-8-branch/gcc/testsuite/gcc.target/powerpc/pr58673-2.c

[Bug target/86612] __strdup problem on power 9

2018-07-26 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86612

--- Comment #1 from Pat Haugen  ---
Author: pthaugen
Date: Thu Jul 26 20:41:25 2018
New Revision: 263020

URL: https://gcc.gnu.org/viewcvs?rev=263020=gcc=rev
Log:
PR target/86612
* gcc.target/powerpc/pr58673-2.c: Call strdup.


Modified:
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.target/powerpc/pr58673-2.c

[Bug tree-optimization/86489] ICE in gimple_phi_arg starting with r261682 when building 531.deepsjeng_r with FDO + LTO

2018-07-13 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86489

Pat Haugen  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Pat Haugen  ---
Fixed, thanks.

[Bug tree-optimization/86489] ICE in gimple_phi_arg starting with r261682 when building 531.deepsjeng_r with FDO + LTO

2018-07-12 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86489

--- Comment #5 from Pat Haugen  ---
(In reply to kugan from comment #3)
> index f6fa2f7..fbdf838 100644
> --- a/gcc/tree-ssa-loop-niter.c
> +++ b/gcc/tree-ssa-loop-niter.c
> @@ -2555,6 +2555,7 @@ number_of_iterations_popcount (loop_p loop, edge exit,
> ... = PHI .  */
>gimple *phi = SSA_NAME_DEF_STMT (b_11);
>if (gimple_code (phi) != GIMPLE_PHI
> +  || (gimple_bb (phi) != loop_latch_edge (loop)->dest)
>|| (gimple_assign_lhs (and_stmt)
>   != gimple_phi_arg_def (phi, loop_latch_edge (loop)->dest_idx)))
>  return false;

This fixes the problem for me.
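The reasoning behind the extra check can be modelled with a toy PHI node. This is a hypothetical sketch, not GCC's data structures: a PHI has one argument per incoming edge of its own block, so an argument index taken from the latch edge (its dest_idx) is only meaningful for a PHI in the block that edge enters, the loop header. For a PHI in any other block that index can exceed the PHI's argument count, which is consistent with the internal check failing inside gimple_phi_arg().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy PHI node: its containing block and one argument per incoming edge. */
struct toy_phi {
  int bb;         /* index of the containing basic block */
  int num_args;   /* number of predecessor edges of bb */
  int args[4];
};

/* Toy edge: destination block and position among that block's predecessors. */
struct toy_edge {
  int dest;
  int dest_idx;
};

/* Fetch the PHI argument arriving along LATCH, refusing PHIs that are not
   in the latch edge's destination block (the guard added by the patch). */
static bool toy_latch_arg(const struct toy_phi *phi, struct toy_edge latch,
                          int *out)
{
  if (phi->bb != latch.dest)
    return false;
  /* In consistent toy data, dest_idx is in range once the guard passes. */
  assert(latch.dest_idx < phi->num_args);
  *out = phi->args[latch.dest_idx];
  return true;
}
```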

[Bug tree-optimization/86489] New: ICE in gimple_phi_arg starting with r261682 when building 531.deepsjeng_r with FDO + LTO

2018-07-11 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86489

Bug ID: 86489
   Summary: ICE in gimple_phi_arg starting with r261682 when
building 531.deepsjeng_r with FDO + LTO
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pthaugen at gcc dot gnu.org
CC: dje at gcc dot gnu.org, kugan at gcc dot gnu.org,
segher at gcc dot gnu.org, wschmidt at gcc dot gnu.org
  Target Milestone: ---
  Host: powerpc64le-unknown-linux-gnu
Target: powerpc64le-unknown-linux-gnu
 Build: powerpc64le-unknown-linux-gnu

The patch for pr82479 causes an ICE while building CPU2017 531.deepsjeng_r with
FDO and LTO. The ICE occurs during the link step of the -fprofile-use build.

/home/pthaugen/install/gcc/gcc_hunt/bin/g++  -m64 -O3 -mcpu=power9
-fpeel-loops -funroll-loops -ffast-math -mpopcntd -mrecip -flto -DSPEC_LP64
 -m64 -Wl,-q  -Wl,-rpath=/home/pthaugen/install/gcc/gcc_hunt/lib64   attacks.o
bitboard.o bits.o board.o draw.o endgame.o epd.o generate.o initp.o make.o
moves.o neval.o pawn.o preproc.o search.o see.o sjeng.o state.o ttable.o
utils.o  -o deepsjeng_r  
during GIMPLE pass: cunroll
generate.cpp: In function 'gen.constprop':
generate.cpp:159:5: internal compiler error: in gimple_phi_arg, at
gimple.h:4345
 int gen(state_t *s, move_s *moves) {
 ^
0x1013c597 gimple_phi_arg
/home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/gimple.h:4345
0x1013c5f3 gimple_phi_arg
/home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/gimple.h:4345
0x1013c5f3 gimple_phi_arg
/home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/gimple.h:4353
0x10a37607 gimple_phi_arg_def
/home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/gimple.h:4396
0x10a37607 number_of_iterations_popcount
/home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/tree-ssa-loop-niter.c:2559
0x10a37607 number_of_iterations_exit_assumptions(loop*, edge_def*,
tree_niter_desc*, gcond**, bool)
/home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/tree-ssa-loop-niter.c:2364
0x10a392eb number_of_iterations_exit_assumptions(loop*, edge_def*,
tree_niter_desc*, gcond**, bool)
/home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/tree-ssa-loop-niter.c:2611
0x10a392eb number_of_iterations_exit(loop*, edge_def*, tree_niter_desc*, bool,
bool)
/home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/tree-ssa-loop-niter.c:2616
0x10a3985f number_of_iterations_exit(loop*, edge_def*, tree_niter_desc*, bool,
bool)
/home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/vec.h:884
0x10a3985f estimate_numbers_of_iterations(loop*)
/home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/tree-ssa-loop-niter.c:4100
0x10a3ce73 estimate_numbers_of_iterations(function*)
/home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/tree-ssa-loop-niter.c:4329
0x10a07ec7 tree_unroll_loops_completely
/home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/tree-ssa-loop-ivcanon.c:1452
0x10a08603 execute
/home/pthaugen/src/gcc/gcc_hunt/gcc/gcc/tree-ssa-loop-ivcanon.c:1612
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.

[Bug tree-optimization/85698] [8/9 Regression] CPU2017 525.x264_r fails starting with r257581

2018-05-21 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698

--- Comment #13 from Pat Haugen  ---
Author: pthaugen
Date: Mon May 21 16:41:09 2018
New Revision: 260477

URL: https://gcc.gnu.org/viewcvs?rev=260477=gcc=rev
Log:
PR target/85698
* gcc.target/powerpc/vec-setup-be-long.c: Remove XFAIL.


Modified:
branches/gcc-8-branch/gcc/testsuite/ChangeLog
branches/gcc-8-branch/gcc/testsuite/gcc.target/powerpc/vec-setup-be-long.c

[Bug tree-optimization/85698] [8/9 Regression] CPU2017 525.x264_r fails starting with r257581

2018-05-21 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698

--- Comment #12 from Pat Haugen  ---
Author: pthaugen
Date: Mon May 21 16:34:44 2018
New Revision: 260476

URL: https://gcc.gnu.org/viewcvs?rev=260476=gcc=rev
Log:
PR target/85698
* config/rs6000/rs6000.c (rs6000_output_move_128bit): Check dest
operand.

* gcc.target/powerpc/pr85698.c: New test.


Added:
branches/gcc-7-branch/gcc/testsuite/gcc.target/powerpc/pr85698.c
Modified:
branches/gcc-7-branch/gcc/ChangeLog
branches/gcc-7-branch/gcc/config/rs6000/rs6000.c
branches/gcc-7-branch/gcc/testsuite/ChangeLog

[Bug tree-optimization/85698] [8/9 Regression] CPU2017 525.x264_r fails starting with r257581

2018-05-21 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698

--- Comment #11 from Pat Haugen  ---
Author: pthaugen
Date: Mon May 21 16:23:20 2018
New Revision: 260475

URL: https://gcc.gnu.org/viewcvs?rev=260475=gcc=rev
Log:
PR target/85698
* config/rs6000/rs6000.c (rs6000_output_move_128bit): Check dest
operand.

* gcc.target/powerpc/pr85698.c: New test.


Added:
branches/gcc-8-branch/gcc/testsuite/gcc.target/powerpc/pr85698.c
Modified:
branches/gcc-8-branch/gcc/ChangeLog
branches/gcc-8-branch/gcc/config/rs6000/rs6000.c
branches/gcc-8-branch/gcc/testsuite/ChangeLog

[Bug tree-optimization/85698] [8/9 Regression] CPU2017 525.x264_r fails starting with r257581

2018-05-17 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698

Pat Haugen  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Pat Haugen  ---
Fixed.

[Bug tree-optimization/85698] [8/9 Regression] CPU2017 525.x264_r fails starting with r257581

2018-05-17 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698

--- Comment #9 from Pat Haugen  ---
Author: pthaugen
Date: Thu May 17 16:19:16 2018
New Revision: 260329

URL: https://gcc.gnu.org/viewcvs?rev=260329=gcc=rev
Log:
PR target/85698
* config/rs6000/rs6000.c (rs6000_output_move_128bit): Check dest
operand.

* gcc.target/powerpc/pr85698.c: New test.


Added:
trunk/gcc/testsuite/gcc.target/powerpc/pr85698.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/rs6000/rs6000.c
trunk/gcc/testsuite/ChangeLog

[Bug tree-optimization/85698] [8/9 Regression] CPU2017 525.x264_r fails starting with r257581

2018-05-14 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698

--- Comment #7 from Pat Haugen  ---
So the problem is that we're generating a stxvw4x insn to write to memory,
which corrupts the contents due to both endian behavior and element size (since
we're dealing with halfword/uint16_t elements).

Value in vector reg = 0x000000000002fffc00000002fff5000e

stvx/good
(gdb) x/8hx $r1+$r8
0x7fffe490: 0x000e  0xfff5  0x0002  0x0000  0xfffc  0x0002  0x0000  0x0000


stxvw4x/bad
(gdb) x/8hx $r7+$r8
0x7fffe470: 0x0000  0x0000  0xfffc  0x0002  0x0002  0x0000  0x000e  0xfff5
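The two dumps can be reproduced with a small model of the stores. This is a sketch under stated assumptions, not the ISA definition: the register is held as eight halfword elements in little-endian element order (zero halfwords written as 0x0000); stvx is modelled as storing element i at halfword slot i; stxvw4x is modelled as storing the four 32-bit words in big-endian word order, each word in little-endian byte order. The word reversal swaps the halfword pairs around because the data really consists of halfword elements rather than words.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Register contents from the comment, as halfword elements 0..7. */
static const uint16_t vreg[8] = { 0x000e, 0xfff5, 0x0002, 0x0000,
                                  0xfffc, 0x0002, 0x0000, 0x0000 };

/* Correct store: memory halfword i receives element i. */
static void model_stvx(const uint16_t reg[8], uint16_t mem[8])
{
  memcpy(mem, reg, 8 * sizeof reg[0]);
}

/* Wrong store: word slot k receives little-endian word 3 - k, whose two
   halfwords keep their little-endian order within the word. */
static void model_stxvw4x(const uint16_t reg[8], uint16_t mem[8])
{
  for (int k = 0; k < 4; k++) {
    int j = 3 - k;
    mem[2 * k] = reg[2 * j];          /* low halfword of the word */
    mem[2 * k + 1] = reg[2 * j + 1];  /* high halfword of the word */
  }
}
```

Running both models on vreg gives exactly the "good" and "bad" rows of the gdb output.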

[Bug tree-optimization/85698] [8/9 Regression] CPU2017 525.x264_r fails starting with r257581

2018-05-14 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698

--- Comment #6 from Pat Haugen  ---
(In reply to Richard Biener from comment #4)
> I can see what the patch does to this testcase on x86_64 - it enables BB
> vectorization of the first two loops after unrolling.  I don't see anything
> suspicious here on x86_64 and 525.x264_r works fine for me.
> 
> Can you clarify whether test, ref or train inputs fail for you?  I tried
> AVX256, AVX128 and plain old SSE so far without any issue but ref takes some
> time...
> 
> Can you check whether the following reduced file produces the same assembly
> for add4x4_idct as in the complete benchmark?  If so it should be possible to
> generate a runtime testcase from it.  Please attach preprocessed source if
> that doesn't work out.
> 
> So far I do suspect we are hitting a latent target issue?
> 
> #include <stdint.h>
> static uint8_t x264_clip_uint8( int x )
> {
>   return x&(~255) ? (-x)>>31 : x;
> }
> void add4x4_idct( uint8_t *p_dst, int16_t dct[16])
> {
>   int16_t d[16];
>   int16_t tmp[16];
>   for( int i = 0; i < 4; i++ )
> {
>   int s02 =  dct[0*4+i] +  dct[2*4+i];
>   int d02 =  dct[0*4+i] -  dct[2*4+i];
>   int s13 =  dct[1*4+i] + (dct[3*4+i]>>1);
>   int d13 = (dct[1*4+i]>>1) -  dct[3*4+i];
>   tmp[i*4+0] = s02 + s13;
>   tmp[i*4+1] = d02 + d13;
>   tmp[i*4+2] = d02 - d13;
>   tmp[i*4+3] = s02 - s13;
> }
>   for( int i = 0; i < 4; i++ )
> {
>   int s02 =  tmp[0*4+i] +  tmp[2*4+i];
>   int d02 =  tmp[0*4+i] -  tmp[2*4+i];
>   int s13 =  tmp[1*4+i] + (tmp[3*4+i]>>1);
>   int d13 = (tmp[1*4+i]>>1) -  tmp[3*4+i];
>   d[0*4+i] = ( s02 + s13 + 32 ) >> 6;
>   d[1*4+i] = ( d02 + d13 + 32 ) >> 6;
>   d[2*4+i] = ( d02 - d13 + 32 ) >> 6;
>   d[3*4+i] = ( s02 - s13 + 32 ) >> 6;
> }
>   for( int y = 0; y < 4; y++ )
> {
>   for( int x = 0; x < 4; x++ )
> p_dst[x] = x264_clip_uint8( p_dst[x] + d[y*4+x] );
>   p_dst += 32;
> }
> }

Yes, that produces similar code, and adding the following to it produces an
executable test that fails at -O3.

#include <stdlib.h>

int main (void)
{
  uint8_t dst[128];
  int16_t dct[16];
  int i;

  for (i = 0; i < 16; i++)
    dct[i] = i*10 + i;
  for (i = 0; i < 128; i++)
    dst[i] = i;

  add4x4_idct(dst, dct);

  /* Expected values match the scalar (non-vectorized) computation.  */
  if (dst[0] != 14 || dst[1] != 0 || dst[2] != 4 || dst[3] != 2
      || dst[32] != 28 || dst[33] != 35 || dst[34] != 33 || dst[35] != 35)
    abort();

  return 0;
}

Continuing to debug further...

[Bug tree-optimization/85698] [8/9 Regression] CPU2017 525.x264_r fails starting with r257581

2018-05-11 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698

--- Comment #5 from Pat Haugen  ---
(In reply to Richard Biener from comment #4)
> 
> Can you clarify whether test, ref or train inputs fail for you?  I tried
> AVX256, AVX128 and plain old SSE so far without any issue but ref takes some
> time...
> 

I see the error for ref and test inputs. The train input appears to pass, but
the FDO-optimized version then also fails with the ref input.

I will keep looking at the other stuff you requested.

[Bug tree-optimization/85698] [8/9 Regression] CPU2017 525.x264_r fails starting with r257581

2018-05-09 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698

--- Comment #3 from Pat Haugen  ---
(In reply to Richard Biener from comment #2)
> 
> Can you help me with isolating this to a single function inside that file?
> Maybe try sticking __attribute__((optimize("no-tree-vectorize"))) on some
> functions.  Oh, there's also the vect_loop debug counter
> (-fdbg-cnt=vect_loop:N).

add4x4_idct() looks like the function; adding the attribute (or
"no-tree-slp-vectorize") to it resulted in a successful run.


> Otherwise I'll have to find a power8 machine where I can set up CPU 2017
> myself (unlikely this week due to public holidays).

Note that it also fails with -mcpu=power7, so a power8 machine is not needed.
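The attribute-based isolation described above can be sketched as follows. GCC's optimize function attribute turns off a single optimization for one function while the rest of the file keeps its command-line options. add_bias4x4() is a hypothetical stand-in for the final clipping loop of add4x4_idct(), and compilers other than GCC may ignore the attribute with a warning.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the suspect function.  The attribute disables
   basic-block (SLP) vectorization for this function only. */
__attribute__((optimize("no-tree-slp-vectorize")))
void add_bias4x4(uint8_t *p_dst, const int16_t d[16])
{
  for (int y = 0; y < 4; y++, p_dst += 32)
    for (int x = 0; x < 4; x++) {
      int v = p_dst[x] + d[y * 4 + x];
      p_dst[x] = (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);  /* clip to 0..255 */
    }
}
```

If a run passes with the attribute and fails without it, the miscompilation is very likely inside that function's SLP-vectorized code.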

[Bug tree-optimization/85698] CPU2017 525.x264_r fails starting with r257581

2018-05-08 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698

--- Comment #1 from Pat Haugen  ---
Looks like the benchmark fails when x264_src/common/dct.c is compiled with r257581.

[Bug tree-optimization/85698] New: CPU2017 525.x264_r fails starting with r257581

2018-05-08 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85698

Bug ID: 85698
   Summary: CPU2017 525.x264_r fails starting with r257581
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pthaugen at gcc dot gnu.org
CC: rguenth at gcc dot gnu.org, segher at kernel dot 
crashing.org,
wschmidt at gcc dot gnu.org
  Target Milestone: ---
  Host: powerpc64le-unknown-linux-gnu
Target: powerpc64le-unknown-linux-gnu
 Build: powerpc64le-unknown-linux-gnu

The benchmark miscompares starting with the given revision. Options used for
building the benchmark are "-O3 -mcpu=power8". I did discover that adding
-funroll-loops changes behavior such that the benchmark passes.

Continuing to see if I can narrow down to a specific file that's miscompiled...

[Bug c++/85600] [9 Regression] CPU2006 471.omnetpp fails starting with r259771

2018-05-01 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85600

Pat Haugen  changed:

   What|Removed |Added

  Known to work||8.0
Summary|CPU2006 471.omnetpp fails   |[9 Regression] CPU2006
   |starting with r259771   |471.omnetpp fails starting
   ||with r259771
  Known to fail||9.0

--- Comment #3 from Pat Haugen  ---
The benchmark fails the same way with no optimization.

[Bug c++/85600] CPU2006 471.omnetpp fails starting with r259771

2018-05-01 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85600

--- Comment #2 from Pat Haugen  ---
(In reply to Andrew Pinski from comment #1)
> Does adding -fno-lifetime-dse help?  This could be a bug in the omnetpp
> sources ...

Nope, still fails.

471.omnetpp: copy 0 non-zero return code (exit code=1, signal=0)

[Bug c++/85600] New: CPU2006 471.omnetpp fails starting with r259771

2018-05-01 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85600

Bug ID: 85600
   Summary: CPU2006 471.omnetpp fails starting with r259771
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pthaugen at gcc dot gnu.org
  Target Milestone: ---

The benchmark fails at run time, emitting the following message at the end
before exiting with rc=1.

** Event #0   T=0.000  ( 0.00s)
 Messages:  created: 77472
** Event #500   T=0.0868274600 ( 86ms)
 Messages:  created: 3949482
** Event #1000   T=0.1605411650 (160ms)
 Messages:  created: 7854099
 Error in module largeNet.llanBB[48].bhost[3].mac:
(cQueue)largeNet.llanBB[48].bhost[3].mac.class-members.outputBuffer: pop():
queue empty.

End run of OMNeT++

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-04-19 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #20 from Pat Haugen  ---
(In reply to Richard Biener from comment #18)
> Fixed (hopefully).

Yes, mgrid performance is back. Thanks.

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-04-18 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

Pat Haugen  changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #15 from Pat Haugen  ---
Richard, concerning my prior comment, any thoughts on whether this is similar
to the issue you fixed in pr55334?

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-04-13 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #14 from Pat Haugen  ---
Created attachment 43928
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43928=edit
r256888 pcom dump

So the difference appears to be occurring in predictive commoning. In the
ipa-cp clone, resid.constprop, pcom is failing to hoist some loads/expressions
from the vectorized loop. This results in an additional 9 vector loads and 5
vector adds being executed each iteration of the loop.

I've attached a pcom dump of the original resid() and the clone
resid.constprop(). You can see that in the original resid(), pcom is moving
some loads/adds, but not in resid.constprop(). BB 6 is the vectorized loop in
resid(), BB 5 is the same loop in resid.constprop().

Not sure if this is a similar issue to pr55334 wrt losing restrict.
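For readers unfamiliar with the pass, the kind of transformation that goes missing can be illustrated with a hand-written sketch (a generic three-point sum, not mgrid's resid loop; both function names are hypothetical). Predictive commoning reuses values loaded in earlier iterations instead of reloading them. Doing so safely requires knowing that the stores to r cannot modify u, which the restrict qualifiers state here; a clone that lost such aliasing information could plausibly lose the optimization.

```c
#include <assert.h>

/* Before: every iteration loads u[i], u[i + 1] and u[i + 2]. */
void smooth_naive(int n, double *restrict r, const double *restrict u)
{
  for (int i = 0; i < n; i++)
    r[i] = u[i] + u[i + 1] + u[i + 2];
}

/* After predictive commoning (written by hand): two values are carried
   between iterations, so each iteration issues only one new load.
   The additions keep the same order, so results are bit-identical. */
void smooth_pcom(int n, double *restrict r, const double *restrict u)
{
  if (n <= 0)
    return;
  double a = u[0], b = u[1];
  for (int i = 0; i < n; i++) {
    double c = u[i + 2];   /* the only load in the loop body */
    r[i] = a + b + c;
    a = b;                 /* rotate the carried values */
    b = c;
  }
}
```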

[Bug ipa/85103] [8 Regression] Performance regressions on SPEC with r257582

2018-04-10 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103

--- Comment #7 from Pat Haugen  ---
Created attachment 43901
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43901=edit
inline dump

The prior attachment was the r257581 dump. This is the r257582 dump.

[Bug ipa/85103] [8 Regression] Performance regressions on SPEC with r257582

2018-04-10 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103

--- Comment #6 from Pat Haugen  ---
Created attachment 43900
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43900=edit
inline dump

[Bug ipa/85103] [8 Regression] Performance regressions on SPEC with r257582

2018-04-10 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103

--- Comment #5 from Pat Haugen  ---
A little more detail. 48t.fnsplit splits mainGtU() into 2 functions:

mainGtU(): which contains a few early exit tests and then a call to
mainGtU.part.0()

mainGtU.part.0(): contains the remainder of mainGtU(), including the loop


Following is then the behavior in 79i.inline:

r257581: The 3 mainGtU() calls are inlined into their caller mainSimpleSort(),
and the mainGtU.part.0() calls remain.

r257582: mainGtU.part.0() is inlined back into mainGtU(), the first mainGtU()
call in mainSimpleSort() is inlined but the remaining 2 mainGtU() calls remain.
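The split can be pictured with a hand-written sketch. gt_u() and gt_u_part0() are hypothetical stand-ins, not bzip2's mainGtU(): the small header keeps the early-exit tests and calls the outlined part, which holds the loop. In these terms, r257581 inlined the small header into each caller and left the calls to the part in place. r257582 inlined the part back into the header, making it much larger, and only one of the three calls was then inlined.

```c
#include <assert.h>

/* Outlined part: the comparison loop (what fnsplit names *.part.0). */
__attribute__((noinline))
static int gt_u_part0(const unsigned char *a, const unsigned char *b, int n)
{
  for (int i = 0; i < n; i++)
    if (a[i] != b[i])
      return a[i] > b[i];
  return 0;   /* equal, so not greater */
}

/* Header left after splitting: cheap early exits, then a call to the
   outlined part.  Assumes n >= 2. */
static int gt_u(const unsigned char *a, const unsigned char *b, int n)
{
  if (a[0] != b[0])
    return a[0] > b[0];
  if (a[1] != b[1])
    return a[1] > b[1];
  return gt_u_part0(a + 2, b + 2, n - 2);
}
```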

[Bug ipa/85103] [8 Regression] Performance regressions on SPEC with r257582

2018-04-09 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103

--- Comment #3 from Pat Haugen  ---
(In reply to Jan Hubicka from comment #1)
> Pat, can you try to figure out what value of min-speedup is neeed to recover
> from this regression?

Using r257582, either of the following options restores the behavior of not
inlining the mainGtU call and eliminates the performance regression.

--param inline-min-speedup=18

--param max-inline-insns-auto=24

[Bug ipa/85103] [8 Regression] Performance regressions on SPEC with r257582

2018-04-09 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103

--- Comment #2 from Pat Haugen  ---
(In reply to Pat Haugen from comment #0)
> 
> Very initial look at profile of bzip2 shows degradation is contained to
> mainSort(), which showed a 54% increase in run cycles. Appears one of the
> calls to mainGtU() is inlined into mainSort in the slow version, but the
> drop in cycle counts on mainGtU is nowhere near the increase on
> mainSort.

It appears the inlined copy of mainGtU() creates additional register pressure,
which results in register spills being generated in the loop of the inlined
copy. The non-inlined copy of the loop is approx. 125 generated insns, whereas
the inlined copy is about 215 insns (90 spill references).

[Bug middle-end/83665] [8 regression] Big code size regression and some code quality improvement at Jan 2 2018

2018-03-27 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83665

--- Comment #18 from Pat Haugen  ---
(In reply to Richard Biener from comment #17)
> Pat, please open a new bug for the regression caused by the fix.

Done, pr85103.

[Bug ipa/85103] New: Performance regressions on SPEC with r257582

2018-03-27 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85103

Bug ID: 85103
   Summary: Performance regressions on SPEC with r257582
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ipa
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pthaugen at gcc dot gnu.org
CC: dje at gcc dot gnu.org, hubicka at gcc dot gnu.org,
marxin at gcc dot gnu.org, segher at kernel dot 
crashing.org,
wschmidt at gcc dot gnu.org
  Target Milestone: ---
  Host: powerpc64-unknown-linux-gnu
Target: powerpc64-unknown-linux-gnu
 Build: powerpc64-unknown-linux-gnu

r257582 is responsible for a 6% degradation in CPU2000 175.vpr and a 12%
degradation in CPU2006 401.bzip2. Both were run on a Power7 box.

A very initial look at the bzip2 profile shows the degradation is contained to
mainSort(), which showed a 54% increase in run cycles. It appears one of the
calls to mainGtU() is inlined into mainSort() in the slow version, but the drop
in cycle counts in mainGtU() is nowhere near the increase in mainSort().

[Bug middle-end/83665] [8 regression] Big code size regression and some code quality improvement at Jan 2 2018

2018-03-26 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83665

Pat Haugen  changed:

   What|Removed |Added

 CC||pthaugen at gcc dot gnu.org

--- Comment #16 from Pat Haugen  ---
(In reply to Jan Hubicka from comment #14)
> Author: hubicka
> Date: Mon Feb 12 09:48:06 2018
> New Revision: 257582
> 
> URL: https://gcc.gnu.org/viewcvs?rev=257582=gcc=rev
> Log:
> 
>   PR middle-end/83665
>   * params.def (inline-min-speedup): Increase from 8 to 15.
>   (max-inline-insns-auto): Decrease from 40 to 30.
>   * ipa-split.c (consider_split): Add some buffer for function to
>   be considered inlining candidate.
>   * invoke.texi (max-inline-insns-auto, inline-min-speedup): UPdate
>   default values.
> 
> Modified:
> trunk/gcc/ChangeLog
> trunk/gcc/doc/invoke.texi
> trunk/gcc/ipa-split.c
> trunk/gcc/params.def

This change is responsible for a 6% degradation in CPU2000 175.vpr and a 12%
degradation in CPU2006 401.bzip2. Both were run on a Power7 box.

[Bug target/83497] [8 Regression] CPU2000 172.mgrid starts failing with r254730

2018-03-21 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83497

Pat Haugen  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

--- Comment #5 from Pat Haugen  ---
I have confirmed that this is indeed just a precision difference due to a
different mix and order of instructions for the computation in the RESID loop,
which is a valid reassociation under -ffast-math. The difference is then
compounded as the benchmark iterates over the values.

The specdiff command for mgrid specifies an absolute tolerance of "-a 1e-12"
and uses the absolute difference when seeing if two values are within the
specified tolerance. In this case they were not.
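A minimal C illustration of how reassociation alone can exceed such a tolerance, assuming IEEE double arithmetic with no excess precision (for example x86-64, compiled without -ffast-math). The values are chosen for effect and are not taken from mgrid, and within_abs_tolerance() is a hypothetical stand-in for specdiff's absolute-tolerance test.

```c
#include <assert.h>
#include <stdbool.h>

/* Absolute-tolerance comparison in the spirit of "specdiff -a 1e-12". */
static bool within_abs_tolerance(double x, double y, double tol)
{
  double diff = x - y;
  if (diff < 0)
    diff = -diff;
  return diff <= tol;
}

/* The same sum evaluated in two orders that -ffast-math lets the compiler
   interchange. */
static double sum_left(double a, double b, double c)  { return (a + b) + c; }
static double sum_right(double a, double b, double c) { return a + (b + c); }
```

With a = 1e16, b = -1e16 and c = 1.0, the left-to-right order gives 1.0, but -1e16 + 1.0 rounds back to -1e16 (adjacent doubles at that magnitude are 2 apart), so the other order gives 0.0, far outside a 1e-12 absolute tolerance.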

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-03-14 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #10 from Pat Haugen  ---
(In reply to Pat Haugen from comment #9)
> (pr83497, which I'm still digging on). Ignoring output miscompare and just
> timing the two versions built with -fno-tree-vectorize, I see that the 
> performance is similar. So possibly a powerpc vector cost issue.
> 

And then again, maybe not. Running with -fno-tree-vectorize and removing
-ffast-math (which eliminates the output miscompare), I still see the
degradation.

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-03-13 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #9 from Pat Haugen  ---
(In reply to Martin Jambor from comment #7)
> Do I understand it correctly that you suspect that the new IPA-CP
> clone that is created from r256888 on is harmful?  In that case, you
> want to test that by trying higher values of ipa-cp-eval-threshold,
> something like --param ipa-cp-eval-threshold 610 (i.e. bigger than
> 606).  Of course, if there are other clones with evaluations between
> 500 and 610, it would affect them too.
> 

Building with --param ipa-cp-eval-threshold=610 prevented the creation of the
.resid_.constprop.1 clone and eliminated the performance degradation.

Looking at the profile in more depth, I saw that most of the time in
resid_.constprop was spent in the main vectorized loop. I tried both revisions
with -fno-tree-vectorize to see if vectorization in the clone is the real
problem on powerpc, but ran into an output miscompare (pr83497, which
I'm still digging on). Ignoring the miscompare and just timing the two
versions built with -fno-tree-vectorize, I see that the performance is
similar. So possibly a powerpc vector cost issue.


> You may also want to try both fast and slow revisions with
> -fno-ipa-cp-clone as the first step, actually.

Doing this showed r256888 about 4% slower, so not nearly as bad.
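For context, an IPA-CP clone such as resid_.constprop.1 is a copy of a function specialized for argument values known at the call sites it serves. The hand-written C sketch below only shows the idea: sum_scaled() and its clone are hypothetical, and GCC makes this decision itself (weighing it against ipa-cp-eval-threshold) rather than from source.

```c
#include <assert.h>

/* General version: the trip count n is a run-time argument. */
static double sum_scaled(const double *v, int n, double k)
{
  double s = 0.0;
  for (int i = 0; i < n; i++)
    s += k * v[i];
  return s;
}

/* Hand-written equivalent of a clone for callers that always pass n == 4:
   the constant is propagated into the body, so later passes (unrolling,
   vectorization, predictive commoning) see a fixed trip count. */
static double sum_scaled_constprop(const double *v, double k)
{
  const int n = 4;
  double s = 0.0;
  for (int i = 0; i < n; i++)
    s += k * v[i];
  return s;
}
```

Because later passes see a different body, the clone can be optimized quite differently from the original, for better or worse.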

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-03-08 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #5 from Pat Haugen  ---
Created attachment 43601
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43601=edit
ipa-cp dump (r256887)

(In reply to Martin Liška from comment #4)
> Thank you, may I please ask you for the IPA CP dump file for not affected
> revision (r256887). Do I understand the numbers right that version with
> .resid_.constprop.1 is slower?

Dump attached. And yes, the version with resid_.constprop.1 is slower.

Also, I tried the patch from
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84149#c5 and didn't see any
difference in execution time.

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-03-07 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #3 from Pat Haugen  ---
(In reply to Martin Liška from comment #1)
> Isn't that dup of 84149? Can you please tweak --param ipa-cp-eval-threshold
> to value to 200, 300, 400? Can you please attach -fdump-ipa-cp-details file?

I tried the param with the 3 different values and none made any difference to
execution time.

[Bug tree-optimization/84737] [8 Regression] 20% degradation in CPU2000 172.mgrid starting with r256888

2018-03-07 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

--- Comment #2 from Pat Haugen  ---
Created attachment 43589
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43589=edit
ipa-cp dump

[Bug ipa/84737] New: 20% degradation in CPU2000 172.mgrid starting with r256888

2018-03-06 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84737

Bug ID: 84737
   Summary: 20% degradation in CPU2000 172.mgrid starting with
r256888
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ipa
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pthaugen at gcc dot gnu.org
CC: dje at gcc dot gnu.org, marxin at gcc dot gnu.org,
segher at gcc dot gnu.org, wschmidt at gcc dot gnu.org
  Target Milestone: ---
  Host: powerpc64-unknown-linux-gnu
Target: powerpc64-unknown-linux-gnu
 Build: powerpc64-unknown-linux-gnu

I'm seeing a 20% degradation on 172.mgrid with r256888. The benchmark was built
with "-O3 -mcpu=power7 -ffast-math". Profiling shows the difference comes from
function resid() and its clone.

r256887
---
Counted PM_RUN_CYC events (Run_cycles.) with a unit mask of 0x00 (No unit mask)
count 10
samples  %        image name               symbol name
658215   48.2563  mgrid_base.pat_test_64   .resid_
367381   26.9341  mgrid_base.pat_test_64   .psinv_
153587   11.2601  mgrid_base.pat_test_64   .interp_
109785    8.0488  mgrid_base.pat_test_64   .rprj3_
52642     3.8594  mgrid_base.pat_test_64   .comm3_
7912      0.5801  mgrid_base.pat_test_64   .MAIN__
3796      0.2783  libc-2.17.so             .__memset_power8



r256888
---
Counted PM_RUN_CYC events (Run_cycles.) with a unit mask of 0x00 (No unit mask)
count 10
samples  %        image name               symbol name
1109100  59.2023  mgrid_base.gcc_hunt_64   .resid_.constprop.1
368930   19.6930  mgrid_base.gcc_hunt_64   .psinv_
160102    8.5460  mgrid_base.gcc_hunt_64   .interp_
114954    6.1361  mgrid_base.gcc_hunt_64   .MAIN__
55253     2.9493  mgrid_base.gcc_hunt_64   .comm3_
46903     2.5036  mgrid_base.gcc_hunt_64   .resid_
5103      0.2724  libc-2.17.so             .__memset_power8

[Bug rtl-optimization/83530] [7/8 Regression] ICE in reset_sched_cycles_in_current_ebb, at sel-sched.c:7150

2018-02-07 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83530

Pat Haugen  changed:

           What    |Removed                     |Added

            Summary|[8 Regression] ICE in       |[7/8 Regression] ICE in
                   |reset_sched_cycles_in_curre |reset_sched_cycles_in_curre
                   |nt_ebb, at sel-sched.c:7150 |nt_ebb, at sel-sched.c:7150

--- Comment #10 from Pat Haugen  ---
Marking this as a GCC 7 regression as well, since that is when -fsched-pressure
--param sched-pressure-algorithm=2 became the default for PowerPC. But as I
mentioned in Comment 7, the failure can be reproduced on prior versions by
adding those two options.

[Bug rtl-optimization/83530] [8 Regression] ICE in reset_sched_cycles_in_current_ebb, at sel-sched.c:7150

2018-02-07 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83530

--- Comment #9 from Pat Haugen  ---
(In reply to Andrey Belevantsev from comment #8)
> I will take a look.  The ICE is within the code that models the scheduling
> loop in order to get the proper insn ticks and everything for later MD
> processing (it is equivalent to always scheduling the next insn).  Either
> there is an issue in that loop that wasn't uncovered anywhere but powerpc or
> there is some subtlety in the powerpc cpu model that is triggered there.  It
> is not very pleasant to find out and fix usually so it will take time.

Thanks, I appreciate that. I did find that the issue is not very pleasant to
track down, as you say.

[Bug rtl-optimization/83530] [8 Regression] ICE in reset_sched_cycles_in_current_ebb, at sel-sched.c:7150

2018-01-30 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83530

--- Comment #7 from Pat Haugen  ---
I'm assuming this is a latent selective scheduling bug, since I can reproduce it
with r243865 by adding -fsched-pressure --param sched-pressure-algorithm=2.
Looking...

[Bug other/83497] CPU2000 172.mgrid starts failing with r254730

2018-01-02 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83497

--- Comment #4 from Pat Haugen  ---
(In reply to Pat Haugen from comment #0)
> mgrid started failing (output miscompare) with r254730. The following
> options demonstrate the failure "-O3 -mcpu=power6 -ffast-math".

The option set above is incomplete; -m32 is also required.

[Bug other/83497] CPU2000 172.mgrid starts failing with r254730

2018-01-02 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83497

--- Comment #3 from Pat Haugen  ---
(In reply to Richard Biener from comment #2)
> 
> As far as I see the miscompare is -0.8 vs. 0.18 so it doesn't look like a
> precision issue to me.  Does it only happen for power6 / bigendian?
> 

Yes, the failure occurs only with -mcpu=power6. I don't have a copy of CPU2000
that runs on powerpc64le, so I can't say for sure whether it's a big-endian-only
issue.

I will do some further digging on the failure.

[Bug other/83497] New: CPU2000 172.mgrid starts failing with r254730

2017-12-19 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83497

            Bug ID: 83497
           Summary: CPU2000 172.mgrid starts failing with r254730
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pthaugen at gcc dot gnu.org
                CC: dje at gcc dot gnu.org, hubicka at gcc dot gnu.org,
                    rguenth at gcc dot gnu.org, segher at gcc dot gnu.org,
                    wschmidt at gcc dot gnu.org
  Target Milestone: ---
              Host: powerpc64-unknown-linux-gnu
            Target: powerpc64-unknown-linux-gnu
             Build: powerpc64-unknown-linux-gnu

mgrid started failing (output miscompare) with r254730. The following options
demonstrate the failure: "-O3 -mcpu=power6 -ffast-math". The miscompared output
is:

29748: -0.839533E-12
        0.182462E-12
       ^
29749: -0.839533E-12
        0.182462E-12
       ^
29750: -0.849589E-12
        0.184648E-12
       ^
29751: -0.849589E-12
        0.184648E-12
       ^
29752: -0.852151E-12
        0.185205E-12
       ^
29753: -0.852151E-12
        0.185205E-12
       ^
29754: -0.852839E-12
        0.185354E-12
       ^

A brief history, since this failure has come and gone a couple of times. All of
the revisions below deal with CFG/inlining issues.

r254730 - initial failure
r254937 - started working, only because this inadvertently disabled some
inlining
r254946 - fixed inlining from 254937, benchmark started failing again
r255103 - started working

So even though it's currently working on trunk, I think there's an issue in
r255103, which I've emailed Honza about separately. If I apply the following
patch (which hopefully Honza will confirm is the desired behavior) to current
trunk, the benchmark fails again.


Index: gcc/ipa-inline.c
===================================================================
--- gcc/ipa-inline.c	(revision 255838)
+++ gcc/ipa-inline.c	(working copy)
@@ -691,7 +691,7 @@
   sreal time = compute_uninlined_call_time (e, unspec_time);
   sreal inlined_time = compute_inlined_call_time (e, spec_time);
 
-  if (time - inlined_time * 100
+  if ((time - inlined_time) * 100
       > (sreal) (time * PARAM_VALUE (PARAM_INLINE_MIN_SPEEDUP)))
     return true;
   return false;
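The only change in the patch is the added parentheses. Because `*` binds tighter than binary `-`, the original condition reads as `time - (inlined_time * 100) > ...`, which can only hold when the inlined time is under 1% of the uninlined time. The sketch below compares both forms using plain `double` in place of GCC's `sreal`; `MIN_SPEEDUP_PERCENT` and its value of 10 are hypothetical stand-ins for `PARAM_VALUE (PARAM_INLINE_MIN_SPEEDUP)`, and the sketch makes no claim about which form was intended.

```c
#include <stdbool.h>

/* Hypothetical stand-in for PARAM_VALUE (PARAM_INLINE_MIN_SPEEDUP),
   assumed here to be a percentage.  */
#define MIN_SPEEDUP_PERCENT 10.0

/* Original expression: parses as time - (inlined_time * 100).  */
bool speedup_original (double time, double inlined_time)
{
  return time - inlined_time * 100 > time * MIN_SPEEDUP_PERCENT;
}

/* Patched expression: the time saved by inlining, scaled by 100,
   compared against the threshold.  */
bool speedup_patched (double time, double inlined_time)
{
  return (time - inlined_time) * 100 > time * MIN_SPEEDUP_PERCENT;
}
```

With `time = 100` and `inlined_time = 80` (a 20% saving), the patched form accepts the call while the original rejects it; with `inlined_time = 95` both reject it.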

[Bug lto/83201] [7/8 Regression] SPEC CPU2017 505.mcf_r produces incorrect output when built with -flto and FDO

2017-12-19 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201

--- Comment #18 from Pat Haugen  ---
(In reply to Martin Liška from comment #16)
> (In reply to Richard Biener from comment #15)
> > SWAPINIT should end up with swaptype_long == 1 I think and swaptype_int == 1
> > for the cases in question.  Enforcing swaptype_int = swaptype_long = 2
> > should make it work (scratch SWAPINIT calls).
> 
> I can confirm that.

Yes, that fixes the problem for me on PowerPC also. I can pass along the info
to our SPEC rep.


Richi,
  I'm curious whether the alias violations were apparent in a dump file, or did
you just happen to spot them while looking through the source?
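For context, a hedged sketch of why forcing `swaptype_int = swaptype_long = 2` sidesteps the problem, assuming the benchmark's sort follows the BSD-style qsort where `SWAPINIT` chooses between word-sized swaps and a byte-at-a-time swap (type 2). A byte-wise swap reads and writes only through `unsigned char`, which the C aliasing rules allow for any object, so it cannot conflict with the elements' declared types. `swap_bytes` is a hypothetical name, not the benchmark's code.

```c
#include <stddef.h>

/* Swap two objects of es bytes one byte at a time.  Access through an
   unsigned char lvalue is permitted for any object type, so this never
   violates the effective-type (aliasing) rules.  */
void swap_bytes (void *a, void *b, size_t es)
{
  unsigned char *pa = a;
  unsigned char *pb = b;
  for (size_t i = 0; i < es; i++)
    {
      unsigned char t = pa[i];
      pa[i] = pb[i];
      pb[i] = t;
    }
}
```

The cost is speed: elements are moved a byte at a time instead of a `long` at a time.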

[Bug lto/83201] [7/8 Regression] SPEC CPU2017 505.mcf_r produces incorrect output when built with -flto and FDO

2017-12-15 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201

--- Comment #6 from Pat Haugen  ---
So I did a bisect of trunk during the GCC 7 development timeframe
(r235035-r247017) and it pointed to r236878 as the point where the failure
started.


+++ gcc/ChangeLog   (revision 236878)
@@ -1,3 +1,9 @@
+2016-05-30  Jan Hubicka  
+
+   * tree-ssa-loop-ivcanon.c (try_peel_loop): Correctly set wont_exit
+   for peeled copies; avoid underflow when updating estimates; correctly
+   scale loop profile.
+

[Bug lto/83201] [7/8 Regression] SPEC CPU2017 505.mcf_f produces incorrect output when built with -flto and FDO

2017-12-14 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201

--- Comment #5 from Pat Haugen  ---
Current FSF 6 branch works fine, so I have some bisect points. Will comment
further as I find out.

[Bug tree-optimization/81303] [8 Regression] 410.bwaves regression caused by r249919

2017-12-08 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81303

--- Comment #15 from Pat Haugen  ---
Just confirming that the changes have eliminated the bwaves degradation on
PowerPC that started with r249919.

[Bug lto/83201] SPEC CPU2017 505.mcf_f produces incorrect output when built with -flto and FDO

2017-11-28 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201

--- Comment #2 from Pat Haugen  ---
(In reply to Pat Haugen from comment #0)
> 
> It appears to work fine with r254943. I'll start a bisect and post results.

My bisect showed that r254946 was where it started failing on trunk. And yes,
it fails with current GCC 7 branch too.

[Bug lto/83201] New: SPEC CPU2017 505.mcf_f produces incorrect output when built with -flto and FDO

2017-11-28 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83201

            Bug ID: 83201
           Summary: SPEC CPU2017 505.mcf_f produces incorrect output when
                    built with -flto and FDO
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: lto
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pthaugen at gcc dot gnu.org
                CC: dje at gcc dot gnu.org, hubicka at gcc dot gnu.org,
                    marxin at gcc dot gnu.org, segher at gcc dot gnu.org,
                    wschmidt at gcc dot gnu.org
  Target Milestone: ---
              Host: powerpc64le-unknown-linux-gnu
            Target: powerpc64le-unknown-linux-gnu
             Build: powerpc64le-unknown-linux-gnu

505.mcf_r produces incorrect output when built with both LTO and FDO. Using
either option on its own is fine. GCC trunk r255207 was used, with the
following options:

OPTIMIZE= -O3 -mcpu=power8 -flto

PASS1_FLAGS   = -fprofile-generate
PASS1_LDFLAGS  = -fprofile-generate
PASS2_FLAGS   = -fprofile-use
PASS2_LDFLAGS  = -fprofile-use


Contents of inp.out.mis (miscompares).

0010:  simplex iterations : 107102
       simplex iterations : 107598
                               ^
0014:  simplex iterations : 152479
       simplex iterations : 149876
                             ^
0016:  erased arcs        : 995716
       erased arcs        : 995702
                                ^
0017:  new implicit arcs  : 2995716
       new implicit arcs  : 2995702
                                 ^
0019:  simplex iterations : 253145
       simplex iterations : 248008
                             ^
0020:  objective value    : 12161789395
       objective value    : 12171761765
                               ^
0021:  erased arcs        : 2991635
       erased arcs        : 2991537
                                ^
0022:  new implicit arcs  : 2991635
       new implicit arcs  : 2991537
                                ^
0024:  simplex iterations : 398127
       simplex iterations : 385785
                             ^
0025:  objective value    : 11729854482
       objective value    : 11769820561
                               ^


It appears to work fine with r254943. I'll start a bisect and post results.

[Bug tree-optimization/81953] Code sinking increases register pressure

2017-08-24 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81953

--- Comment #4 from Pat Haugen  ---
(In reply to Richard Biener from comment #3)

> The interesting part is also why RTL scheduling doesn't rectify things
> here?

If you're referring to -fsched-pressure, I believe the answer is that those
algorithms are concerned with cases where register pressure exceeds the number
of available hard registers, which is not the case here.

[Bug tree-optimization/81953] New: Code sinking results in increased use of callee saved registers

2017-08-23 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81953

            Bug ID: 81953
           Summary: Code sinking results in increased use of callee saved
                    registers
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pthaugen at gcc dot gnu.org
                CC: dje.gcc at gmail dot com, wschmidt at gcc dot gnu.org
  Target Milestone: ---
              Host: powerpc64le-unknown-linux-gnu
            Target: powerpc64le-unknown-linux-gnu
             Build: powerpc64le-unknown-linux-gnu

void bar();
int j;
void foo(int a, int b, int c, int d, int e, int f)
{
  int l;
  l = a + b + c + d + e + f;
  if (a != 5)
    {
      bar();
      j = l;
    }
}


The whole expression "l = a + b + c..." is moved past the call to bar(), which
means we now need to use 6 callee-saved regs to hold the parm values across the
call.
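To make the register-pressure point concrete, the two hypothetical functions below are equivalent at the source level but differ in what must survive the call. In `foo_early`, only `l` is live across `bar()`; `foo_sunk` mirrors the sunk form, where `a` through `f` are all live across the call, so each needs a callee-saved register or a stack slot. This is a source-level illustration only: the optimizers decide the real placement, and the counting `bar()` and `bar_calls` exist only so the sketch runs.

```c
int j;
int bar_calls;

/* Hypothetical bar(); in the testcase it is an opaque external call.  */
void bar (void) { bar_calls++; }

/* Sum computed before the call: only l is live across bar().  */
void foo_early (int a, int b, int c, int d, int e, int f)
{
  int l = a + b + c + d + e + f;
  if (a != 5)
    {
      bar ();
      j = l;
    }
}

/* Shape after sinking: a..f must all be preserved across bar().  */
void foo_sunk (int a, int b, int c, int d, int e, int f)
{
  if (a != 5)
    {
      bar ();
      j = a + b + c + d + e + f;
    }
}
```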

[Bug rtl-optimization/81340] ICE in compute_bb_dataflow, at var-tracking.c:6877

2017-07-06 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81340

Pat Haugen  changed:

           What    |Removed                     |Added

                 CC|                            |mliska at suse dot cz

--- Comment #1 from Pat Haugen  ---
Started with r249960.

[Bug rtl-optimization/81340] New: ICE in compute_bb_dataflow, at var-tracking.c:6877

2017-07-06 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81340

            Bug ID: 81340
           Summary: ICE in compute_bb_dataflow, at var-tracking.c:6877
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pthaugen at gcc dot gnu.org
                CC: dje at gcc dot gnu.org, wschmidt at gcc dot gnu.org
  Target Milestone: ---
              Host: powerpc64le-unknown-linux-gnu
            Target: powerpc64le-unknown-linux-gnu
             Build: powerpc64le-unknown-linux-gnu

$ cat table_cache.ii
class a {
  struct b {
    b(int, int);
  } c;

public:
  int d;
  a(char *) : c(0, d) {}
};
class e {
  int f(const int &, const int &, const int &, bool, bool, bool, int, bool);
};
class g {
public:
  static g *h();
  void i(a, void *);
};
int e::f(const int &, const int &, const int &, bool j, bool, bool, int, bool)
{
  g::h()->i("", );
}


$ ~/install/gcc/trunk/bin/g++ -S -O2 -g -fsanitize=address table_cache.ii
table_cache.ii: In member function ‘int e::f(const int&, const int&, const
int&, bool, bool, bool, int, bool)’:
table_cache.ii:19:19: warning: ISO C++ forbids converting a string constant to
‘char*’ [-Wwrite-strings]
   g::h()->i("", );
   ^
during RTL pass: vartrack
table_cache.ii:20:1: internal compiler error: in compute_bb_dataflow, at
var-tracking.c:6877
 }
 ^
0x10f691af compute_bb_dataflow
/home/pthaugen/src/gcc/trunk/gcc/gcc/var-tracking.c:6877
0x10f696d3 vt_find_locations
/home/pthaugen/src/gcc/trunk/gcc/gcc/var-tracking.c:7118
0x10f6a3a3 variable_tracking_main_1
/home/pthaugen/src/gcc/trunk/gcc/gcc/var-tracking.c:10332
0x10f6a3a3 variable_tracking_main()
/home/pthaugen/src/gcc/trunk/gcc/gcc/var-tracking.c:10378
0x10f6a3a3 execute
/home/pthaugen/src/gcc/trunk/gcc/gcc/var-tracking.c:10415
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug ipa/80597] [8 Regression] internal compiler error: in compute_inline_parameters, at ipa-inline-analysis.c:3126

2017-05-23 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80597

--- Comment #16 from Pat Haugen  ---
(In reply to Dmitry Babokin from comment #14)
> Original test case still fails with compiler switches that I've originally
> reported (-fsanitize=undefined).

Is your failure fixed with r248325?

[Bug rtl-optimization/79801] Disable ira.c:add_store_equivs for some targets?

2017-05-22 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79801

Pat Haugen  changed:

           What    |Removed                     |Added

                 CC|                            |pthaugen at gcc dot gnu.org

--- Comment #1 from Pat Haugen  ---
I ran a comparison on CPU2006. The only benchmark possibly outside the noise
range was 470.lbm with a 1.9% degradation.

[Bug ipa/80597] [8 Regression] internal compiler error: in compute_inline_parameters, at ipa-inline-analysis.c:3126

2017-05-17 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80597

--- Comment #12 from Pat Haugen  ---
(In reply to Martin Liška from comment #11)
> Created attachment 41375 [details]
> Patch candidate v2
> 
> Can you please test this version? It moves e from 10^6 to 10^5.

That patch works for both the benchmarks that were affected.

[Bug ipa/80597] [8 Regression] internal compiler error: in compute_inline_parameters, at ipa-inline-analysis.c:3126

2017-05-16 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80597

--- Comment #9 from Pat Haugen  ---
(In reply to Martin Liška from comment #8)
> 
> Can you please provide a test-case? Or can you dump the sreal values via
> .to_double() ? That can be also hint for us to fix that properly.

I'm trying to reduce the source, but it's proprietary, so I'll see what it
reduces to before I can think about posting anything. In the meantime, here's
what things look like when the assert fails.

(gdb) p info->self_time
$7 = {m_sig = 1347786301, m_exp = -13}
(gdb) p info->self_time.to_double()
$8 = 164524.69494628906
(gdb) p info->time
$9 = {m_sig = 1347789465, m_exp = -13}
(gdb) p info->time.to_double()
$10 = 164525.08117675781
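GCC's `sreal` stores a value as a significand and a binary exponent, so `to_double()` yields `m_sig * 2^m_exp`. The sketch below decodes the two values dumped above and shows they differ by exactly 3164 * 2^-13 = 0.38623046875, a relative difference of about 2.3e-6. `within_rel_tol` and the tolerances in the test are hypothetical, an illustration of why a round-off-tolerant comparison matters, not the patch posted in this PR.

```c
#include <stdbool.h>

/* Decode an sreal as printed by gdb: m_sig * 2^m_exp.  Scaling by exact
   powers of two is equivalent to ldexp and avoids needing libm.  */
double sreal_value (long long m_sig, int m_exp)
{
  double v = (double) m_sig;
  for (; m_exp > 0; m_exp--)
    v *= 2.0;
  for (; m_exp < 0; m_exp++)
    v *= 0.5;
  return v;
}

/* Relative difference of two positive values.  */
double rel_diff (double x, double y)
{
  double d = x > y ? x - y : y - x;
  double m = x > y ? x : y;
  return d / m;
}

/* Hypothetical round-off tolerant equality check.  */
bool within_rel_tol (double x, double y, double tol)
{
  return rel_diff (x, y) <= tol;
}
```

Whether a relative tolerance of 1e-6 or 1e-5 is used decides whether these two values count as equal.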

[Bug libfortran/80602] Reduce stack usage for blocked matmul

2017-05-16 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80602

--- Comment #7 from Pat Haugen  ---
(In reply to Thomas Koenig from comment #6)
> I just committed r248074 which I suspect is the same problem
> (the fix for PR 80765).
> 
> If you could just upgrade to the most recent trunk (only
> need to upgrade libgfortran, really) an check if the fix
> also works for you, that would be great.

Yes, both are fixed with r248074. Thanks.

[Bug libfortran/80602] Reduce stack usage for blocked matmul

2017-05-15 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80602

Pat Haugen  changed:

           What    |Removed                     |Added

                 CC|                            |pthaugen at gcc dot gnu.org

--- Comment #5 from Pat Haugen  ---
(In reply to Thomas Koenig from comment #3)
> Author: tkoenig
> Date: Mon May  8 17:56:13 2017
> New Revision: 247753

This revision introduced a couple of problems with SPEC on PowerPC. Both
failures happen with -m32 only.

1) CPU2000 178.galgel now fails with a verification error (i.e. incorrect
output).

2) CPU2006 465.tonto segfaults when running.

I'll add more detail as I continue digging...

[Bug ipa/80597] [8 Regression] internal compiler error: in compute_inline_parameters, at ipa-inline-analysis.c:3126

2017-05-12 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80597

--- Comment #7 from Pat Haugen  ---
(In reply to Pat Haugen from comment #6)
> 
> I just ran into the same ICE and the proposed patch fixes the problem.

Unfortunately the patch introduces the same ICE on another benchmark that used
to build just fine.

[Bug ipa/80597] [8 Regression] internal compiler error: in compute_inline_parameters, at ipa-inline-analysis.c:3126

2017-05-12 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80597

Pat Haugen  changed:

           What    |Removed                     |Added

                 CC|                            |pthaugen at gcc dot gnu.org

--- Comment #6 from Pat Haugen  ---
(In reply to Martin Liška from comment #5)
> Created attachment 41349 [details]
> Patch candidate
> 
> Yep, it's Honza Hubicka's PR. I'm suggesting a new function that will handle
> round off errors in sreal.
> 
> Can you please Honza take a look? Can you Dmitry test it?

I just ran into the same ICE and the proposed patch fixes the problem.

[Bug tree-optimization/80705] Incorrect code generated for profile counter updates due to SLP+LIM

2017-05-10 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80705

--- Comment #1 from Pat Haugen  ---
I should have noted that the dumps I was looking at were slp1 and lim4.

[Bug tree-optimization/80705] New: Incorrect code generated for profile counter updates due to SLP+LIM

2017-05-10 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80705

            Bug ID: 80705
           Summary: Incorrect code generated for profile counter updates
                    due to SLP+LIM
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Keywords: wrong-code
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pthaugen at gcc dot gnu.org
                CC: dje at gcc dot gnu.org, wschmidt at gcc dot gnu.org
  Target Milestone: ---
              Host: powerpc64le-unknown-linux-gnu
            Target: powerpc64le-unknown-linux-gnu
             Build: powerpc64le-unknown-linux-gnu

Created attachment 41338
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=41338&action=edit
reduced testcase

The attached testcase shows a problem where profile counter updates are
incorrectly generated, which then leads to invalid profile info when the
original source is rebuilt with -fprofile-use.

Compile options used: -Ofast -mcpu=power8 -fprofile-generate

The problem occurs on the edge counter updates for the following inner loop:

 while (*s && *s!='\r' && *s!='\n' && *s!='"')

SLP vectorization combines adjacent counter writes on the exit paths from the
loop into vector store operations. LIM then comes along and hoists the initial
counter read outside the outer loop. This causes the problem because when the
inner loop is entered again the edge counters are initialized to the values
originally read from memory (i.e. values when the function was originally
entered) NOT the updated counter values that were written to memory when
exiting the inner loop. Aliasing problem?

[Bug rtl-optimization/80357] [7 Regression] ICE in model_update_limit_points_in_group, at haifa-sched.c:1982 on powerpc64le-linux-gnu

2017-04-10 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80357

--- Comment #7 from Pat Haugen  ---
(In reply to Bill Schmidt from comment #6)
> That revision enabled -fsched-pressure by default, so it may have been
> latent with -fsched-pressure before then.

Yes, this is a latent bug in the "model" sched-pressure algorithm code. I can
reproduce it with r243865 (revision before I turned on -fsched-pressure for
powerpc) by adding -fsched-pressure --param sched-pressure-algorithm=2.

I'll do some digging.

[Bug target/80107] ICE in final_scan_insn, at final.c:2964

2017-03-31 Thread pthaugen at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80107

Pat Haugen  changed:

           What    |Removed                     |Added

             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #3 from Pat Haugen  ---
Fixed.
