Re: [RFA][tree-optimization/90037] Cleanup const/copies between DOM and erroneous path isolation

2019-04-25 Thread Richard Biener
On April 24, 2019 11:26:28 PM GMT+02:00, Jeff Law  wrote:
>On 4/24/19 4:44 AM, Richard Biener wrote:
>>> Given that we can use the lattice copy propagator by just adding the
>>> pass to passes.def whereas using the RPN VN actually requires a
>little
>>> bit of real code (to set up the entry/exits for the relevant SEME
>>> regions), I went with the lattice copy propagator.
>>>
>>> This change adds around .4% instruction executions to my testbed of
>.i
>>> files.  It has no significant impact on the resulting code -- I see
>>> different register allocation decisions in a lot of places which
>seem to
>>> primarily result in reversing arguments to comparisons.
>> 
>> Was there a need to have two copy-prop passes in the early
>> DOM/errorneous-path removal where we previously only had
>> a single phi-only-prop pass?  Is the testcase fixed also when
>> doing copy-prop only a single time?
>So if we replace phi-only cprop with the lattice propagator and move
>the
>pass which currently runs before erroneous path isolation so that it
>instead runs before erroneous path isolation we're in pretty good
>shape.
>
>isolate-2.c and isolate-4.c needed twiddling -- they need to look later
>in the pipeline for an expected simplification, but the simplification
>still occurs and it's not too much later than before.
>
>
>I've bootstrapped and regression tested on x86_64, but no other targets
>at this point.
>
>OK for the trunk now?

OK. 

Richard. 

>
>Jeff



Re: [RFA][tree-optimization/90037] Cleanup const/copies between DOM and erroneous path isolation

2019-04-25 Thread Richard Biener
On April 24, 2019 8:31:44 PM GMT+02:00, Jeff Law  wrote:
>On 4/24/19 4:44 AM, Richard Biener wrote:
>> On Tue, Apr 23, 2019 at 4:29 PM Jeff Law  wrote:
>>>
>>>
>>> As discussed in the BZ, this patch addresses the false positive
>warning
>>> by cleaning up the const/copy propagations left in the IL between
>DOM's
>>> jump threading and erroneous path isolation.
>>>
>>> In the past we'd been handling this stuff with phi only cprop.  To
>make
>>> phi only cprop work in this case we'd have to change it to scan
>>> statements within some subset of blocks where it had previously only
>>> scanned PHIs.
>>>
>>> I was concerned about the compile-time cost of the additional
>scanning
>>> plus the extra pass.  So I compared that against using the lattice
>based
>>> const/copy propagation as well as against the RPO VN bits.
>>>
>>> It turns out that the lattice based const/copy propagation does the
>>> least amount of work, closely followed by a limited RPO VN.  The
>>> improved "phi only" copy propagator being the worst.
>> 
>> Interesting.
>Definitely.  I didn't dig further than what's mentioned in the BZ.
>
>> 
>>> Given that we can use the lattice copy propagator by just adding the
>>> pass to passes.def whereas using the RPN VN actually requires a
>little
>>> bit of real code (to set up the entry/exits for the relevant SEME
>>> regions), I went with the lattice copy propagator.
>>>
>>> This change adds around .4% instruction executions to my testbed of
>.i
>>> files.  It has no significant impact on the resulting code -- I see
>>> different register allocation decisions in a lot of places which
>seem to
>>> primarily result in reversing arguments to comparisons.
>> 
>> Was there a need to have two copy-prop passes in the early
>> DOM/errorneous-path removal where we previously only had
>> a single phi-only-prop pass?  Is the testcase fixed also when
>> doing copy-prop only a single time?
>The testcase is fixed with a single copyprop (lattice or RPO VN) after
>DOM but before erroneous path isolation.   I seriously considered just
>dropping the copyprop pass after erroneous path isolation.  I'm pretty
>sure it'll regress codegen quality, but it may not be too bad.
>
>
>> 
>> Also the late pass you replace should be right after VRP, not
>> after warn_restrict (that was a mistake done when adding the
>> warn_restrict pass).
>Agreed.  But ISTM we should make that an independent change.
>
>> 
>> The main reason I dislike this is that it is an unconditional cleanup
>> pass run even when we didn't perform any jump threading.  That's
>> of course the same case as with the existing phi-only-prop passes
>> but would have been my major argument for the SEME VN
>> (where at some cut-off of course doing a single whole-function VN
>> is cheaper than doing N SEME VNs, and I can even think of doing
>> N SEME regions at once in exchange for doing a whole-function
>> RPO order compute).
>Yea.  We're certainly doing more work in the cases where we didn't
>thread anything.  I've always wanted better ways to indicate what
>actions a pass did and using that to bypass subsequent passes if they
>weren't going to be profitable.
>
>
>
>
>> 
>>> FWIW I also considered delaying the erroneous path isolation pass. 
>I
>>> ultimately decided against that.  My recollection is that its
>location
>>> came from the desire to clean up those paths early enough in the
>>> pipeline so that other optimizers could do a better job.
>>>
>>> We could consider an early/late split here.  Essentially we do the
>>> optimization early, leaving enough data lying around somewhere for a
>>> late pass to look for and issue the warning.  Something like a
>>> __builtin_warning that we leave in the IL, then just before
>expanding to
>>> RTL, we scan the IL once and issue the __builtin_warnings.
>>>
>>> In this specific case the __builtin_warning would be on a path that
>we
>>> eventually would determine was unreachable at compile time and the
>path
>>> would be removed.  I suspect we could do something similar for other
>>> warnings coming out of the middle end.
>>>
>>> Anyway, this has been bootstrapped and regression tested on x86_64,
>>> ppc64, i686, sparc64, & ppc64le.  It's also been bootstrapped on
>alpha,
>>> ppc (32 bit), armeb, m68k, riscv64, mipsisa32r4, arm, sh4, & sh4eb.
>>> It's also built and regression tested all the *-elf targets in my
>tester.
>>>
>>> OK for the trunk, or do we want to defer to gcc-10?
>> 
>> I like the pass removal and would say OK if you manage with
>> a 1:1 replacement (thus get rid of that extra copy-prop between
>> DOM and pass_isolate_erroneous_paths).
>That's the one we need to fix the regression.  We might be able to drop
>the one after erroneous path isolation which would keep us at the 1:1
>replacement.  I'll poke at that.

Works for me as well if testing succeeds. 

Richard. 

>jeff



Re: [RFA][tree-optimization/90037] Cleanup const/copies between DOM and erroneous path isolation

2019-04-24 Thread Jeff Law
On 4/24/19 4:44 AM, Richard Biener wrote:
>> Given that we can use the lattice copy propagator by just adding the
>> pass to passes.def whereas using the RPN VN actually requires a little
>> bit of real code (to set up the entry/exits for the relevant SEME
>> regions), I went with the lattice copy propagator.
>>
>> This change adds around .4% instruction executions to my testbed of .i
>> files.  It has no significant impact on the resulting code -- I see
>> different register allocation decisions in a lot of places which seem to
>> primarily result in reversing arguments to comparisons.
> 
> Was there a need to have two copy-prop passes in the early
> DOM/errorneous-path removal where we previously only had
> a single phi-only-prop pass?  Is the testcase fixed also when
> doing copy-prop only a single time?
So if we replace phi-only cprop with the lattice propagator and move the
pass which currently runs before erroneous path isolation so that it
instead runs before erroneous path isolation we're in pretty good shape.

isolate-2.c and isolate-4.c needed twiddling -- they need to look later
in the pipeline for an expected simplification, but the simplification
still occurs and it's not too much later than before.


I've bootstrapped and regression tested on x86_64, but no other targets
at this point.

OK for the trunk now?


Jeff
* Makefile.in (OBJS): Remove tree-ssa-phionlycprop.c
* passes.def: Replace all instance of phi-only cprop with the
lattice propagator.  Move propagation pass from after erroneous
path isolation to before erroneous path isolation.
* tree-ssa-phionlycprop.c: Remove.

* gcc.dg/tree-ssa/20030710-1.c: Update dump file to scan.
* gcc.dg/isolate-2.c: Likewise.
* gcc.dg/isolate-4.c: Likewise.
* gcc.dg/pr19431.c: Accept either ordering of PHI args.



diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index d186d71c91e..5f43d9de00e 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1559,7 +1559,6 @@ OBJS = \
tree-ssa-loop.o \
tree-ssa-math-opts.o \
tree-ssa-operands.o \
-   tree-ssa-phionlycprop.o \
tree-ssa-phiopt.o \
tree-ssa-phiprop.o \
tree-ssa-pre.o \
diff --git a/gcc/passes.def b/gcc/passes.def
index 446a7c48276..bc147cd 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -222,19 +222,13 @@ along with GCC; see the file COPYING3.  If not see
 trying to move or duplicate pass_dominator somewhere earlier.  */
   NEXT_PASS (pass_thread_jumps);
   NEXT_PASS (pass_dominator, true /* may_peel_loop_headers_p */);
-  /* At this point the majority of const/copy propagations
-are exposed.  Go ahead and identify paths that should never
-be executed in a conforming program and isolate those paths.
-
-This will expose more degenerate PHIs in the main path and
-expose more PRE/DOM optimization opportunities.  */
+  /* Threading can leave many const/copy propagations in the IL.
+Clean them up.  Failure to do so well can lead to false
+positives from warnings for erroneous code.  */
+  NEXT_PASS (pass_copy_prop);
+  /* Identify paths that should never be executed in a conforming
+program and isolate those paths.  */
   NEXT_PASS (pass_isolate_erroneous_paths);
-  /* The only const/copy propagation opportunities left after
-DOM and erroneous path isolation should be due to degenerate PHI nodes.
-So rather than run the full propagators, run a specialized pass which
-only examines PHIs to discover const/copy propagation
-opportunities.  */
-  NEXT_PASS (pass_phi_only_cprop);
   NEXT_PASS (pass_dse);
   NEXT_PASS (pass_reassoc, true /* insert_powi_p */);
   NEXT_PASS (pass_dce);
@@ -321,13 +315,10 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_strlen);
   NEXT_PASS (pass_thread_jumps);
   NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
-  /* The only const/copy propagation opportunities left after
-DOM and VRP should be due to degenerate PHI nodes.  So rather than
-run the full propagators, run a specialized pass which
-only examines PHIs to discover const/copy propagation
-opportunities.  */
   NEXT_PASS (pass_warn_restrict);
-  NEXT_PASS (pass_phi_only_cprop);
+  /* Threading can leave many const/copy propagations in the IL.
+Clean them up.  */
+  NEXT_PASS (pass_copy_prop);
   NEXT_PASS (pass_dse);
   NEXT_PASS (pass_cd_dce);
   NEXT_PASS (pass_forwprop);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/20030710-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/20030710-1.c
index 3dd3ba8bc17..529c79b26e3 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/20030710-1.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/20030710-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O1 -fdump-tree-phicprop1" } */
+/* { dg-options "-O1 

Re: [RFA][tree-optimization/90037] Cleanup const/copies between DOM and erroneous path isolation

2019-04-24 Thread Jeff Law
On 4/24/19 4:44 AM, Richard Biener wrote:
> On Tue, Apr 23, 2019 at 4:29 PM Jeff Law  wrote:
>>
>>
>> As discussed in the BZ, this patch addresses the false positive warning
>> by cleaning up the const/copy propagations left in the IL between DOM's
>> jump threading and erroneous path isolation.
>>
>> In the past we'd been handling this stuff with phi only cprop.  To make
>> phi only cprop work in this case we'd have to change it to scan
>> statements within some subset of blocks where it had previously only
>> scanned PHIs.
>>
>> I was concerned about the compile-time cost of the additional scanning
>> plus the extra pass.  So I compared that against using the lattice based
>> const/copy propagation as well as against the RPO VN bits.
>>
>> It turns out that the lattice based const/copy propagation does the
>> least amount of work, closely followed by a limited RPO VN.  The
>> improved "phi only" copy propagator being the worst.
> 
> Interesting.
Definitely.  I didn't dig further than what's mentioned in the BZ.

> 
>> Given that we can use the lattice copy propagator by just adding the
>> pass to passes.def whereas using the RPN VN actually requires a little
>> bit of real code (to set up the entry/exits for the relevant SEME
>> regions), I went with the lattice copy propagator.
>>
>> This change adds around .4% instruction executions to my testbed of .i
>> files.  It has no significant impact on the resulting code -- I see
>> different register allocation decisions in a lot of places which seem to
>> primarily result in reversing arguments to comparisons.
> 
> Was there a need to have two copy-prop passes in the early
> DOM/errorneous-path removal where we previously only had
> a single phi-only-prop pass?  Is the testcase fixed also when
> doing copy-prop only a single time?
The testcase is fixed with a single copyprop (lattice or RPO VN) after
DOM but before erroneous path isolation.   I seriously considered just
dropping the copyprop pass after erroneous path isolation.  I'm pretty
sure it'll regress codegen quality, but it may not be too bad.


> 
> Also the late pass you replace should be right after VRP, not
> after warn_restrict (that was a mistake done when adding the
> warn_restrict pass).
Agreed.  But ISTM we should make that an independent change.

> 
> The main reason I dislike this is that it is an unconditional cleanup
> pass run even when we didn't perform any jump threading.  That's
> of course the same case as with the existing phi-only-prop passes
> but would have been my major argument for the SEME VN
> (where at some cut-off of course doing a single whole-function VN
> is cheaper than doing N SEME VNs, and I can even think of doing
> N SEME regions at once in exchange for doing a whole-function
> RPO order compute).
Yea.  We're certainly doing more work in the cases where we didn't
thread anything.  I've always wanted better ways to indicate what
actions a pass did and using that to bypass subsequent passes if they
weren't going to be profitable.




> 
>> FWIW I also considered delaying the erroneous path isolation pass.  I
>> ultimately decided against that.  My recollection is that its location
>> came from the desire to clean up those paths early enough in the
>> pipeline so that other optimizers could do a better job.
>>
>> We could consider an early/late split here.  Essentially we do the
>> optimization early, leaving enough data lying around somewhere for a
>> late pass to look for and issue the warning.  Something like a
>> __builtin_warning that we leave in the IL, then just before expanding to
>> RTL, we scan the IL once and issue the __builtin_warnings.
>>
>> In this specific case the __builtin_warning would be on a path that we
>> eventually would determine was unreachable at compile time and the path
>> would be removed.  I suspect we could do something similar for other
>> warnings coming out of the middle end.
>>
>> Anyway, this has been bootstrapped and regression tested on x86_64,
>> ppc64, i686, sparc64, & ppc64le.  It's also been bootstrapped on alpha,
>> ppc (32 bit), armeb, m68k, riscv64, mipsisa32r4, arm, sh4, & sh4eb.
>> It's also built and regression tested all the *-elf targets in my tester.
>>
>> OK for the trunk, or do we want to defer to gcc-10?
> 
> I like the pass removal and would say OK if you manage with
> a 1:1 replacement (thus get rid of that extra copy-prop between
> DOM and pass_isolate_erroneous_paths).
That's the one we need to fix the regression.  We might be able to drop
the one after erroneous path isolation which would keep us at the 1:1
replacement.  I'll poke at that.

jeff


Re: [RFA][tree-optimization/90037] Cleanup const/copies between DOM and erroneous path isolation

2019-04-24 Thread Richard Biener
On Tue, Apr 23, 2019 at 4:29 PM Jeff Law  wrote:
>
>
> As discussed in the BZ, this patch addresses the false positive warning
> by cleaning up the const/copy propagations left in the IL between DOM's
> jump threading and erroneous path isolation.
>
> In the past we'd been handling this stuff with phi only cprop.  To make
> phi only cprop work in this case we'd have to change it to scan
> statements within some subset of blocks where it had previously only
> scanned PHIs.
>
> I was concerned about the compile-time cost of the additional scanning
> plus the extra pass.  So I compared that against using the lattice based
> const/copy propagation as well as against the RPO VN bits.
>
> It turns out that the lattice based const/copy propagation does the
> least amount of work, closely followed by a limited RPO VN.  The
> improved "phi only" copy propagator being the worst.

Interesting.

> Given that we can use the lattice copy propagator by just adding the
> pass to passes.def whereas using the RPN VN actually requires a little
> bit of real code (to set up the entry/exits for the relevant SEME
> regions), I went with the lattice copy propagator.
>
> This change adds around .4% instruction executions to my testbed of .i
> files.  It has no significant impact on the resulting code -- I see
> different register allocation decisions in a lot of places which seem to
> primarily result in reversing arguments to comparisons.

Was there a need to have two copy-prop passes in the early
DOM/errorneous-path removal where we previously only had
a single phi-only-prop pass?  Is the testcase fixed also when
doing copy-prop only a single time?

Also the late pass you replace should be right after VRP, not
after warn_restrict (that was a mistake done when adding the
warn_restrict pass).

The main reason I dislike this is that it is an unconditional cleanup
pass run even when we didn't perform any jump threading.  That's
of course the same case as with the existing phi-only-prop passes
but would have been my major argument for the SEME VN
(where at some cut-off of course doing a single whole-function VN
is cheaper than doing N SEME VNs, and I can even think of doing
N SEME regions at once in exchange for doing a whole-function
RPO order compute).

> FWIW I also considered delaying the erroneous path isolation pass.  I
> ultimately decided against that.  My recollection is that its location
> came from the desire to clean up those paths early enough in the
> pipeline so that other optimizers could do a better job.
>
> We could consider an early/late split here.  Essentially we do the
> optimization early, leaving enough data lying around somewhere for a
> late pass to look for and issue the warning.  Something like a
> __builtin_warning that we leave in the IL, then just before expanding to
> RTL, we scan the IL once and issue the __builtin_warnings.
>
> In this specific case the __builtin_warning would be on a path that we
> eventually would determine was unreachable at compile time and the path
> would be removed.  I suspect we could do something similar for other
> warnings coming out of the middle end.
>
> Anyway, this has been bootstrapped and regression tested on x86_64,
> ppc64, i686, sparc64, & ppc64le.  It's also been bootstrapped on alpha,
> ppc (32 bit), armeb, m68k, riscv64, mipsisa32r4, arm, sh4, & sh4eb.
> It's also built and regression tested all the *-elf targets in my tester.
>
> OK for the trunk, or do we want to defer to gcc-10?

I like the pass removal and would say OK if you manage with
a 1:1 replacement (thus get rid of that extra copy-prop between
DOM and pass_isolate_erroneous_paths).

Richard.

>
> Jeff
>
>
>


[RFA][tree-optimization/90037] Cleanup const/copies between DOM and erroneous path isolation

2019-04-23 Thread Jeff Law

As discussed in the BZ, this patch addresses the false positive warning
by cleaning up the const/copy propagations left in the IL between DOM's
jump threading and erroneous path isolation.

In the past we'd been handling this stuff with phi only cprop.  To make
phi only cprop work in this case we'd have to change it to scan
statements within some subset of blocks where it had previously only
scanned PHIs.

I was concerned about the compile-time cost of the additional scanning
plus the extra pass.  So I compared that against using the lattice based
const/copy propagation as well as against the RPO VN bits.

It turns out that the lattice based const/copy propagation does the
least amount of work, closely followed by a limited RPO VN.  The
improved "phi only" copy propagator being the worst.

Given that we can use the lattice copy propagator by just adding the
pass to passes.def whereas using the RPN VN actually requires a little
bit of real code (to set up the entry/exits for the relevant SEME
regions), I went with the lattice copy propagator.

This change adds around .4% instruction executions to my testbed of .i
files.  It has no significant impact on the resulting code -- I see
different register allocation decisions in a lot of places which seem to
primarily result in reversing arguments to comparisons.

FWIW I also considered delaying the erroneous path isolation pass.  I
ultimately decided against that.  My recollection is that its location
came from the desire to clean up those paths early enough in the
pipeline so that other optimizers could do a better job.

We could consider an early/late split here.  Essentially we do the
optimization early, leaving enough data lying around somewhere for a
late pass to look for and issue the warning.  Something like a
__builtin_warning that we leave in the IL, then just before expanding to
RTL, we scan the IL once and issue the __builtin_warnings.

In this specific case the __builtin_warning would be on a path that we
eventually would determine was unreachable at compile time and the path
would be removed.  I suspect we could do something similar for other
warnings coming out of the middle end.

Anyway, this has been bootstrapped and regression tested on x86_64,
ppc64, i686, sparc64, & ppc64le.  It's also been bootstrapped on alpha,
ppc (32 bit), armeb, m68k, riscv64, mipsisa32r4, arm, sh4, & sh4eb.
It's also built and regression tested all the *-elf targets in my tester.

OK for the trunk, or do we want to defer to gcc-10?

Jeff



diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index d186d71c91e..5f43d9de00e 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1559,7 +1559,6 @@ OBJS = \
tree-ssa-loop.o \
tree-ssa-math-opts.o \
tree-ssa-operands.o \
-   tree-ssa-phionlycprop.o \
tree-ssa-phiopt.o \
tree-ssa-phiprop.o \
tree-ssa-pre.o \
diff --git a/gcc/passes.def b/gcc/passes.def
index 446a7c48276..2f01a2dd72b 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -222,19 +222,16 @@ along with GCC; see the file COPYING3.  If not see
 trying to move or duplicate pass_dominator somewhere earlier.  */
   NEXT_PASS (pass_thread_jumps);
   NEXT_PASS (pass_dominator, true /* may_peel_loop_headers_p */);
-  /* At this point the majority of const/copy propagations
-are exposed.  Go ahead and identify paths that should never
-be executed in a conforming program and isolate those paths.
-
-This will expose more degenerate PHIs in the main path and
-expose more PRE/DOM optimization opportunities.  */
+  /* Threading can leave many const/copy propagations in the IL.
+Clean them up.  Failure to do so well can lead to false
+positives from warnings for erroneous code.  */
+  NEXT_PASS (pass_copy_prop);
+  /* Identify paths that should never be executed in a conforming
+program and isolate those paths.  */
   NEXT_PASS (pass_isolate_erroneous_paths);
-  /* The only const/copy propagation opportunities left after
-DOM and erroneous path isolation should be due to degenerate PHI nodes.
-So rather than run the full propagators, run a specialized pass which
-only examines PHIs to discover const/copy propagation
-opportunities.  */
-  NEXT_PASS (pass_phi_only_cprop);
+  /* Path isolation can leave many const/copy propagations in the IL.
+Clean them up.  */
+  NEXT_PASS (pass_copy_prop);
   NEXT_PASS (pass_dse);
   NEXT_PASS (pass_reassoc, true /* insert_powi_p */);
   NEXT_PASS (pass_dce);
@@ -321,13 +318,10 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_strlen);
   NEXT_PASS (pass_thread_jumps);
   NEXT_PASS (pass_vrp, false /* warn_array_bounds_p */);
-  /* The only const/copy propagation opportunities left after
-DOM and VRP should be due to degenerate PHI nodes.  So rather than
-run the full propagators, run a