Hi Richard,
Thanks for the review.
On Tue, 30 Oct 2018 at 01:25, Richard Biener <richard.guent...@gmail.com> wrote:
>
> On Mon, Oct 29, 2018 at 2:06 AM Kugan Vivekanandarajah
> <kugan.vivekanandara...@linaro.org> wrote:
> >
> > Hi Richard and Jeff,
> >
> > Thanks for your comments.
> >
> > On Fri, 26 Oct 2018 at 19:40, Richard Biener <richard.guent...@gmail.com> 
> > wrote:
> > >
> > > On Fri, Oct 26, 2018 at 4:55 AM Jeff Law <l...@redhat.com> wrote:
> > > >
> > > > On 10/25/18 4:33 PM, Kugan Vivekanandarajah wrote:
> > > > > Hi,
> > > > >
> > > > > PR87528 showed a case where libgcc generated popcount is causing
> > > > > regression for Skylake.
> > > > > We also have PR86677 where kernel build is failing because the kernel
> > > > > does not use the libgcc (when backend is not defining popcount
> > > > > pattern).  While I agree that the kernel should implement its own
> > > > > functionality when it is not using the libgcc, I am afraid that the
> > > > > implementation can have the same performance issues reported for
> > > > > Skylake in PR87528.
> > > > >
> > > > > Therefore, I would like to propose that we disable popcount detection
> > > > > when we don't have a pattern for that. The attached patch (based on
> > > > > previous discussions) does this.
> > > > >
> > > > > Bootstrapped and regression tested on x86_64-linux-gnu with no new
> > > > > regressions. We need to disable the popcount* testcases. I will have
> > > > > to define a effective_target_with_popcount in
> > > > > gcc/testsuite/lib/target-supports.exp if this patch is OK?
> > > > > Thanks,
> > > > > Kugan
> > > > >
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > 2018-10-25  Kugan Vivekanandarajah  <kug...@linaro.org>
> > > > >
> > > > >     * tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN 
> > > > > POPCOUNT
> > > > >     as expensive when backend does not define it.
> > > > >
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > 2018-10-25  Kugan Vivekanandarajah  <kug...@linaro.org>
> > > > >
> > > > >     * gcc.target/aarch64/popcount4.c: New test.
> > > > >
> > > > FWIW, I've been disabling by checking direct_optab_handler elsewhere
> > > > (number_of_iterations_popcount) in my tester.  It may in fact be an old
> > > > patch from you.
> > > >
> > > > Richi argued that it's the kernel team's responsibility to provide a
> > > > popcount since they don't link with libgcc.  And I'm generally in
> > > > agreement with that position, though it does tend to generate some
> > > > friction with the kernel developers.  We also run the real risk of GCC 9
> > > > not being able to build the kernel which, IMHO, would be a disaster from
> > > > a PR standpoint.
> > > >
> > > > I'd like to hear from others here.  I fully realize we're beyond the
> > > > realm of what is strictly technically correct here from a review 
> > > > standpoint.
> > >
> > > As said final value replacement to a library call is probably not wanted
> > > for optimization purpose, so adjusting expression_expensive_p is OK with
> > > me.  It might not fully solve the (non-)issue in case another 
> > > optimization pass
> > > chooses to materialize niter computation result.
> > >
> > > Few comments on the patch:
> > >
> > > +      tree fndecl = get_callee_fndecl (expr);
> > > +
> > > +      if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
> > > +       {
> > > +         combined_fn cfn = as_combined_fn (DECL_FUNCTION_CODE (fndecl));
> > >
> > >   combined_fn cfn = gimple_call_combined_fn (expr);
> > >   switch (cfn)
> > >     {
> >
> > Did you mean:
> > combined_fn cfn = get_call_combined_fn (expr);
>
> Yes.
>
> > > ...
> > >
> > > cfn will be CFN_LAST for a non-builtin/internal call.  I know Richard is 
> > > mostly
> > > offline but eventually he knows whether there is a better way to query
> > >
> > > +           CASE_CFN_POPCOUNT:
> > > +             /* Check if opcode for popcount is available.  */
> > > +             if (optab_handler (popcount_optab,
> > > +                                TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG
> > > (expr, 0))))
> > > +                 == CODE_FOR_nothing)
> > > +               return true;
> > >
> > > note that we currently generate builtin calls rather than IFN calls
> > > (when a direct
> > > optab is supported).
> > >
> > > Another comment on the patch is that you probably have to adjust existing
> > > popcount testcases to add architecture specific flags enabling suport for
> > > the instructions, otherwise you won't see loop replacement.
> > Indeed.
> > In lib/target-supports.exp, I will try to add support for
> > check_effective_target_popcount_long.
> > When I grep for the popcount pattern in md files, I see it is defined for:
> >
> > tilegx
> > tilepro
> > alpha
> > aarch64  when TARGET_SIMD
> > ia64
> > rs6000
> > i386  when TARGET_POPCOUNT
> > popwerpcspce  when TARGET_POPCNTB || TARGET_POPCNTD
> > s390  when TARGET_Z916 && TARGET_64BIT
> > sparc when TARGET_POPC
> > arm when TARGET_NEON
> > mips when ISA_HAS_POP
> > spu
> > avr
> >
> > I can check these targets with the condition.
> > Another possibility is to check with a sample code and see if we are
> > getting a libcall in the asm. Not sure if that is straightforward. Are
> > there any example for such.
>
> You could try linking w/o libgcc ...
I realized that there are some examples already and I  have based it
on that. Attached patch
0001-fix-kernel-build-v3.patch does this. Bootstrapped and regression
tested on aarch64-linux-gnu with no new regression. Is this OK?

>
> > We could also move these test to a primary target that is tested often
> > tested which also defines popcount pattern. I dont think these tests
> > change for targets and if we can test in one target that could be
> > enough,
> >
> > I am happy to implement what is appropriate.
>
> How about the -Os idea?
Attached patch 0002-allow-builtin-popcount-for-size_p.patch attempts
this. As you mentioned, this will again break the kernel. While I am
trying to provide the popcount implementation for the newer kernel, it
will still be a problem for existing kernel versions. May I request
that we provide a flag to disable this if we decide to go with the
patch please?

Thanks,
Kugan

>
> Richard.
>
> > Thanks,
> > Kugan
> >
> >
> >
> > >
> > > Also I think that the expression is only expensive (for final value
> > > replacement!)
> > > if you consider optimizing for speed.  When optimizing for size getting 
> > > rid of
> > > the loop is probably beneificial unconditionally.  That would leave the
> > > possibility to switch said testcases to -Os.  It would require adding a
> > > bool size_p flag to expression_expensive and passing down
> > > optimize_loop_for_size_p ().
> > >
> > > _NOTE_ that expression_expensive_p is also used by IVOPTs and there
> > > replacing sth with an expression based on the niter analysis result 
> > > doesn't
> > > mean we get rid of the loop (but only of an IV), so maybe that reasoning
> > > doesn't apply there.
> > >
> > > Richard.
> > >
> > > > Jeff
> > > >
From 085d75c277bab60e8f71caff0c396a6495e38bcb Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Date: Wed, 24 Oct 2018 20:33:50 +1100
Subject: [PATCH 1/2] fix kernel build [v3]

Change-Id: I4df1ccb6bb671610072dfbc60afc5724e17609bb
---
 gcc/testsuite/g++.dg/tree-ssa/pr86544.C      |  1 +
 gcc/testsuite/gcc.dg/tree-ssa/popcount.c     |  1 +
 gcc/testsuite/gcc.dg/tree-ssa/popcount2.c    |  1 +
 gcc/testsuite/gcc.dg/tree-ssa/popcount3.c    |  1 +
 gcc/testsuite/gcc.target/aarch64/popcount4.c | 14 ++++++++++++++
 gcc/testsuite/lib/target-supports.exp        | 12 ++++++++++++
 gcc/tree-scalar-evolution.c                  | 16 ++++++++++++++++
 7 files changed, 46 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcount4.c

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr86544.C b/gcc/testsuite/g++.dg/tree-ssa/pr86544.C
index fd844b4..ef43891 100644
--- a/gcc/testsuite/g++.dg/tree-ssa/pr86544.C
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr86544.C
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target popcountl } */
 /* { dg-options "-O2 -fdump-tree-phiopt4 -fdump-tree-optimized" } */
 
 int PopCount (long b) {
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
index a5ec3b3..b469410 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target popcountl } */
 /* { dg-options "-O3 -fdump-tree-optimized -fno-tree-ch" } */
 
 extern int foo (int);
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
index 9096c6b..ef73e34 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount2.c
@@ -1,4 +1,5 @@
 /* { dg-do run } */
+/* { dg-require-effective-target popcountl } */
 /* { dg-options "-O2 -fno-tree-ch -fdump-tree-optimized" } */
 
 int
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
index fd844b4..ef43891 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/popcount3.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-require-effective-target popcountl } */
 /* { dg-options "-O2 -fdump-tree-phiopt4 -fdump-tree-optimized" } */
 
 int PopCount (long b) {
diff --git a/gcc/testsuite/gcc.target/aarch64/popcount4.c b/gcc/testsuite/gcc.target/aarch64/popcount4.c
new file mode 100644
index 0000000..ee55b2e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcount4.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized -mgeneral-regs-only" } */
+
+int PopCount (long b) {
+    int c = 0;
+
+    while (b) {
+	b &= b - 1;
+	c++;
+    }
+    return c;
+}
+
+/* { dg-final { scan-tree-dump-times "__builtin_popcount" 0 "optimized" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index fd74c04..214582f 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -6500,6 +6500,18 @@ proc check_effective_target_sync_long_long { } {
     }
 }
 
+# Return 1 if the target supports popcount on long.
+
+proc check_effective_target_popcountl { } {
+    return [check_no_messages_and_pattern popcountl "!__popcount" assembly {
+	int foo ()
+	  {
+	    long volatile b = 3;
+	    return __builtin_popcountl (b);
+	  }
+    } "" ]
+}
+
 # Return 1 if the target supports atomic operations on "long long"
 # and can execute them.
 #
diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 6475743..01e29e3 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -257,7 +257,9 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
+#include "target.h"
 #include "rtl.h"
+#include "optabs-query.h"
 #include "tree.h"
 #include "gimple.h"
 #include "ssa.h"
@@ -282,6 +284,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-fold.h"
 #include "tree-into-ssa.h"
 #include "builtins.h"
+#include "case-cfn-macros.h"
 
 static tree analyze_scalar_evolution_1 (struct loop *, tree);
 static tree analyze_scalar_evolution_for_address_of (struct loop *loop,
@@ -3501,6 +3504,19 @@ expression_expensive_p (tree expr)
       tree arg;
       call_expr_arg_iterator iter;
 
+      combined_fn cfn = get_call_combined_fn (expr);
+      switch (cfn)
+	{
+	CASE_CFN_POPCOUNT:
+	  /* Check if opcode for popcount is available.  */
+	  if (optab_handler (popcount_optab,
+			     TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (expr, 0))))
+	      == CODE_FOR_nothing)
+	    return true;
+	default:
+	  break;
+	}
+
       if (!is_inexpensive_builtin (get_callee_fndecl (expr)))
 	return true;
       FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
-- 
2.7.4

From 714c0fff95c3cd4bfde14f7649deb829962ec50b Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah <kugan.vivekanandarajah@linaro.org>
Date: Fri, 2 Nov 2018 19:46:06 +1100
Subject: [PATCH 2/2] allow builtin popcount for size_p

Change-Id: I3843088970b9cad482b2d480f7aa3e1d5fa11614
---
 gcc/tree-scalar-evolution.c | 24 +++++++++++++-----------
 gcc/tree-scalar-evolution.h |  2 +-
 gcc/tree-ssa-loop-ivopts.c  |  8 ++++++--
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/gcc/tree-scalar-evolution.c b/gcc/tree-scalar-evolution.c
index 01e29e3..bace52c 100644
--- a/gcc/tree-scalar-evolution.c
+++ b/gcc/tree-scalar-evolution.c
@@ -285,6 +285,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-into-ssa.h"
 #include "builtins.h"
 #include "case-cfn-macros.h"
+#include "predict.h"
 
 static tree analyze_scalar_evolution_1 (struct loop *, tree);
 static tree analyze_scalar_evolution_for_address_of (struct loop *loop,
@@ -3472,10 +3473,10 @@ scev_finalize (void)
 }
 
 /* Returns true if the expression EXPR is considered to be too expensive
-   for scev_const_prop.  */
+   for scev_const_prop.  If SIZE_P is true, it it optimized for size.  */
 
 bool
-expression_expensive_p (tree expr)
+expression_expensive_p (tree expr, bool size_p)
 {
   enum tree_code code;
 
@@ -3509,8 +3510,9 @@ expression_expensive_p (tree expr)
 	{
 	CASE_CFN_POPCOUNT:
 	  /* Check if opcode for popcount is available.  */
-	  if (optab_handler (popcount_optab,
-			     TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (expr, 0))))
+	  if (!size_p
+	      && optab_handler (popcount_optab,
+				TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (expr, 0))))
 	      == CODE_FOR_nothing)
 	    return true;
 	default:
@@ -3520,13 +3522,13 @@ expression_expensive_p (tree expr)
       if (!is_inexpensive_builtin (get_callee_fndecl (expr)))
 	return true;
       FOR_EACH_CALL_EXPR_ARG (arg, iter, expr)
-	if (expression_expensive_p (arg))
+	if (expression_expensive_p (arg, size_p))
 	  return true;
       return false;
     }
 
   if (code == COND_EXPR)
-    return (expression_expensive_p (TREE_OPERAND (expr, 0))
+    return (expression_expensive_p (TREE_OPERAND (expr, 0), size_p)
 	    || (EXPR_P (TREE_OPERAND (expr, 1))
 		&& EXPR_P (TREE_OPERAND (expr, 2)))
 	    /* If either branch has side effects or could trap.  */
@@ -3534,19 +3536,19 @@ expression_expensive_p (tree expr)
 	    || generic_expr_could_trap_p (TREE_OPERAND (expr, 1))
 	    || TREE_SIDE_EFFECTS (TREE_OPERAND (expr, 0))
 	    || generic_expr_could_trap_p (TREE_OPERAND (expr, 0))
-	    || expression_expensive_p (TREE_OPERAND (expr, 1))
-	    || expression_expensive_p (TREE_OPERAND (expr, 2)));
+	    || expression_expensive_p (TREE_OPERAND (expr, 1), size_p)
+	    || expression_expensive_p (TREE_OPERAND (expr, 2), size_p));
 
   switch (TREE_CODE_CLASS (code))
     {
     case tcc_binary:
     case tcc_comparison:
-      if (expression_expensive_p (TREE_OPERAND (expr, 1)))
+      if (expression_expensive_p (TREE_OPERAND (expr, 1), size_p))
 	return true;
 
       /* Fallthru.  */
     case tcc_unary:
-      return expression_expensive_p (TREE_OPERAND (expr, 0));
+      return expression_expensive_p (TREE_OPERAND (expr, 0), size_p);
 
     default:
       return true;
@@ -3615,7 +3617,7 @@ final_value_replacement_loop (struct loop *loop)
 
 	     he probably knows that n is not large, and does not want it
 	     to be turned into n %= 45.  */
-	  || expression_expensive_p (def))
+	  || expression_expensive_p (def, optimize_loop_for_size_p (loop)))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    {
diff --git a/gcc/tree-scalar-evolution.h b/gcc/tree-scalar-evolution.h
index f58ce66..1f19736 100644
--- a/gcc/tree-scalar-evolution.h
+++ b/gcc/tree-scalar-evolution.h
@@ -35,7 +35,7 @@ extern tree resolve_mixers (struct loop *, tree, bool *);
 extern void gather_stats_on_scev_database (void);
 extern void final_value_replacement_loop (struct loop *);
 extern unsigned int scev_const_prop (void);
-extern bool expression_expensive_p (tree);
+extern bool expression_expensive_p (tree, bool);
 extern bool simple_iv_with_niters (struct loop *, struct loop *, tree,
 				   struct affine_iv *, tree *, bool);
 extern bool simple_iv (struct loop *, struct loop *, tree, struct affine_iv *,
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index b42f32d..038eabd 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -5149,8 +5149,12 @@ may_eliminate_iv (struct ivopts_data *data,
   *comp = iv_elimination_compare (data, use);
 
   /* It is unlikely that computing the number of iterations using division
-     would be more profitable than keeping the original induction variable.  */
-  if (expression_expensive_p (*bound))
+     would be more profitable than keeping the original induction variable.
+
+     When replacing a loop, second argument to expression_expensive_p is used
+     in deciding if size should be considered.  In this case since this is
+     only concerning IV, we pass false.  */
+  if (expression_expensive_p (*bound, false))
     return false;
 
   /* Sometimes, it is possible to handle the situation that the number of
-- 
2.7.4

Reply via email to