Re: [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it

Richard Biener Fri, 26 Oct 2018 01:44:18 -0700

On Fri, Oct 26, 2018 at 4:55 AM Jeff Law <l...@redhat.com> wrote:
>
> On 10/25/18 4:33 PM, Kugan Vivekanandarajah wrote:
> > Hi,
> >
> > PR87528 showed a case where libgcc generated popcount is causing
> > regression for Skylake.
> > We also have PR86677 where kernel build is failing because the kernel
> > does not use the libgcc (when backend is not defining popcount
> > pattern).  While I agree that the kernel should implement its own
> > functionality when it is not using the libgcc, I am afraid that the
> > implementation can have the same performance issues reported for
> > Skylake in PR87528.
> >
> > Therefore, I would like to propose that we disable popcount detection
> > when we don't have a pattern for that. The attached patch (based on
> > previous discussions) does this.
> >
> > Bootstrapped and regression tested on x86_64-linux-gnu with no new
> > regressions. We need to disable the popcount* testcases. I will have
> > to define a effective_target_with_popcount in
> > gcc/testsuite/lib/target-supports.exp if this patch is OK?
> > Thanks,
> > Kugan
> >
> >
> > gcc/ChangeLog:
> >
> > 2018-10-25  Kugan Vivekanandarajah  <kug...@linaro.org>
> >
> >     * tree-scalar-evolution.c (expression_expensive_p): Make BUILTIN 
> > POPCOUNT
> >     as expensive when backend does not define it.
> >
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2018-10-25  Kugan Vivekanandarajah  <kug...@linaro.org>
> >
> >     * gcc.target/aarch64/popcount4.c: New test.
> >
> FWIW, I've been disabling by checking direct_optab_handler elsewhere
> (number_of_iterations_popcount) in my tester.  It may in fact be an old
> patch from you.
>
> Richi argued that it's the kernel team's responsibility to provide a
> popcount since they don't link with libgcc.  And I'm generally in
> agreement with that position, though it does tend to generate some
> friction with the kernel developers.  We also run the real risk of GCC 9
> not being able to build the kernel which, IMHO, would be a disaster from
> a PR standpoint.
>
> I'd like to hear from others here.  I fully realize we're beyond the
> realm of what is strictly technically correct here from a review standpoint.


As said final value replacement to a library call is probably not wanted
for optimization purpose, so adjusting expression_expensive_p is OK with
me.  It might not fully solve the (non-)issue in case another optimization pass
chooses to materialize niter computation result.

Few comments on the patch:

+      tree fndecl = get_callee_fndecl (expr);
+
+      if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
+       {
+         combined_fn cfn = as_combined_fn (DECL_FUNCTION_CODE (fndecl));

  combined_fn cfn = gimple_call_combined_fn (expr);
  switch (cfn)
    {
...

cfn will be CFN_LAST for a non-builtin/internal call.  I know Richard is mostly
offline but eventually he knows whether there is a better way to query

+           CASE_CFN_POPCOUNT:
+             /* Check if opcode for popcount is available.  */
+             if (optab_handler (popcount_optab,
+                                TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG
(expr, 0))))
+                 == CODE_FOR_nothing)
+               return true;

note that we currently generate builtin calls rather than IFN calls
(when a direct
optab is supported).

Another comment on the patch is that you probably have to adjust existing
popcount testcases to add architecture specific flags enabling suport for
the instructions, otherwise you won't see loop replacement.

Also I think that the expression is only expensive (for final value
replacement!)
if you consider optimizing for speed.  When optimizing for size getting rid of
the loop is probably beneificial unconditionally.  That would leave the
possibility to switch said testcases to -Os.  It would require adding a
bool size_p flag to expression_expensive and passing down
optimize_loop_for_size_p ().

_NOTE_ that expression_expensive_p is also used by IVOPTs and there
replacing sth with an expression based on the niter analysis result doesn't
mean we get rid of the loop (but only of an IV), so maybe that reasoning
doesn't apply there.

Richard.

> Jeff
>

Re: [RFC][PR87528][PR86677] Disable builtin popcount detection when back-end does not define it

Reply via email to