On Fri, Nov 2, 2018 at 10:02 AM Kugan Vivekanandarajah <kugan.vivekanandara...@linaro.org> wrote: > > Hi Richard, > Thanks for the review. > On Tue, 30 Oct 2018 at 01:25, Richard Biener <richard.guent...@gmail.com> > wrote: > > > > On Mon, Oct 29, 2018 at 2:06 AM Kugan Vivekanandarajah > > <kugan.vivekanandara...@linaro.org> wrote: > > > > > > Hi Richard and Jeff, > > > > > > Thanks for your comments. > > > > > > On Fri, 26 Oct 2018 at 19:40, Richard Biener <richard.guent...@gmail.com> > > > wrote: > > > > > > > > On Fri, Oct 26, 2018 at 4:55 AM Jeff Law <l...@redhat.com> wrote: > > > > > > > > > > On 10/25/18 4:33 PM, Kugan Vivekanandarajah wrote: > > > > > > Hi, > > > > > > > > > > > > PR87528 showed a case where libgcc generated popcount is causing > > > > > > regression for Skylake. > > > > > > We also have PR86677 where kernel build is failing because the > > > > > > kernel > > > > > > does not use the libgcc (when backend is not defining popcount > > > > > > pattern). While I agree that the kernel should implement its own > > > > > > functionality when it is not using the libgcc, I am afraid that the > > > > > > implementation can have the same performance issues reported for > > > > > > Skylake in PR87528. > > > > > > > > > > > > Therefore, I would like to propose that we disable popcount > > > > > > detection > > > > > > when we don't have a pattern for that. The attached patch (based on > > > > > > previous discussions) does this. > > > > > > > > > > > > Bootstrapped and regression tested on x86_64-linux-gnu with no new > > > > > > regressions. We need to disable the popcount* testcases. I will have > > > > > > to define a effective_target_with_popcount in > > > > > > gcc/testsuite/lib/target-supports.exp if this patch is OK? > > > > > > Thanks, > > > > > > Kugan > > > > > > > > > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > > > 2018-10-25 Kugan Vivekanandarajah <kug...@linaro.org> > > > > > > > > > > > > * tree-scalar-evolution.c (expression_expensive_p): Make > > > > > > BUILTIN POPCOUNT > > > > > > as expensive when backend does not define it. > > > > > > > > > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > > > 2018-10-25 Kugan Vivekanandarajah <kug...@linaro.org> > > > > > > > > > > > > * gcc.target/aarch64/popcount4.c: New test. > > > > > > > > > > > FWIW, I've been disabling by checking direct_optab_handler elsewhere > > > > > (number_of_iterations_popcount) in my tester. It may in fact be an > > > > > old > > > > > patch from you. > > > > > > > > > > Richi argued that it's the kernel team's responsibility to provide a > > > > > popcount since they don't link with libgcc. And I'm generally in > > > > > agreement with that position, though it does tend to generate some > > > > > friction with the kernel developers. We also run the real risk of > > > > > GCC 9 > > > > > not being able to build the kernel which, IMHO, would be a disaster > > > > > from > > > > > a PR standpoint. > > > > > > > > > > I'd like to hear from others here. I fully realize we're beyond the > > > > > realm of what is strictly technically correct here from a review > > > > > standpoint. > > > > > > > > As said final value replacement to a library call is probably not wanted > > > > for optimization purpose, so adjusting expression_expensive_p is OK with > > > > me. It might not fully solve the (non-)issue in case another > > > > optimization pass > > > > chooses to materialize niter computation result. > > > > > > > > Few comments on the patch: > > > > > > > > + tree fndecl = get_callee_fndecl (expr); > > > > + > > > > + if (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL) > > > > + { > > > > + combined_fn cfn = as_combined_fn (DECL_FUNCTION_CODE > > > > (fndecl)); > > > > > > > > combined_fn cfn = gimple_call_combined_fn (expr); > > > > switch (cfn) > > > > { > > > > > > Did you mean: > > > combined_fn cfn = get_call_combined_fn (expr); > > > > Yes. > > > > > > ... > > > > > > > > cfn will be CFN_LAST for a non-builtin/internal call. I know Richard > > > > is mostly > > > > offline but eventually he knows whether there is a better way to query > > > > > > > > + CASE_CFN_POPCOUNT: > > > > + /* Check if opcode for popcount is available. */ > > > > + if (optab_handler (popcount_optab, > > > > + TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG > > > > (expr, 0)))) > > > > + == CODE_FOR_nothing) > > > > + return true; > > > > > > > > note that we currently generate builtin calls rather than IFN calls > > > > (when a direct > > > > optab is supported). > > > > > > > > Another comment on the patch is that you probably have to adjust > > > > existing > > > > popcount testcases to add architecture specific flags enabling suport > > > > for > > > > the instructions, otherwise you won't see loop replacement. > > > Indeed. > > > In lib/target-supports.exp, I will try to add support for > > > check_effective_target_popcount_long. > > > When I grep for the popcount pattern in md files, I see it is defined for: > > > > > > tilegx > > > tilepro > > > alpha > > > aarch64 when TARGET_SIMD > > > ia64 > > > rs6000 > > > i386 when TARGET_POPCOUNT > > > popwerpcspce when TARGET_POPCNTB || TARGET_POPCNTD > > > s390 when TARGET_Z916 && TARGET_64BIT > > > sparc when TARGET_POPC > > > arm when TARGET_NEON > > > mips when ISA_HAS_POP > > > spu > > > avr > > > > > > I can check these targets with the condition. > > > Another possibility is to check with a sample code and see if we are > > > getting a libcall in the asm. Not sure if that is straightforward. Are > > > there any example for such. > > > > You could try linking w/o libgcc ... > I realized that there are some examples already and I have based it > on that. Attached patch > 0001-fix-kernel-build-v3.patch does this. Bootstrapped and regression > tested on aarch64-linux-gnu with no new regression. Is this OK?
I suspect using sth like the hard-float test is better, thus check_no_messages_and_pattern popcountl "!\\(call" rtl-expand instead of looking for !__popcount in assembly since depending on assembler syntax this might nor might not match spuriously... @@ -3501,6 +3504,19 @@ expression_expensive_p (tree expr) tree arg; call_expr_arg_iterator iter; + combined_fn cfn = get_call_combined_fn (expr); + switch (cfn) + { + CASE_CFN_POPCOUNT: + /* Check if opcode for popcount is available. */ + if (optab_handler (popcount_optab, + TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (expr, 0)))) + == CODE_FOR_nothing) + return true; + default: + break; + } + if (!is_inexpensive_builtin (get_callee_fndecl (expr))) return true; FOR_EACH_CALL_EXPR_ARG (arg, iter, expr) note that we can handle double-word popcount by emitting two single-word popcount instructions. So if the mode is of 2 * UNITS_PER_WORD size you want to check for mode == word_mode. See expand_unop. I think you want to add a comment before the code explaining that even though is_inexpensive_builtin says true we may get a library call which we consider expensive. > > > > > We could also move these test to a primary target that is tested often > > > tested which also defines popcount pattern. I dont think these tests > > > change for targets and if we can test in one target that could be > > > enough, > > > > > > I am happy to implement what is appropriate. > > > > How about the -Os idea? > Attached patch 0002-allow-builtin-popcount-for-size_p.patch attempts > this. As you mentioned, this will again break the kernel. While I am > trying to provide the popcount implementation for the newer kernel, it > will still be a problem for existing kernel versions. May I request > that we provide a flag to disable this if we decide to go with the > patch please? Let's consider this only for GCC 10 and get the first patch ready. expression_expensive_p shouldn't consider division as expensive for -Os either so there's more churn here and expression_expensive_p doesn't consider the overall size of the expression anyways. So let's drop this for now. Richard. > Thanks, > Kugan > > > > > Richard. > > > > > Thanks, > > > Kugan > > > > > > > > > > > > > > > > > Also I think that the expression is only expensive (for final value > > > > replacement!) > > > > if you consider optimizing for speed. When optimizing for size getting > > > > rid of > > > > the loop is probably beneificial unconditionally. That would leave the > > > > possibility to switch said testcases to -Os. It would require adding a > > > > bool size_p flag to expression_expensive and passing down > > > > optimize_loop_for_size_p (). > > > > > > > > _NOTE_ that expression_expensive_p is also used by IVOPTs and there > > > > replacing sth with an expression based on the niter analysis result > > > > doesn't > > > > mean we get rid of the loop (but only of an IV), so maybe that reasoning > > > > doesn't apply there. > > > > > > > > Richard. > > > > > > > > > Jeff > > > > >