> -----Original Message-----
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Thursday, July 16, 2020 8:42 PM
> To: xiezhiheng <xiezhih...@huawei.com>
> Cc: Richard Biener <richard.guent...@gmail.com>; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
> emitted at -O3
> 
> xiezhiheng <xiezhih...@huawei.com> writes:
> >> -----Original Message-----
> >> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> >> Sent: Tuesday, July 7, 2020 10:08 PM
> >> To: xiezhiheng <xiezhih...@huawei.com>
> >> Cc: Richard Biener <richard.guent...@gmail.com>;
> gcc-patches@gcc.gnu.org
> >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
> >> emitted at -O3
> >>
> >> xiezhiheng <xiezhih...@huawei.com> writes:
> >> >> -----Original Message-----
> >> >> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> >> >> Sent: Monday, July 6, 2020 5:31 PM
> >> >> To: xiezhiheng <xiezhih...@huawei.com>
> >> >> Cc: Richard Biener <richard.guent...@gmail.com>;
> >> gcc-patches@gcc.gnu.org
> >> >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp
> instructions
> >> >> emitted at -O3
> >> >>
> >> >> No, this is unfortunately a known bug.  See:
> >> >>
> >> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95964
> >> >>
> >> >> (Although the PR is recent, it's been a known bug for longer.)
> >> >>
> >> >> As you say, the difficulty is that the correct attributes depend on what
> >> >> the built-in function does.  Most integer arithmetic is “const”, but
> >> things
> >> >> get more complicated for floating-point arithmetic.
> >> >>
> >> >> The SVE intrinsics use a three stage process:
> >> >>
> >> >> - each function is classified into one of several groups
> >> >> - each group has a set of flags that describe what functions in the
> >> >>   group can do
> >> >> - these flags get converted into attributes based on the current
> >> >>   command-line options
> >> >>
> >> >> I guess we should have something similar for the arm_neon.h built-ins.
> >> >>
> >> >> If you're willing to help fix this, that'd be great.  I think a first
> >> >> step would be to agree a design.
> >> >>
> >> >> Thanks,
> >> >> Richard
> >> >
> >> > I'd like to have a try.
> >>
> >> Great!
> >>
> >> > I have checked the steps in SVE intrinsics.
> >> > It defines a base class "function_base" and derives different classes
> >> > to describe several intrinsics for each.  And each class may
> >> > have its own unique flags described in virtual function 
> >> > "call_properties".
> >> > The specific attributes will be converted from these flags in
> >> > "get_attributes" later.
> >> >
> >> > I find that there are more than 100 classes in total and if I only
> >> > need to classify them into different groups by attributes, maybe
> >> > we does not need so many classes?
> >>
> >> Yeah, I agree.
> >>
> >> Long term, there might be value in defining arm_neon.h in a similar
> >> way to arm_sve.h: i.e. have arm_neon.h defer most of the work to
> >> a special compiler pragma.  But that's going to be a lot of work.
> >>
> >> I think it's possible to make incremental improvements to the current
> >> arm_neon.h implementation without that work being thrown away if we
> >> ever
> >> did switch to a pragma in future.  And the incremental approach seems
> >> more practical.
> >>
> >> > The difficult thing I think is how to classify neon intrinsics into
> >> > different groups.  I'm going to follow up the way in SVE intrinsics
> >> > first now.
> >>
> >> For now I'd suggest just giving a name to each combination of flags
> >> that the intrinsics need, rather than splitting instructions in a
> >> more fine-grained way.  (It's not at all obvious from the final state
> >> of the SVE code, but even there, the idea was to have as few groups as
> >> possible.  I.e. the groups were supposedly only split where necessary.
> >> As you say, there still ended up being a lot of groups in the end…)
> >>
> >> It'd be easier to review if the work was split up into smaller steps.
> >> E.g. maybe one way would be this, with each number being a single
> >> patch:
> >>
> >> (1) (a) Add a flags field to the built-in function definitions
> >>         that for now is always zero.
> >>     (b) Pick a name N to describe the most conservative set of flags.
> >>     (c) Make every built-in function definition use N.
> >>
> >
> > I have finished the first part.
> >
> > (a) I add a new parameter called FLAG to every built-in function macro.
> >
> > (b) I define some flags in aarch64-builtins.c
> > FLAG_NONE for no needed flags
> > FLAG_READ_FPCR for functions will read FPCR register
> > FLAG_RAISE_FP_EXCEPTIONS for functions will raise fp exceptions
> > FLAG_READ_MEMORY for functions will read global memory
> > FLAG_PREFETCH_MEMORY for functions will prefetch data to memory
> > FLAG_WRITE_MEMORY for functions will write global memory
> >
> > FLAG_FP is used for floating-point arithmetic
> > FLAG_ALL is all flags above
> >
> > (c) I add a field in struct aarch64_simd_builtin_datum to record flags
> > for each built-in function.  But the default flags I set for built-in 
> > functions
> > are FLAG_ALL because by default the built-in functions might do anything.
> >
> > And bootstrap and regression are tested ok on aarch64 Linux platform.
> 
> This looks great.
> 
> The patch is OK for trunk, but could you send a changelog too,
> so that I can include it in the commit message?
> 
> Thanks,
> Richard

OK, and I add the git commit msg in patch.

Thanks,
XieZhiheng

+2020-07-16  Zhiheng Xie  <xiezhih...@huawei.com>
+
+       PR tree-optimization/94442
+       * config/aarch64/aarch64-builtins.c (enum aarch64_type_qualifiers):
+       Add new field flags.
+       (VAR1): Add new field FLAG in macro.
+       (VAR2): Likewise.
+       (VAR3): Likewise.
+       (VAR4): Likewise.
+       (VAR5): Likewise.
+       (VAR6): Likewise.
+       (VAR7): Likewise.
+       (VAR8): Likewise.
+       (VAR9): Likewise.
+       (VAR10): Likewise.
+       (VAR11): Likewise.
+       (VAR12): Likewise.
+       (VAR13): Likewise.
+       (VAR14): Likewise.
+       (VAR15): Likewise.
+       (VAR16): Likewise.
+       (aarch64_general_fold_builtin): Likewise.
+       (aarch64_general_gimple_fold_builtin): Likewise.
+       * config/aarch64/aarch64-simd-builtins.def: Add default flag for
+       each built-in function.
+       * config/aarch64/geniterators.sh: Add new field in BUILTIN macro.
+

Attachment: pr94442-v2.patch
Description: pr94442-v2.patch

Reply via email to