> -----Original Message----- > From: Richard Sandiford [mailto:richard.sandif...@arm.com] > Sent: Thursday, July 16, 2020 8:42 PM > To: xiezhiheng <xiezhih...@huawei.com> > Cc: Richard Biener <richard.guent...@gmail.com>; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions > emitted at -O3 > > xiezhiheng <xiezhih...@huawei.com> writes: > >> -----Original Message----- > >> From: Richard Sandiford [mailto:richard.sandif...@arm.com] > >> Sent: Tuesday, July 7, 2020 10:08 PM > >> To: xiezhiheng <xiezhih...@huawei.com> > >> Cc: Richard Biener <richard.guent...@gmail.com>; > gcc-patches@gcc.gnu.org > >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions > >> emitted at -O3 > >> > >> xiezhiheng <xiezhih...@huawei.com> writes: > >> >> -----Original Message----- > >> >> From: Richard Sandiford [mailto:richard.sandif...@arm.com] > >> >> Sent: Monday, July 6, 2020 5:31 PM > >> >> To: xiezhiheng <xiezhih...@huawei.com> > >> >> Cc: Richard Biener <richard.guent...@gmail.com>; > >> gcc-patches@gcc.gnu.org > >> >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp > instructions > >> >> emitted at -O3 > >> >> > >> >> No, this is unfortunately a known bug. See: > >> >> > >> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95964 > >> >> > >> >> (Although the PR is recent, it's been a known bug for longer.) > >> >> > >> >> As you say, the difficulty is that the correct attributes depend on what > >> >> the built-in function does. Most integer arithmetic is “const”, but > >> things > >> >> get more complicated for floating-point arithmetic. > >> >> > >> >> The SVE intrinsics use a three stage process: > >> >> > >> >> - each function is classified into one of several groups > >> >> - each group has a set of flags that describe what functions in the > >> >> group can do > >> >> - these flags get converted into attributes based on the current > >> >> command-line options > >> >> > >> >> I guess we should have something similar for the arm_neon.h built-ins. > >> >> > >> >> If you're willing to help fix this, that'd be great. I think a first > >> >> step would be to agree a design. > >> >> > >> >> Thanks, > >> >> Richard > >> > > >> > I'd like to have a try. > >> > >> Great! > >> > >> > I have checked the steps in SVE intrinsics. > >> > It defines a base class "function_base" and derives different classes > >> > to describe several intrinsics for each. And each class may > >> > have its own unique flags described in virtual function > >> > "call_properties". > >> > The specific attributes will be converted from these flags in > >> > "get_attributes" later. > >> > > >> > I find that there are more than 100 classes in total and if I only > >> > need to classify them into different groups by attributes, maybe > >> > we does not need so many classes? > >> > >> Yeah, I agree. > >> > >> Long term, there might be value in defining arm_neon.h in a similar > >> way to arm_sve.h: i.e. have arm_neon.h defer most of the work to > >> a special compiler pragma. But that's going to be a lot of work. > >> > >> I think it's possible to make incremental improvements to the current > >> arm_neon.h implementation without that work being thrown away if we > >> ever > >> did switch to a pragma in future. And the incremental approach seems > >> more practical. > >> > >> > The difficult thing I think is how to classify neon intrinsics into > >> > different groups. I'm going to follow up the way in SVE intrinsics > >> > first now. > >> > >> For now I'd suggest just giving a name to each combination of flags > >> that the intrinsics need, rather than splitting instructions in a > >> more fine-grained way. (It's not at all obvious from the final state > >> of the SVE code, but even there, the idea was to have as few groups as > >> possible. I.e. the groups were supposedly only split where necessary. > >> As you say, there still ended up being a lot of groups in the end…) > >> > >> It'd be easier to review if the work was split up into smaller steps. > >> E.g. maybe one way would be this, with each number being a single > >> patch: > >> > >> (1) (a) Add a flags field to the built-in function definitions > >> that for now is always zero. > >> (b) Pick a name N to describe the most conservative set of flags. > >> (c) Make every built-in function definition use N. > >> > > > > I have finished the first part. > > > > (a) I add a new parameter called FLAG to every built-in function macro. > > > > (b) I define some flags in aarch64-builtins.c > > FLAG_NONE for no needed flags > > FLAG_READ_FPCR for functions will read FPCR register > > FLAG_RAISE_FP_EXCEPTIONS for functions will raise fp exceptions > > FLAG_READ_MEMORY for functions will read global memory > > FLAG_PREFETCH_MEMORY for functions will prefetch data to memory > > FLAG_WRITE_MEMORY for functions will write global memory > > > > FLAG_FP is used for floating-point arithmetic > > FLAG_ALL is all flags above > > > > (c) I add a field in struct aarch64_simd_builtin_datum to record flags > > for each built-in function. But the default flags I set for built-in > > functions > > are FLAG_ALL because by default the built-in functions might do anything. > > > > And bootstrap and regression are tested ok on aarch64 Linux platform. > > This looks great. > > The patch is OK for trunk, but could you send a changelog too, > so that I can include it in the commit message? > > Thanks, > Richard
OK, and I add the git commit msg in patch. Thanks, XieZhiheng +2020-07-16 Zhiheng Xie <xiezhih...@huawei.com> + + PR tree-optimization/94442 + * config/aarch64/aarch64-builtins.c (enum aarch64_type_qualifiers): + Add new field flags. + (VAR1): Add new field FLAG in macro. + (VAR2): Likewise. + (VAR3): Likewise. + (VAR4): Likewise. + (VAR5): Likewise. + (VAR6): Likewise. + (VAR7): Likewise. + (VAR8): Likewise. + (VAR9): Likewise. + (VAR10): Likewise. + (VAR11): Likewise. + (VAR12): Likewise. + (VAR13): Likewise. + (VAR14): Likewise. + (VAR15): Likewise. + (VAR16): Likewise. + (aarch64_general_fold_builtin): Likewise. + (aarch64_general_gimple_fold_builtin): Likewise. + * config/aarch64/aarch64-simd-builtins.def: Add default flag for + each built-in function. + * config/aarch64/geniterators.sh: Add new field in BUILTIN macro. +
pr94442-v2.patch
Description: pr94442-v2.patch