> -----Original Message----- > From: Richard Sandiford [mailto:richard.sandif...@arm.com] > Sent: Tuesday, July 7, 2020 10:08 PM > To: xiezhiheng <xiezhih...@huawei.com> > Cc: Richard Biener <richard.guent...@gmail.com>; gcc-patches@gcc.gnu.org > Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions > emitted at -O3 > > xiezhiheng <xiezhih...@huawei.com> writes: > >> -----Original Message----- > >> From: Richard Sandiford [mailto:richard.sandif...@arm.com] > >> Sent: Monday, July 6, 2020 5:31 PM > >> To: xiezhiheng <xiezhih...@huawei.com> > >> Cc: Richard Biener <richard.guent...@gmail.com>; > gcc-patches@gcc.gnu.org > >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions > >> emitted at -O3 > >> > >> No, this is unfortunately a known bug. See: > >> > >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95964 > >> > >> (Although the PR is recent, it's been a known bug for longer.) > >> > >> As you say, the difficulty is that the correct attributes depend on what > >> the built-in function does. Most integer arithmetic is “const”, but > things > >> get more complicated for floating-point arithmetic. > >> > >> The SVE intrinsics use a three stage process: > >> > >> - each function is classified into one of several groups > >> - each group has a set of flags that describe what functions in the > >> group can do > >> - these flags get converted into attributes based on the current > >> command-line options > >> > >> I guess we should have something similar for the arm_neon.h built-ins. > >> > >> If you're willing to help fix this, that'd be great. I think a first > >> step would be to agree a design. > >> > >> Thanks, > >> Richard > > > > I'd like to have a try. > > Great! > > > I have checked the steps in SVE intrinsics. > > It defines a base class "function_base" and derives different classes > > to describe several intrinsics for each. And each class may > > have its own unique flags described in virtual function "call_properties". > > The specific attributes will be converted from these flags in > > "get_attributes" later. > > > > I find that there are more than 100 classes in total and if I only > > need to classify them into different groups by attributes, maybe > > we does not need so many classes? > > Yeah, I agree. > > Long term, there might be value in defining arm_neon.h in a similar > way to arm_sve.h: i.e. have arm_neon.h defer most of the work to > a special compiler pragma. But that's going to be a lot of work. > > I think it's possible to make incremental improvements to the current > arm_neon.h implementation without that work being thrown away if we > ever > did switch to a pragma in future. And the incremental approach seems > more practical. > > > The difficult thing I think is how to classify neon intrinsics into > > different groups. I'm going to follow up the way in SVE intrinsics > > first now. > > For now I'd suggest just giving a name to each combination of flags > that the intrinsics need, rather than splitting instructions in a > more fine-grained way. (It's not at all obvious from the final state > of the SVE code, but even there, the idea was to have as few groups as > possible. I.e. the groups were supposedly only split where necessary. > As you say, there still ended up being a lot of groups in the end…) > > It'd be easier to review if the work was split up into smaller steps. > E.g. maybe one way would be this, with each number being a single > patch: > > (1) (a) Add a flags field to the built-in function definitions > that for now is always zero. > (b) Pick a name N to describe the most conservative set of flags. > (c) Make every built-in function definition use N. >
I have finished the first part. (a) I add a new parameter called FLAG to every built-in function macro. (b) I define some flags in aarch64-builtins.c FLAG_NONE for no needed flags FLAG_READ_FPCR for functions will read FPCR register FLAG_RAISE_FP_EXCEPTIONS for functions will raise fp exceptions FLAG_READ_MEMORY for functions will read global memory FLAG_PREFETCH_MEMORY for functions will prefetch data to memory FLAG_WRITE_MEMORY for functions will write global memory FLAG_FP is used for floating-point arithmetic FLAG_ALL is all flags above (c) I add a field in struct aarch64_simd_builtin_datum to record flags for each built-in function. But the default flags I set for built-in functions are FLAG_ALL because by default the built-in functions might do anything. And bootstrap and regression are tested ok on aarch64 Linux platform. Any suggestions? Thanks, Xie Zhiheng > (2) (a) Pick one type of function that cannot yet be described properly. > (b) Pick a name N for that type of function. > (c) Add whichever new flags are needed. > (d) Add the appropriate attributes when the flags are set, > possibly based on command-line options. > (e) Make (exactly) one built-in function definition use N. > > (3) (a) Pick some functions that all need the same attributes and > that can already be described properly > (b) Update all of their built-in function definitions accordingly, > as a single change. > > So after (1), filling out the table is an iterative process of (2) and > (3), in any order that's convenient (although it might help to order the > (2) patches so that each one adds as few flags as possible). Each patch > would then be fairly small and self-contained. > > That's just a suggestion though. Please let me know if you have > any other suggestions. > > I guess there are two obvious ways of adding the flags field: > > - add a new parameter to every built-in function macro, e.g. > BUILTIN_VSDQ_I and VAR1. > > - wrap the definitions in a new macro, e.g. > MY_NEW_GROUP (BUILTIN_VSDQ_I (BINOP, sqshl, 0)) > > I don't really have a preference, and I guess all other things being > equal, the first one wins by being more obvious than the second. > Just thought I'd mention the second way in case anyone preferred it. > > Thanks, > Richard
pr94442-v1.patch
Description: pr94442-v1.patch