xiezhiheng <xiezhih...@huawei.com> writes: >> -----Original Message----- >> From: Richard Sandiford [mailto:richard.sandif...@arm.com] >> Sent: Thursday, July 16, 2020 8:42 PM >> To: xiezhiheng <xiezhih...@huawei.com> >> Cc: Richard Biener <richard.guent...@gmail.com>; gcc-patches@gcc.gnu.org >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions >> emitted at -O3 >> >> xiezhiheng <xiezhih...@huawei.com> writes: >> >> -----Original Message----- >> >> From: Richard Sandiford [mailto:richard.sandif...@arm.com] >> >> Sent: Tuesday, July 7, 2020 10:08 PM >> >> To: xiezhiheng <xiezhih...@huawei.com> >> >> Cc: Richard Biener <richard.guent...@gmail.com>; >> gcc-patches@gcc.gnu.org >> >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions >> >> emitted at -O3 >> >> >> >> xiezhiheng <xiezhih...@huawei.com> writes: >> >> >> -----Original Message----- >> >> >> From: Richard Sandiford [mailto:richard.sandif...@arm.com] >> >> >> Sent: Monday, July 6, 2020 5:31 PM >> >> >> To: xiezhiheng <xiezhih...@huawei.com> >> >> >> Cc: Richard Biener <richard.guent...@gmail.com>; >> >> gcc-patches@gcc.gnu.org >> >> >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp >> instructions >> >> >> emitted at -O3 >> >> >> >> >> >> No, this is unfortunately a known bug. See: >> >> >> >> >> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95964 >> >> >> >> >> >> (Although the PR is recent, it's been a known bug for longer.) >> >> >> >> >> >> As you say, the difficulty is that the correct attributes depend on >> >> >> what >> >> >> the built-in function does. Most integer arithmetic is “const”, but >> >> things >> >> >> get more complicated for floating-point arithmetic. >> >> >> >> >> >> The SVE intrinsics use a three stage process: >> >> >> >> >> >> - each function is classified into one of several groups >> >> >> - each group has a set of flags that describe what functions in the >> >> >> group can do >> >> >> - these flags get converted into attributes based on the current >> >> >> command-line options >> >> >> >> >> >> I guess we should have something similar for the arm_neon.h built-ins. >> >> >> >> >> >> If you're willing to help fix this, that'd be great. I think a first >> >> >> step would be to agree a design. >> >> >> >> >> >> Thanks, >> >> >> Richard >> >> > >> >> > I'd like to have a try. >> >> >> >> Great! >> >> >> >> > I have checked the steps in SVE intrinsics. >> >> > It defines a base class "function_base" and derives different classes >> >> > to describe several intrinsics for each. And each class may >> >> > have its own unique flags described in virtual function >> >> > "call_properties". >> >> > The specific attributes will be converted from these flags in >> >> > "get_attributes" later. >> >> > >> >> > I find that there are more than 100 classes in total and if I only >> >> > need to classify them into different groups by attributes, maybe >> >> > we does not need so many classes? >> >> >> >> Yeah, I agree. >> >> >> >> Long term, there might be value in defining arm_neon.h in a similar >> >> way to arm_sve.h: i.e. have arm_neon.h defer most of the work to >> >> a special compiler pragma. But that's going to be a lot of work. >> >> >> >> I think it's possible to make incremental improvements to the current >> >> arm_neon.h implementation without that work being thrown away if we >> >> ever >> >> did switch to a pragma in future. And the incremental approach seems >> >> more practical. >> >> >> >> > The difficult thing I think is how to classify neon intrinsics into >> >> > different groups. I'm going to follow up the way in SVE intrinsics >> >> > first now. >> >> >> >> For now I'd suggest just giving a name to each combination of flags >> >> that the intrinsics need, rather than splitting instructions in a >> >> more fine-grained way. (It's not at all obvious from the final state >> >> of the SVE code, but even there, the idea was to have as few groups as >> >> possible. I.e. the groups were supposedly only split where necessary. >> >> As you say, there still ended up being a lot of groups in the end…) >> >> >> >> It'd be easier to review if the work was split up into smaller steps. >> >> E.g. maybe one way would be this, with each number being a single >> >> patch: >> >> >> >> (1) (a) Add a flags field to the built-in function definitions >> >> that for now is always zero. >> >> (b) Pick a name N to describe the most conservative set of flags. >> >> (c) Make every built-in function definition use N. >> >> >> > >> > I have finished the first part. >> > >> > (a) I add a new parameter called FLAG to every built-in function macro. >> > >> > (b) I define some flags in aarch64-builtins.c >> > FLAG_NONE for no needed flags >> > FLAG_READ_FPCR for functions will read FPCR register >> > FLAG_RAISE_FP_EXCEPTIONS for functions will raise fp exceptions >> > FLAG_READ_MEMORY for functions will read global memory >> > FLAG_PREFETCH_MEMORY for functions will prefetch data to memory >> > FLAG_WRITE_MEMORY for functions will write global memory >> > >> > FLAG_FP is used for floating-point arithmetic >> > FLAG_ALL is all flags above >> > >> > (c) I add a field in struct aarch64_simd_builtin_datum to record flags >> > for each built-in function. But the default flags I set for built-in >> > functions >> > are FLAG_ALL because by default the built-in functions might do anything. >> > >> > And bootstrap and regression are tested ok on aarch64 Linux platform. >> >> This looks great. >> >> The patch is OK for trunk, but could you send a changelog too, >> so that I can include it in the commit message? >> >> Thanks, >> Richard > > OK, and I add the git commit msg in patch.
Thanks, pushed to master. Richard