xiezhiheng <xiezhih...@huawei.com> writes:
>> -----Original Message-----
>> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> Sent: Thursday, July 16, 2020 8:42 PM
>> To: xiezhiheng <xiezhih...@huawei.com>
>> Cc: Richard Biener <richard.guent...@gmail.com>; gcc-patches@gcc.gnu.org
>> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
>> emitted at -O3
>> 
>> xiezhiheng <xiezhih...@huawei.com> writes:
>> >> -----Original Message-----
>> >> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> >> Sent: Tuesday, July 7, 2020 10:08 PM
>> >> To: xiezhiheng <xiezhih...@huawei.com>
>> >> Cc: Richard Biener <richard.guent...@gmail.com>;
>> gcc-patches@gcc.gnu.org
>> >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
>> >> emitted at -O3
>> >>
>> >> xiezhiheng <xiezhih...@huawei.com> writes:
>> >> >> -----Original Message-----
>> >> >> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
>> >> >> Sent: Monday, July 6, 2020 5:31 PM
>> >> >> To: xiezhiheng <xiezhih...@huawei.com>
>> >> >> Cc: Richard Biener <richard.guent...@gmail.com>;
>> >> gcc-patches@gcc.gnu.org
>> >> >> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp
>> instructions
>> >> >> emitted at -O3
>> >> >>
>> >> >> No, this is unfortunately a known bug.  See:
>> >> >>
>> >> >>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95964
>> >> >>
>> >> >> (Although the PR is recent, it's been a known bug for longer.)
>> >> >>
>> >> >> As you say, the difficulty is that the correct attributes depend on 
>> >> >> what
>> >> >> the built-in function does.  Most integer arithmetic is “const”, but
>> >> things
>> >> >> get more complicated for floating-point arithmetic.
>> >> >>
>> >> >> The SVE intrinsics use a three stage process:
>> >> >>
>> >> >> - each function is classified into one of several groups
>> >> >> - each group has a set of flags that describe what functions in the
>> >> >>   group can do
>> >> >> - these flags get converted into attributes based on the current
>> >> >>   command-line options
>> >> >>
>> >> >> I guess we should have something similar for the arm_neon.h built-ins.
>> >> >>
>> >> >> If you're willing to help fix this, that'd be great.  I think a first
>> >> >> step would be to agree a design.
>> >> >>
>> >> >> Thanks,
>> >> >> Richard
>> >> >
>> >> > I'd like to have a try.
>> >>
>> >> Great!
>> >>
>> >> > I have checked the steps in SVE intrinsics.
>> >> > It defines a base class "function_base" and derives different classes
>> >> > to describe several intrinsics for each.  And each class may
>> >> > have its own unique flags described in virtual function 
>> >> > "call_properties".
>> >> > The specific attributes will be converted from these flags in
>> >> > "get_attributes" later.
>> >> >
>> >> > I find that there are more than 100 classes in total and if I only
>> >> > need to classify them into different groups by attributes, maybe
>> >> > we does not need so many classes?
>> >>
>> >> Yeah, I agree.
>> >>
>> >> Long term, there might be value in defining arm_neon.h in a similar
>> >> way to arm_sve.h: i.e. have arm_neon.h defer most of the work to
>> >> a special compiler pragma.  But that's going to be a lot of work.
>> >>
>> >> I think it's possible to make incremental improvements to the current
>> >> arm_neon.h implementation without that work being thrown away if we
>> >> ever
>> >> did switch to a pragma in future.  And the incremental approach seems
>> >> more practical.
>> >>
>> >> > The difficult thing I think is how to classify neon intrinsics into
>> >> > different groups.  I'm going to follow up the way in SVE intrinsics
>> >> > first now.
>> >>
>> >> For now I'd suggest just giving a name to each combination of flags
>> >> that the intrinsics need, rather than splitting instructions in a
>> >> more fine-grained way.  (It's not at all obvious from the final state
>> >> of the SVE code, but even there, the idea was to have as few groups as
>> >> possible.  I.e. the groups were supposedly only split where necessary.
>> >> As you say, there still ended up being a lot of groups in the end…)
>> >>
>> >> It'd be easier to review if the work was split up into smaller steps.
>> >> E.g. maybe one way would be this, with each number being a single
>> >> patch:
>> >>
>> >> (1) (a) Add a flags field to the built-in function definitions
>> >>         that for now is always zero.
>> >>     (b) Pick a name N to describe the most conservative set of flags.
>> >>     (c) Make every built-in function definition use N.
>> >>
>> >
>> > I have finished the first part.
>> >
>> > (a) I add a new parameter called FLAG to every built-in function macro.
>> >
>> > (b) I define some flags in aarch64-builtins.c
>> > FLAG_NONE for no needed flags
>> > FLAG_READ_FPCR for functions will read FPCR register
>> > FLAG_RAISE_FP_EXCEPTIONS for functions will raise fp exceptions
>> > FLAG_READ_MEMORY for functions will read global memory
>> > FLAG_PREFETCH_MEMORY for functions will prefetch data to memory
>> > FLAG_WRITE_MEMORY for functions will write global memory
>> >
>> > FLAG_FP is used for floating-point arithmetic
>> > FLAG_ALL is all flags above
>> >
>> > (c) I add a field in struct aarch64_simd_builtin_datum to record flags
>> > for each built-in function.  But the default flags I set for built-in 
>> > functions
>> > are FLAG_ALL because by default the built-in functions might do anything.
>> >
>> > And bootstrap and regression are tested ok on aarch64 Linux platform.
>> 
>> This looks great.
>> 
>> The patch is OK for trunk, but could you send a changelog too,
>> so that I can include it in the commit message?
>> 
>> Thanks,
>> Richard
>
> OK, and I add the git commit msg in patch.

Thanks, pushed to master.

Richard

Reply via email to