Hi Segher,

Thanks a lot for your helpful comments!

Segher Boessenkool <seg...@kernel.crashing.org> writes:

> On Thu, Dec 08, 2022 at 09:17:38PM +0800, Jiufu Guo wrote:
>> Segher Boessenkool <seg...@kernel.crashing.org> writes:
>> > On Wed, Dec 07, 2022 at 08:00:08PM +0800, Jiufu Guo wrote:
>> >> typedef struct SA {double a[3];} A;
>> >> A ret_arg_pt (A *a) {return *a;} // on ppc64le, expect only 3 lfd(s)
>> >> A ret_arg (A a) {return a;} // just empty fun body
>> >> void st_arg (A a, A *p) {*p = a;} //only 3 stfd(s)
>> >
>> > What is this like if you use [5] instead?  Or use an ABI without
>> > homogeneous aggregates?
>> Thanks for this question!  I also tested the cases on different array
>> types or different sizes, or mixed field types.
>> 
>> If it is out of the number of registers for passing the param
>> or return, it is treated as a mem block.
>> For parameter, it is partially passed via registers, and partially
>> passing via stack.
>> For return, it is returned via a pointer (with one invisible pointer
>> parameter). And the <retval> of the function is not with parallel code.
>> 
>> This patch does not cover these cases.
>
> Understood, sure; but my point is, can it degrade code quality in such
> cases?  I don't see anything in the patch that precludes that.

No, the behavior of such cases is not affected in this patch.
The preclude code is in "assign_parm_setup_block". This patch only shows
the different parts, the context is not shown.

In assign_parm_setup_block, this patch marks "DECL_STACK_REGS_P" only
for "REG_P (entry_parm) || GET_CODE (entry_parm) == PARALLEL" which
indicates the registers are enough to pass the param.

>
>> >> --- /dev/null
>> >> +++ b/gcc/testsuite/gcc.target/powerpc/pr65421-1.c
>> >> @@ -0,0 +1,15 @@
>> >> +/* PR target/65421 */
>> >> +/* { dg-options "-O2" } */
>> >> +/* { dg-require-effective-target has_arch_ppc64 } */
>> >> +
>> >> +typedef struct SA
>> >> +{
>> >> +  double a[2];
>> >> +  long l;
>> >> +} A;
>> >> +
>> >> +/* std 3 param regs to return slot */
>> >> +A ret_arg (A a) {return a;}
>> >> +/* { dg-final { scan-assembler-times {\mstd 4,0\(3\)\s} 1 } } */
>> >> +/* { dg-final { scan-assembler-times {\mstd 5,8\(3\)\s} 1 } } *
>> >> +/* { dg-final { scan-assembler-times {\mstd 6,16\(3\)\s} 1 } } */
>> >
>> > This is only correct on certain ABIs, probably only ELFv2 even.
>> Thanks for point out this!
>> This is only correct if the ABI allows this struct to be passed
>> through integer registers, and return through the mem block.
>
> And it needs to be in those specific registers / at those specific
> offsets as well.
Yes.
>
> Btw, please leave out the \s?
Thanks! 
>
>> In the previous version, I added a requirement on ELFv2. As tested on
>> BE environments, this case also pass. So, I deleted the requirement.
>
> BE for ELFv2 also exists, fwiw.
Yeap! We have -mabi=elfv2.
>
>> (While on BE environments, there is another issue: some unnecessary
>> memory stores are not deleted.)
>
> Huh.  Does that happen with the current compiler as well?  Do you have
> an example?

We can use the test case (pr65421-1.c) as the example -:)

typedef struct SA {double a[2]; long l; } A;
A ret_arg (A a) {return a;}

For this case, without the patch, below is generated:
        std 4,56(1)
        std 5,64(1)
        li 10,56
        std 6,72(1)
        std 6,16(3)
        lxvd2x 0,1,10
        stxvd2x 0,0,3
With the patch, below is generated:
        std 4,56(1)
        std 5,64(1)
        std 6,72(1)
        std 4,0(3)
        std 5,8(3)
        std 6,16(3)
The first 3 std insns are reductant.  This is an unrelated issue.
With -mabi=elfv2, code can be optimized, and those 3 insns are deleted.

I think it would be fine to just test this case on powerpc_elfv2.
I would merge pr65421-1.c into pr65421.c (with dg-require elfv2).

>
>> But with more reading of the code 'rs6000_function_arg', as you said,
>> I'm not sure if this behavior meets other ABIs (at least, it seems,
>> this is not correct on darwin64).
>> So, as you said, we may add a requirement on ELFv2; Or leave this
>> case there, and add "! target" when hitting failure?
>
> If you do !target the testcase won't test much at all anymore ;-)

Right. we could use this method to exclude the sub-targets which are not
using r4,5,6 for the param for this case.

>
>> > We certainly can improve the homogeneous aggregates stuff, but please
>> > make sure you don't degrade all other stuff?  Older, as well as when
>> > things are not an homogeneous aggregate, for example too big.  Can you
>> > please add tests for such cases?
>> Sure, thanks!  I encounter one issue in this kind of case (large struct)
>> on a previous version path.
>
> Perhaps it would be better to have a hook so that every target (and
> subtarget) can fine tune exactly when this is done.  Then again, perhaps
> I worry too much.

Understand, I also thought about using a hook for targets to tune!
The good news is that: a few target hooks are used by generic code for
arg&ret, and the target-related info (e.g. if a struct param is passed
via registers) are available (indirectly).
And I also feel that I may need to add a hook when the required target
info is not accessible(or not suitable) for generic code.

Thanks again for your review!

BR,
Jeff (Jiufu)
>
>
> Segher

Reply via email to