Jiufu Guo via Gcc-patches <gcc-patches@gcc.gnu.org> writes:

> Hi Richard,
>
> Thanks a lot for your comments!
>
> Richard Biener <rguent...@suse.de> writes:
>
>> On Wed, 23 Nov 2022, Jiufu Guo wrote:
>>
>>> Hi Jeff,
>>> 
>>> Thanks a lot for your comments!
>>
>> Sorry for the late response ...
>>
>>> Jeff Law <jeffreya...@gmail.com> writes:
>>> 
>>> > On 11/20/22 20:07, Jiufu Guo wrote:
>>> >> Jiufu Guo <guoji...@linux.ibm.com> writes:
>>> >>
>>> >>> Hi,
>>> >>>
>>> >>> As mentioned in the previous version patch:
>>> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
>>> >>> The suboptimal code is generated for "assigning from parameter" or
>>> >>> "assigning to return value".
>>> >>> This patch enhances the assignment from parameters like the below
>>> >>> cases:
>>> >>> /////case1.c
cut...
>>> >> +      FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
>>> >> +        if (greturn *ret = safe_dyn_cast<greturn *> (last_stmt 
>>> >> (e->src)))
>>> >> +          {
>>> >> +            tree val = gimple_return_retval (ret);
>>> >> +            if (val && VAR_P (val))
>>> >> +              DECL_USEDBY_RETURN_P (val) = 1;
>>
>> you probably want to check && auto_var_in_fn (val, ...) since val
>> might be global?
> Since we are collecting the return vals, it would be built in function
> gimplify_return_expr.  In this function, create_tmp_reg is used and
> a local temp.  So it would not be a global var here.
>
> code piece in gimplify_return_expr:
>   if (!result_decl)
>     result = NULL_TREE;
>   else if (aggregate_value_p (result_decl, TREE_TYPE (current_function_decl)))
>     {
>     ....
>       result = result_decl;
>     }
>   else if (gimplify_ctxp->return_temp)
>     result = gimplify_ctxp->return_temp;
>   else
>     {
>       result = create_tmp_reg (TREE_TYPE (result_decl));
>
> Here, for "typedef struct SA {double a[3];}", aggregate_value_p returns
> false for target like ppc64le, because result of "hard_function_value"
> is a "rtx with PARALLELL code".
> And then a DECL_VAR is built via "create_tmp_reg" (actually it is not a
> reg here. it built a DECL_VAR with RECORD type and BLK mode).
>
> I also tried the way to use RESULT_DECL for this kind of type, or
> let aggregate_value_p accept this kind of type.  But it seems not easy,
> because "<retval> (RESULT_DECL with PARALLEL)" is not ok for address
> operations.
>
>
>>
>>> >> +          }
>>> >> +    }
>>> >> +
>>> >>     /* Set TREE_USED on all variables in the local_decls.  */
>>> >>     FOR_EACH_LOCAL_DECL (cfun, i, var)
>>> >>       TREE_USED (var) = 1;
>>> >> diff --git a/gcc/expr.cc b/gcc/expr.cc
>>> >> index d9407432ea5..20973649963 100644
>>> >> --- a/gcc/expr.cc
>>> >> +++ b/gcc/expr.cc
>>> >> @@ -6045,6 +6045,52 @@ expand_assignment (tree to, tree from, bool 
>>> >> nontemporal)
>>> >>         return;
>>> >>       }
>>
>> I miss an explanatory comment here on that the following is heuristics
>> and its reasoning.
>>
>>> >>   +  if ((TREE_CODE (from) == PARM_DECL && DECL_INCOMING_RTL (from)
>>> >> +       && TYPE_MODE (TREE_TYPE (from)) == BLKmode
>>
>> Why check TYPE_MODE here?  Do you want AGGREGATE_TYPE_P on the type
>> instead?
> Checking BLK, because I want make sure the param should occur on
> register and stack (saved from register).
> Actualy, my intention is checking:
> GET_MODE (DECL_INCOMING_RTL (from)) == BLKmode
>
> For code:
> GET_MODE (DECL_INCOMING_RTL (from)) == BLKmode
> && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL
>     || REG_P (DECL_INCOMING_RTL (from)))
> This checking indicates if the param may be passing via 2 or more
> registers.
>
> Using "AGGREGATE_TYPE_P && (PARALLEL || REG_P)" may be ok and more
> readable. I would have a test.
Oh, AGGREGATE_TYPE_P seems not the intention of this patch, since if
struct size can be represented by an INT size, the mode would be that
INT mode, and no need to use block move for the assignment.

BR,
Jeff (Jiufu)
>
>>
>>> >> +       && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL
>>> >> +           || REG_P (DECL_INCOMING_RTL (from))))
>>> >> +      || (VAR_P (to) && DECL_USEDBY_RETURN_P (to)
>>> >> +          && TYPE_MODE (TREE_TYPE (to)) == BLKmode
>>
>> Likewise.
>>
>>> >> +          && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl)))
>>> >> +               == PARALLEL))
>>
>> Not REG_P here?
> REG_P with BLK on return would means return in memory, instead return
> through registers, so, REG_P was not added here, and let it use
> orignal behavior.
>>
>>> >> +    {
>>> >> +      push_temp_slots ();
>>> >> +      rtx par_ret;
>>> >> +      machine_mode mode;
>>> >> +      par_ret = TREE_CODE (from) == PARM_DECL
>>> >> +                  ? DECL_INCOMING_RTL (from)
>>> >> +                  : DECL_RTL (DECL_RESULT (current_function_decl));
>>> >> +      mode = GET_CODE (par_ret) == PARALLEL
>>> >> +               ? GET_MODE (XEXP (XVECEXP (par_ret, 0, 0), 0))
>>> >> +               : word_mode;
>>> >> +      int mode_size = GET_MODE_SIZE (mode).to_constant ();
>>> >> +      int size = INTVAL (expr_size (from));
>>> >> +
>>> >> +      /* If/How the parameter using submode, it dependes on the size and
>>> >> +         position of the parameter.  Here using heurisitic number.  */
>>> >> +      int hurstc_num = 8;
>>> >
>>> > Where did this come from and what does it mean?
>>> Sorry for does not make this clear. We know that an aggregate arg may be
>>> on registers partially or totally, as assign_parm_adjust_entry_rtl.
>>> For an example, if a parameter with 12 words and the target/ABI only
>>> allow 8 gprs for arguments, then the parameter could use 8 regs at most
>>> and left part in stack.
>>
>> I also wonder about the exact semantics of the parallels we get here.
>>
>> +      int size = INTVAL (expr_size (from));
>>
>> esp. when you use sth as simple as this.  Shouldn't you instead look
>> at to_rtx since that's already expanded?  For returns that should
> Yes, I use "expr_size (from)" is just because the size should be same
> between "from", "to" and "to_rtx" for these cases.
>> be the desired layout to match 'from' to, no?  Maybe it's better
>> to not try sharing the code for both incoming and return copies
>> for clarity?
> OK, thanks, good sugguestion.
>>
cut...
>>> Since the subword is used to move block(read from source mem and then
>>> store to destination mem with register mode), and this would keep to use
>>> the same endianness on reg like move_block_from_reg. So, the patch does
>>> not check the endianness.
>>> 
>>> If any concerns and sugguestions, please point out, thanks!
>>> 
>>> BR,
>>> Jeff (Jiufu)
>>> >
>>> >
>>> > Jeff
>>> 

Reply via email to