On Mon, Oct 31, 2022 at 04:13:38PM -0600, Jeff Law wrote:
> On 10/30/22 20:42, Jiufu Guo via Gcc-patches wrote:
> >We know that for struct variable assignment, memory copy may be used.
> >And for memcpy, we may load and store as many bytes as possible at one time.
> >But that may not be the best choice here:

> So the first question in my mind is can we do better at the gimple 
> phase?  For the second case in particular can't we just "return a" 
> rather than copying a into <retval> then returning <retval>?  This feels 
> a lot like the return value optimization from C++.  I'm not sure if it 
> applies to the first case or not, it's been a long time since I looked 
> at NRV optimizations, but it might be worth poking around in there a bit 
> (tree-nrv.cc).

If it is a bigger struct you end up with quite a lot of stuff in
registers.  GCC will eventually put that all in memory so it will work
out fine in the end, but you are likely to get inefficient code.

OTOH, 8 bytes isn't as big as we would want these days, is it?  So it
would be useful to put smaller temporaries, say 32 bytes and smaller,
in registers instead of in memory.
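
To make the two cases concrete, here is a small sketch.  The type and
function names are made up for illustration and are not from the patch;
the 32-byte size just matches the threshold floated above.

```c
/* Hypothetical 32-byte aggregate on an LP64 target (four 8-byte longs).
   On other ABIs the size differs; this is only an illustration.  */
struct S { long a, b, c, d; };

/* Case 1: a plain structure assignment, which the compiler may
   implement as a block copy.  */
void copy_s(struct S *dst, const struct S *src)
{
  *dst = *src;
}

/* Case 2: a local copied into the return slot and then returned,
   the pattern the NRV remark above is about.  */
struct S ret_s(const struct S *p)
{
  struct S a = *p;
  return a;
}
```

Compiling something like this with -O2 -S and reading the output shows
whether the temporary lives in registers or goes through the stack on a
given target; the result depends on the ABI and the GCC version.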

> But even so, these kinds of things are still bound to happen, so it's 
> probably worth thinking about if we can do better in RTL as well.

Always.  It is a mistake to think that having better high-level
optimisations means that you don't need good low-level optimisations
anymore: in fact deficiencies there become more glaringly apparent if
the early pipeline opts become better :-)

> The first thing that comes to my mind is to annotate memcpy calls that 
> are structure assignments.  The idea here is that we may want to expand 
> a memcpy differently in those cases.   Changing how we expand an opaque 
> memcpy call is unlikely to be beneficial in most cases.  But changing 
> how we expand a structure copy may be beneficial by exposing the 
> underlying field values.   This would roughly correspond to your method 
> #1.
> 
> Or instead of changing how we expand, teach the optimizers about these 
> annotated memcpy calls -- they're just a copy of each field.   That's 
> how CSE and the propagators could treat them. After some point we'd 
> lower them in the usual ways, but at least early in the RTL pipeline we 
> could keep them as annotated memcpy calls.  This roughly corresponds to 
> your second suggestion.

Ideally this won't ever make it as far as RTL, if the structures do not
need to go via memory.  All high-level optimisations should have been
done earlier, and hopefully it was not expand itself that forced stuff
into memory!  :-/
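
For reference, the "copy of each field" view of a structure copy
discussed above can be sketched at the source level like this.  The
struct and the function names are hypothetical, and this only shows
that the two forms are equivalent for the named fields; it says nothing
about how GCC represents either one internally.

```c
#include <string.h>

/* Hypothetical aggregate with no padding between fields on common
   ABIs (two 4-byte ints followed by one 8-byte double).  */
struct P { int x; int y; double w; };

/* The opaque form: a byte-wise block copy.  */
void copy_opaque(struct P *dst, const struct P *src)
{
  memcpy(dst, src, sizeof *dst);
}

/* The field-wise form: each member is an ordinary scalar copy that
   CSE and the propagators can see through.  */
void copy_fields(struct P *dst, const struct P *src)
{
  dst->x = src->x;
  dst->y = src->y;
  dst->w = src->w;
}
```

Both functions leave the same values in dst's members; the difference
is only in how much of that an optimiser can see.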


Segher
