Jakub Jelinek wrote:
>On Thu, Apr 12, 2018 at 05:29:35PM +0000, Wilco Dijkstra wrote:
>> > Depending on what you mean old, I see e.g. in 2010 power7 mempcpy got
>> > added, in 2013 other power versions, in 2016 s390*, etc.  Doing a
>> > decent mempcpy isn't hard if you have asm version of memcpy and one
>> > spare register.
>>
>> More mempcpy implementations have been added in recent years indeed, but
>> almost all add an extra copy of the memcpy code rather than using a
>> single combined implementation.
>> That means it is still better to call memcpy (which is frequently used
>> and thus likely in L1/L2) rather than mempcpy (which is more likely to
>> be cold and thus not cached).
>
> That really depends, usually when some app uses mempcpy, it uses it very
> heavily.
But it would have to not use memcpy nearby. Do you have examples of apps using
mempcpy significantly? GLIBC is the only case I've seen that uses mempcpy.

> And all the proposed patches do is honor what the user asked, if
> you use memcpy () + n, we aren't transforming that into mempcpy behind the
> user's back.

We transform a lot of calls behind the user's back, so that's not a plausible
argument for "honoring" the original call. E.g. (void)mempcpy always gets
changed to memcpy, bcopy to memmove, bzero to memset, strchr (s, 0) into
strlen (s) + s - the list is long, and there are plenty of cases where these
expansions block tailcalls.

> Anyway, here is what I think Richard was asking for, that I'm currently
> bootstrapping/regtesting.  It can be easily combined with Martin's target
> hook if needed, or do it only for
> endp == 1 && target != const0_rtx && CALL_EXPR_TAILCALL (exp)

This patch causes regressions on AArch64 since it now always calls mempcpy
again. So yes, either it would need to be done only for tailcalls (which fixes
the reported regression), or we need Martin's change too so that each target
can state whether it has a fast mempcpy or not.

Wilco
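
To make the tailcall point concrete, here is a minimal C sketch of two of the
equivalences mentioned above. The helper names are hypothetical, invented for
illustration; they are not from GCC or glibc, and this is not the code the
compiler emits. It uses only standard <string.h> functions so it does not
depend on the GNU mempcpy extension.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the memcpy-based form of mempcpy (dst, src, n).
   The result is the address one past the last byte written.  Because the
   "+ n" is applied after memcpy returns, the memcpy call here is not in
   tail position; a direct "return mempcpy (dst, src, n);" would be.  */
static void *
copy_end_via_memcpy (void *dst, const void *src, size_t n)
{
  return (char *) memcpy (dst, src, n) + n;
}

/* Hypothetical helper: the strlen-based form of strchr (s, 0).  Both yield
   a pointer to the terminating NUL, but here the addition again follows
   the call, so strlen is not a tail call.  */
static const char *
end_via_strlen (const char *s)
{
  return s + strlen (s);
}
```

In both helpers the work left after the library call (one addition) is what
prevents a tail call, which is why restricting the mempcpy form to the
CALL_EXPR_TAILCALL case is one of the two options discussed above.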