Hi, I don't believe there is a missing optimization here: by default, compilers expand mempcpy into memcpy, since memcpy is the standard C library function while mempcpy is a GNU extension. That means that even if your source code contains mempcpy, on most targets the generated code will contain no calls to mempcpy.
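To make the expansion concrete, here is a rough sketch of what a mempcpy call amounts to once it is written in terms of memcpy. The helper names mempcpy_via_memcpy and concat2 are made up for illustration and are not anything the compiler or the C library defines; the real transformation happens inside the compiler, not in source code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: mempcpy expressed through memcpy.
   memcpy returns dst, so the "one past the end" pointer that
   mempcpy would return is recovered by adding n. */
static void *mempcpy_via_memcpy(void *dst, const void *src, size_t n)
{
    return (char *)memcpy(dst, src, n) + n;
}

/* Hypothetical usage: chain two copies without recomputing the
   write offset, then NUL-terminate. Returns the string length.
   The caller must supply a buffer large enough for both inputs
   plus the terminator. */
static size_t concat2(char *out, const char *a, const char *b)
{
    assert(out != NULL && a != NULL && b != NULL);

    char *p = out;
    p = mempcpy_via_memcpy(p, a, strlen(a));
    p = mempcpy_via_memcpy(p, b, strlen(b));
    *p = '\0';
    return (size_t)(p - out);
}
```

The only extra work compared to a direct mempcpy call is the pointer addition, which is why calling the (usually better optimized) memcpy is a reasonable default.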
The reason is simple: most targets provide an optimized memcpy in the C library, while very few optimize mempcpy. The same is true for bzero, bcmp and bcopy, which are expanded into memset, memcmp and memmove respectively. Targets can choose to do it differently; IIRC x86 is the only target that emits calls to both memcpy and mempcpy.

Cheers,
Wilco