On Friday, 10 May 2019 at 23:58:37 UTC, Mike Franklin wrote:

I don't know how a proper assembly implementation would not be performant. Perhaps you could elaborate.

Inline assembly blocks many optimizations that yield large performance gains, such as constant propagation. Say you implement a memcpy with a different signature than C's memcpy (because it takes slices instead of pointers). The optimizer then does not know the semantics of that function, so the function needs to be transparent (not assembly) for it to do such optimizations.
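To make that concrete, here is a rough sketch in C of what "transparent" means. The names byte_slice, slice_copy and load_u32 are made up for illustration (byte_slice just stands in for a D slice). Because the copy is plain code, an optimizer is free to inline it into load_u32, see that the length is a constant, and possibly reduce the whole thing to a single load. Whether it actually does so depends on the compiler and flags. With an inline asm body it could not see through the copy at all.

```c
#include <stddef.h>

/* Hypothetical slice type: pointer plus length, standing in for a D slice. */
typedef struct {
    unsigned char *ptr;
    size_t len;
} byte_slice;

/* Transparent copy: the body is ordinary C, so the optimizer can inline it
   and propagate constant lengths from the call site.
   Copies min(dst.len, src.len) bytes and returns that count.
   Assumes the two slices do not overlap. */
static inline size_t slice_copy(byte_slice dst, byte_slice src)
{
    size_t n = dst.len < src.len ? dst.len : src.len;
    for (size_t i = 0; i < n; i++)
        dst.ptr[i] = src.ptr[i];
    return n;
}

/* Call site where both lengths are compile-time constants. */
unsigned int load_u32(const unsigned char *bytes)
{
    unsigned int value = 0;
    byte_slice dst = { (unsigned char *)&value, sizeof value };
    /* The cast drops const only so the pointer fits the slice type;
       the source bytes are never written. */
    byte_slice src = { (unsigned char *)bytes, sizeof value };
    slice_copy(dst, src);
    return value;
}
```

If slice_copy were instead an opaque block of assembly, the compiler would have to treat every call as a black box, even for a copy this small.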

But I'm sure you know all that, so that's not your question. :)

If you reimplement memcpy/mem* as a function with the same signature as the libc one, and it is not supposed to be inlined (like the current libc functions), then I also think using inline asm will not incur a perf penalty. Be careful to recreate exactly the same semantics as those libc functions, because the optimizer is going to _assume_ it knows _exactly_ what those functions are doing.
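As a sketch of the kind of semantics I mean, here is a memcpy lookalike in plain C. The name my_memcpy is hypothetical, and a real replacement would have its body in asm. The points to match are the ones C specifies for memcpy: same parameter order and types, the return value is the destination pointer, and the caller guarantees the regions do not overlap.

```c
#include <stddef.h>

/* Hypothetical stand-in with the same signature as C's memcpy.
   Semantics mirrored from the C standard's memcpy:
   - copies exactly n bytes from src to dst
   - returns dst
   - behavior is undefined if the regions overlap */
void *my_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst; /* callers and optimizers rely on this return value */
}
```

Returning something else (say, a pointer past the last byte written, as mempcpy does) would be exactly the kind of mismatch that breaks code the optimizer has rewritten on the assumption of standard memcpy behavior.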

cheers,
  Johan
