> This replaces the various implementations of memset and memcpy,
> including the ARM RTABI ones (__aeabi_mem[set|clr]_[|4|8]) with
> a single C implementation for each. The ones we have are either not
> very sophisticated (ARM), or they are too sophisticated (memcpy() on
> AArch64, which may perform unaligned accesses) or already coded in
> C (memset on AArch64).

Ard,

I'm concerned about the performance impact of this change... there's a reason 
for all that complexity and it's to optimize performance.

Why does memcpy performance matter?  Besides the general memcpy use
scattered around the C code, we have one case that is particularly sensitive
to memcpy performance: DMA operations that use double-buffering, or that
access portions of a buffer that is common mapped (i.e. uncached on
non-coherent DMA systems).  There the cost of a non-optimized memcpy is
enormous compared to the optimized ones, because the penalty of every extra
access is amplified by orders of magnitude due to uncached memory access
latency.
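
To make the amplification concrete, here is a minimal sketch in C that counts
the loads and stores a copy routine issues at the source level.  Both helpers
(copy_bytewise and copy_wordwise) are hypothetical names made up for this
illustration, not the existing or proposed edk2 code, and the counts are only
a proxy: the real number of bus transactions depends on the compiler and the
hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: naive copy, one load and one store per byte.
 * Returns the number of source-level memory accesses it issued. */
static size_t copy_bytewise(void *dst, const void *src, size_t len)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    size_t accesses = 0;

    while (len > 0) {
        *d++ = *s++;
        len--;
        accesses += 2;
    }
    return accesses;
}

/* Hypothetical helper: moves 8 bytes per load/store pair, but only when
 * both pointers are 8-byte aligned, so it never makes an unaligned
 * access.  Misaligned buffers and the tail are copied byte by byte. */
static size_t copy_wordwise(void *dst, const void *src, size_t len)
{
    uint8_t *d = dst;
    const uint8_t *s = src;
    size_t accesses = 0;

    assert(dst != NULL && src != NULL);

    if ((((uintptr_t)d | (uintptr_t)s) & 7u) == 0) {
        while (len >= 8) {
            uint64_t word;

            /* Fixed-size memcpy is the aliasing-safe way to express a
             * 64-bit move; compilers typically emit one load and one
             * store for it. */
            memcpy(&word, s, sizeof word);
            memcpy(d, &word, sizeof word);
            d += 8;
            s += 8;
            len -= 8;
            accesses += 2;
        }
    }
    while (len > 0) {
        *d++ = *s++;
        len--;
        accesses += 2;
    }
    return accesses;
}
```

For a 4 KB aligned buffer the byte loop issues 8192 accesses against 1024 for
the word loop.  If each access to uncached memory pays the full memory
latency, that 8x difference in access count shows up more or less directly in
the copy time.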

So I would ask that, before a change like this is brought in, we
characterize the cached-cached and cached-uncached (and perhaps unaligned
cached-cached) performance across the implementations.  Based on my
experience I'm expecting both cases will take a massive performance hit.
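
A rough sketch of the kind of measurement meant here, assuming a hosted C
environment with clock() available.  The names copy_fn, copy_simple and
time_copy are hypothetical and exist only for this illustration.  On a normal
host this can only cover the cached-cached and unaligned cached-cached cases;
the cached-uncached case has to be measured on target hardware with a buffer
that is really mapped uncached.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Common signature so different implementations can be swapped in. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t len);

/* Hypothetical stand-in for a simple, alignment-safe C memcpy.
 * Note: an optimizing compiler may turn this loop into a call to the
 * library memcpy, so check the generated code before trusting results. */
static void *copy_simple(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len > 0) {
        *d++ = *s++;
        len--;
    }
    return dst;
}

/* Returns the average CPU time in seconds for one call of fn,
 * measured over iters back-to-back calls. */
static double time_copy(copy_fn fn, void *dst, const void *src,
                        size_t len, unsigned iters)
{
    clock_t start, end;
    unsigned i;

    assert(fn != NULL && iters > 0);

    start = clock();
    for (i = 0; i < iters; i++)
        fn(dst, src, len);
    end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC / iters;
}
```

Calling time_copy(copy_simple, dst, src, len, iters) and
time_copy(memcpy, dst, src, len, iters) with the same buffers gives the
aligned comparison; passing dst + 1 (with len reduced by one) gives the
unaligned one.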

From your commit message I'm inferring that the problem you're solving is to
play nice in environments that can't tolerate unaligned access, like when the
MMU is off.  I get that - and I think a variant of the library that plays nice
in these limited cases makes sense.  However, I don't think we should drag
down the performance of the rest of the environment, where we spend the
vast majority of our time executing.

Eugene



_______________________________________________
edk2-devel mailing list
[email protected]
https://lists.01.org/mailman/listinfo/edk2-devel
