> This replaces the various implementations of memset and memcpy,
> including the ARM RTABI ones (__aeabi_mem[set|clr]_[|4|8]) with
> a single C implementation for each. The ones we have are either not
> very sophisticated (ARM), or they are too sophisticated (memcpy() on
> AARCH64, which may perform unaligned accesses) or already coded in C
> (memset on AArch64).

Ard,

I'm concerned about the performance impact of this change. There's a reason for all that complexity, and it's to optimize performance.

Why does memcpy performance matter? In addition to the general memcpy use scattered around the C code, we have one case that is particularly sensitive to memcpy performance: DMA operations. When double-buffering is invoked, or when accessing portions of a common-mapped buffer (i.e. uncached on non-coherent DMA systems), the impact of a non-optimized memcpy is enormous compared to the optimized ones, because the penalty is amplified by orders of magnitude due to uncached memory access latency.

So I would ask that, before a change like this is brought in, we characterize the cached-cached and cached-uncached (and perhaps unaligned cached-cached) performance across the implementations. Based on my experience, I'm expecting both the cached-cached and cached-uncached cases to take a massive performance hit.

From your commit message I'm inferring that the problem you're solving is to play nice in environments that can't tolerate unaligned access, like when the MMU is off. I get that - and I think a variant of the library that plays nice in these limited cases makes sense. However, I don't think we should drag down the performance of the rest of the environment, where we spend the vast majority of our time executing.

Eugene
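
P.S. To make the request concrete, the cached-cached part of that characterization could start from something like the rough sketch below. It is only an illustration under stated assumptions: the names copy_bytewise, copy_fn and time_copy are hypothetical, it relies on hosted C (clock() from <time.h>) rather than any edk2 timer service, and it says nothing about uncached mappings, which would need a test run in firmware against a real common buffer. An optimizing compiler may also elide repeated copies, so real measurements would need a guard against that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Common signature so different copy routines can be timed the same way. */
typedef void *(*copy_fn)(void *, const void *, size_t);

/*
 * Hypothetical stand-in for a "simple C" memcpy: one byte per iteration.
 * The volatile pointers keep the compiler from turning the loop back
 * into a call to the library memcpy.
 */
static void *copy_bytewise(void *dst, const void *src, size_t n)
{
    volatile uint8_t *d = dst;
    const volatile uint8_t *s = src;

    while (n--) {
        *d++ = *s++;
    }
    return dst;
}

/*
 * Run `iters` copies of `len` bytes through `fn` and return the elapsed
 * CPU time in seconds as reported by clock().
 */
static double time_copy(copy_fn fn, void *dst, const void *src,
                        size_t len, unsigned iters)
{
    clock_t start = clock();

    for (unsigned i = 0; i < iters; i++) {
        fn(dst, src, len);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_copy(memcpy, dst, src, len, iters) and time_copy(copy_bytewise, dst, src, len, iters) on the same buffers gives the aligned cached-cached comparison; offsetting dst or src by one byte gives the unaligned one.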

