https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77610
--- Comment #5 from Rich Felker <bugdal at aerifal dot cx> ---
Of course, fancy memcpy in general is only a win beyond a certain size. For DMA, I did not mean that I want to use DMA for every size beyond gcc's proposed function-call threshold. Rather, the vdso-provided function would choose what to do appropriately for the hardware. But on J2 (nommu, no special kernel mode) I suspect DMA could be a win at sizes as low as 256 bytes, with spin-to-completion and a lock shared between user (vdso) and kernel rather than using a syscall (not sure this is justified, though). Using a syscall with sleep-during-DMA would have a significantly larger threshold before it's worthwhile.

Regarding how I measured the kernel performance increase, I was just looking at boot timing with printk timestamps enabled. The main time consumer is unpacking initramfs.
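
To make the size-based dispatch concrete, here is a rough sketch of what such a vdso-provided copy function might look like. Everything in it is hypothetical: the names copy_dispatch, dma_copy_spin, dma_lock and DMA_THRESHOLD are invented for illustration, the 256-byte cutoff is only the suspected J2 figure from above, and the DMA engine is simulated with plain memcpy so the code runs anywhere. Falling back to a CPU copy when the lock is already held is one possible policy, not something settled in this report.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Assumed crossover size; the comment only suspects ~256 bytes on J2. */
#define DMA_THRESHOLD 256

/* Hypothetical lock that would be shared between user (vdso) and kernel. */
static atomic_flag dma_lock = ATOMIC_FLAG_INIT;

/* Counts copies that took the simulated DMA path (updated under the lock). */
static unsigned long dma_copies;

/* Stand-in for programming a DMA engine and spinning to completion.
   It is plain memcpy here, purely so the sketch is runnable. */
static void dma_copy_spin(void *dst, const void *src, size_t n)
{
    assert(n >= DMA_THRESHOLD);
    memcpy(dst, src, n);
    dma_copies++;
}

/* Chooses a copy strategy by size: small copies stay on the CPU, large
   ones try the DMA path if the engine lock can be taken without waiting. */
void *copy_dispatch(void *dst, const void *src, size_t n)
{
    if (n >= DMA_THRESHOLD &&
        !atomic_flag_test_and_set_explicit(&dma_lock, memory_order_acquire)) {
        dma_copy_spin(dst, src, n);
        atomic_flag_clear_explicit(&dma_lock, memory_order_release);
        return dst;
    }
    /* Below the threshold, or engine busy: ordinary CPU copy. */
    return memcpy(dst, src, n);
}
```

With this shape, a 64-byte copy goes through the ordinary CPU path, while a 512-byte copy takes the DMA path whenever the lock is free. A syscall-based variant that sleeps during the transfer would use the same structure with a much larger threshold.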