Hello Jarrett,
I'm actually kind of shocked that, given the prevalence of memory block copy operations, more CPUs haven't implemented them as a basic instruction. Yes, RISC is nice, but geez, this seems like a no-brainer.
How about memory-to-memory DMA? Why even make the CPU wait for the copy to finish?