On 6/1/2011 9:19 AM, Charles Mills wrote:
It seems to me that the ideal way to do this would be to have not two stages (a 256-byte MVC loop and an EX'ed MVC) but three: a loop with a "hard-coded" or "unrolled" string of 16 MVCs that moves a 4K block and increments the registers by 4K on each iteration; followed by a loop of 256-byte MVCs; followed by an EX'ed MVC for 1 to 255 bytes. (Obviously each step would be optional depending on the exact count.)
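For anyone who wants to check the arithmetic of that three-stage split before coding it in assembler, here is a rough model in C. It is only a sketch: `staged_copy` and `struct move_counts` are made-up names, each 256-byte `memcpy` stands in for one MVC, and the final `memcpy` stands in for the EX'ed MVC. It says nothing about actual timings on the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from the original post): reports how much work
   each of the three stages did for a given byte count. */
struct move_counts {
    size_t blocks_4k;   /* iterations of the unrolled 16 x 256-byte loop */
    size_t blocks_256;  /* iterations of the single 256-byte loop */
    size_t tail;        /* bytes left for the final variable-length move */
};

struct move_counts staged_copy(unsigned char *dst, const unsigned char *src,
                               size_t len)
{
    struct move_counts c = {0, 0, 0};
    assert(dst != NULL && src != NULL);

    /* Stage 1: 4K per iteration, modelling 16 unrolled 256-byte MVCs. */
    while (len >= 4096) {
        for (int i = 0; i < 16; i++)
            memcpy(dst + 256 * i, src + 256 * i, 256);
        dst += 4096;
        src += 4096;
        len -= 4096;
        c.blocks_4k++;
    }

    /* Stage 2: one 256-byte move per iteration. */
    while (len >= 256) {
        memcpy(dst, src, 256);
        dst += 256;
        src += 256;
        len -= 256;
        c.blocks_256++;
    }

    /* Stage 3: the remaining 1 to 255 bytes, if any (the EX'ed MVC). */
    if (len > 0) {
        memcpy(dst, src, len);
        c.tail = len;
    }
    return c;
}
```

For a 10,000-byte move this gives two 4K iterations, seven 256-byte moves, and a 16-byte tail; any stage whose count comes out zero is simply skipped.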
If you go through this exercise, I'd also suggest one (minor?) variation: use variable-length MVCs to bump the starting address up to a 4K multiple (if needed), then do the 4K-byte moves, then a few more short moves as needed. It would be instructive to see whether that's faster than a set of unaligned moves.
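The aligned variation comes down to a little address arithmetic, sketched here in C. The names `plan_aligned_copy` and `struct aligned_plan` are invented for illustration, and the sketch assumes the address being aligned is whichever one you pass in (source or target); both can be aligned at once only if they share the same offset within a 4K page. It only plans the move and measures nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plan for an aligned move (names are not from the post). */
struct aligned_plan {
    size_t head;       /* short moves needed to reach a 4K boundary */
    size_t blocks_4k;  /* full 4K blocks moved from the aligned address */
    size_t tail;       /* bytes left over after the last 4K block */
};

struct aligned_plan plan_aligned_copy(uintptr_t addr, size_t len)
{
    struct aligned_plan p;

    /* Distance from addr up to the next 4K multiple; 0 if already aligned. */
    size_t to_boundary = (size_t)((4096 - (addr & 4095)) & 4095);

    /* A move shorter than that distance never reaches the boundary. */
    p.head = to_boundary < len ? to_boundary : len;
    p.blocks_4k = (len - p.head) / 4096;
    p.tail = (len - p.head) % 4096;

    /* The three pieces must account for every byte exactly once. */
    assert(p.head + p.blocks_4k * 4096 + p.tail == len);
    return p;
}
```

The head and tail pieces can each be up to 4095 bytes, so each would still be handled with 256-byte MVCs plus one EX'ed MVC. Timing this plan against the plain unaligned loop is the comparison suggested above.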
Gerhard Postpischil
Bradford, VT