On 6/1/2011 9:19 AM, Charles Mills wrote:
> It seems to me like the ideal way to do this would be to have not two stages
> (MVC for 256 and EX'ed MVC) but rather three: a loop with a
> "hard-coded" or "unrolled" string of 16 MVCs that moved 4K blocks and
> incremented registers by 4K on each iteration, followed by a loop of
> 256-byte MVCs, followed by an EX'ed MVC for 1 to 255 bytes. (Obviously each
> step would be optional depending on the exact count.)
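A minimal Python sketch of the three-stage split in the quoted message. It only models the control flow: each MVC is stood in for by a slice copy between non-overlapping buffers, and the names `plan_three_stage`, `copy_three_stage` and `mvc` are hypothetical. It says nothing about actual speed on the hardware.

```python
BLOCK_4K = 4096   # bytes moved per unrolled iteration (16 x 256)
MVC_MAX = 256     # largest length a single MVC can move


def plan_three_stage(length):
    """Split a move of `length` bytes into (blocks_4k, chunks_256, tail).

    A zero in any position means that stage is skipped; tail is 0..255.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    blocks_4k, rest = divmod(length, BLOCK_4K)
    chunks_256, tail = divmod(rest, MVC_MAX)
    return blocks_4k, chunks_256, tail


def mvc(dst, d, src, s, n):
    """Stand-in for one MVC of n bytes (1..256); buffers assumed not to overlap."""
    assert 1 <= n <= MVC_MAX
    dst[d:d + n] = src[s:s + n]


def copy_three_stage(dst, src, length):
    """Copy length bytes from src to dst using the three stages; returns bytes moved."""
    blocks_4k, chunks_256, tail = plan_three_stage(length)
    pos = 0
    # Stage 1: each iteration is an unrolled string of 16 MVCs (4K total),
    # after which both address registers are bumped by 4K.
    for _ in range(blocks_4k):
        for k in range(16):
            off = pos + k * MVC_MAX
            mvc(dst, off, src, off, MVC_MAX)
        pos += BLOCK_4K
    # Stage 2: loop of 256-byte MVCs (at most 15 once stage 1 is done).
    for _ in range(chunks_256):
        mvc(dst, pos, src, pos, MVC_MAX)
        pos += MVC_MAX
    # Stage 3: one EX'ed MVC for the last 1..255 bytes; on the real machine the
    # register supplied to EX would hold tail - 1, since MVC encodes length - 1.
    if tail:
        mvc(dst, pos, src, pos, tail)
        pos += tail
    return pos
```

For example, `plan_three_stage(10000)` gives `(2, 7, 16)`: two 4K iterations, seven 256-byte MVCs and one EX'ed MVC of 16 bytes.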

If you go through this exercise, I'd also suggest one (minor?) variation: variable-length MVCs to bump the starting address up to a 4K multiple (if needed), then the 4K-byte moves, then some more short ones as needed. It would be instructive to see whether that is faster than a set of unaligned moves.
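The aligned variation can be planned in the same way. This Python sketch assumes the "starting address" being aligned is the target address (the post does not say which operand), and `plan_aligned` and `split_short` are hypothetical names. It only computes the sequence of MVC lengths; whether it beats the unaligned version has to be timed on real hardware.

```python
PAGE_4K = 4096
MVC_MAX = 256


def split_short(n):
    """MVC lengths (each 1..256) needed to move n bytes: full 256s, then a short one."""
    full, tail = divmod(n, MVC_MAX)
    return [MVC_MAX] * full + ([tail] if tail else [])


def plan_aligned(addr, length):
    """Plan a move so the 4K-block part starts on a 4K boundary.

    addr is the starting address being aligned (assumed here to be the target).
    Returns (head, blocks_4k, after): head and after are lists of MVC lengths,
    blocks_4k is the number of 4K (16 x 256) iterations in between.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    # Bytes needed to reach the next 4K multiple, capped at the total length.
    head_len = min((-addr) % PAGE_4K, length)
    blocks_4k, rest = divmod(length - head_len, PAGE_4K)
    return split_short(head_len), blocks_4k, split_short(rest)
```

For instance, `plan_aligned(0x1F00, 10000)` yields one 256-byte MVC to reach 0x2000, two 4K iterations, then six 256-byte MVCs and a final 16-byte one.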


Gerhard Postpischil
Bradford, VT
