On 12 November 2016 at 10:10, Philippe Cloarec <philippe.cloa...@gmail.com> wrote:
>
> Since we do talk of CPU cycles savings here I will check for AGI cases and
> their resolution and try to implement instruction grouping as much I can.
>
> From my humble point this is a real topic and all z13 sites having old
> productions Batch programs should perform some action.
>

I would lower expectations of the benefits of "typical" things like
loading registers early and avoiding AGI. Our CPU is not really that
typical. Since a lot of our instructions have been coded in the Language
of our Fathers, we can't simply recompile and take advantage of such
tricks (even if customers had the source code and wished to accept the
business risk of recompiles). Instead, our CPU does very well at figuring
out those things on its own with Out-of-Order Execution. Even better if
you can take advantage of SMT.

On z/OS you should be able to use hardware profiling to find the pieces
that are worth a closer look (since z/VM does not virtualize that
support, I had to write my own profiler in software). I have had numerous
cases where I expected low-hanging fruit and found that I could not
outrun our CPU. For some critical parts it does help to unroll a loop a
little bit, but something extreme with 16-fold unrolling and swapping
registers actually made it slower.

Hopefully you find a spot that is worth spending some time to optimize.
If you're looking at touching all code to scrape off 10%, it may be wiser
to look higher up in the stack.

I will be the first to admit that you can achieve impressive results by
carefully coding a critical part using the right instructions. In one
case we had an end-user transaction take 700 ms; when I was done it was
down to 7 ms. This was somewhat unique in that it did modular
multiplication for cryptography. The code had been written with the
assumption that operations on words twice as long take 4 times more time.
But on our CPU it takes just log(2) times longer, so going from 16-bit
multiply to 64-bit saves you a lot.

Rob
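
To illustrate the last point, here is a rough sketch in Python of why wider multiplies pay off in multi-precision arithmetic. It is not the code from the transaction above: the 1024-bit operand size, the schoolbook algorithm and the names `to_limbs`, `schoolbook_mul` and `modmul` are all assumptions made for the illustration. It only counts word-sized multiply operations and says nothing about real cycle counts on any machine.

```python
def to_limbs(x, limb_bits, n_limbs):
    """Split a non-negative integer into n_limbs words of limb_bits each (least significant first)."""
    mask = (1 << limb_bits) - 1
    return [(x >> (i * limb_bits)) & mask for i in range(n_limbs)]


def schoolbook_mul(a, b, operand_bits, limb_bits):
    """Multiply a*b one word pair at a time; return (product, word_multiplies)."""
    n = operand_bits // limb_bits
    a_limbs = to_limbs(a, limb_bits, n)
    b_limbs = to_limbs(b, limb_bits, n)
    product = 0
    multiplies = 0
    for i, x in enumerate(a_limbs):
        for j, y in enumerate(b_limbs):
            # One word-by-word multiply, shifted into its place in the result.
            product += (x * y) << ((i + j) * limb_bits)
            multiplies += 1
    return product, multiplies


def modmul(a, b, m, operand_bits, limb_bits):
    """Modular multiply built on the schoolbook product (reduction not counted)."""
    product, multiplies = schoolbook_mul(a, b, operand_bits, limb_bits)
    return product % m, multiplies


if __name__ == "__main__":
    # Hypothetical 1024-bit operands and modulus, chosen only for the demo.
    a = (1 << 1023) + 12345
    b = (1 << 1000) + 67890
    m = (1 << 1024) - 159
    for limb_bits in (16, 64):
        result, count = modmul(a, b, m, 1024, limb_bits)
        assert result == (a * b) % m
        print(limb_bits, "bit words:", count, "multiplies")
```

With these assumed sizes the 16-bit version needs 64 x 64 = 4096 word multiplies and the 64-bit version 16 x 16 = 256. If a 64-bit multiply costs nowhere near 16 times a 16-bit one, as described above, most of that difference shows up as saved time.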