> An example: which of these code sequences do you suppose runs faster?
>
>    la   rX,0(rI,rBase)    rX -> array[i]
>    lg   rY,0(,rX)         rY = array[i]
>    agsi 0(rX),1           ++array[i]
> *  Now do something with rY
>
> vs:
>    lg   rY,0(rI,rBase)    rY = array[i]
>    la   rX,1(,rY)         rX = rY + 1
>    stg  rX,0(rI,rBase)    effect is ++array[i]
> *  Now do something with rY
>
> The first is substantially faster. I would have GUESSED that the
> second would be faster, since I need the value in rY anyway. (I'm in
> 64-bit mode, so using "LOAD ADDRESS" for the increment is safe...)

"Substantially faster" is probably a cache effect. Assuming a cache
miss on array[i], in sequence 2 the LG will miss and install the cache
line shared, and then the STG will need to upgrade it to exclusive.
The AGSI in sequence 1 will miss and install the cache line exclusive,
avoiding the upgrade. Adding a PFD 2,0(rI,rBase) before the LG in
sequence 2 may make these sequences perform similarly.

Also, in sequence 1, changing lg rY,0(,rX) to lg rY,0(rI,rBase) may
avoid some Address Generation Interlock effects, since the LG would no
longer use rX as a base register immediately after the LA sets it
(although various machines have various AGI bypasses for various
instructions). Then again, it may just transfer some of the AGI effect
from the LG down to the AGSI, which still addresses through rX.

Jim Mulder
z/OS System Test
IBM Corp.
Poughkeepsie, NY

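For illustration, the two suggested changes might look like the
following. This is an untested sketch that reuses the register names
from the quoted code; whether either change helps will depend on the
machine model and on whether array[i] actually misses the cache.

* Sequence 2 with a store-intent prefetch ahead of the load
     pfd  2,0(rI,rBase)     prefetch array[i] for store access
     lg   rY,0(rI,rBase)    rY = array[i]
     la   rX,1(,rY)         rX = rY + 1
     stg  rX,0(rI,rBase)    effect is ++array[i]
* Now do something with rY

* Sequence 1 with the LG addressed directly instead of through rX
     la   rX,0(rI,rBase)    rX -> array[i]
     lg   rY,0(rI,rBase)    rY = array[i], no dependence on rX
     agsi 0(rX),1           ++array[i]
* Now do something with rY

In the second variant the AGSI still uses rX as its base register, so
any interlock on rX may simply show up there instead.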