Re: Is there a source for detailed, instruction-level performance info?

Jim Mulder Mon, 28 Dec 2015 11:09:45 -0800

> An example: which of these code sequences do you suppose runs faster?
> 
>          la rX,0(rI,rBase)       rX -> array[i]
>          lg rY,0(,rX)            rY = array[i]
>          agsi 0(rX),1            ++array[i]
> * Now do something with rY
> 
> vs:
>          lg rY,0(rI,rBase)       rY = array[i]
>          la rX,1(,rY)            rX = rY + 1
>          stg rX,0(rI,rBase)      effect is ++array[i]
> * Now do something with rY
> 
> The first is substantially faster. I would have GUESSED that the 
> second would be faster, since I need the value in rY anyway. (I'm in
> 64-bit mode, so using "LOAD ADDRESS" for the increment is safe...)


  "Substantially faster" is probably a cache effect.  Assuming a cache 
miss on array[i], the LG in sequence 2 misses and installs the cache
line shared, and the STG then has to upgrade it to exclusive.  The AGSI
in sequence 1 misses and installs the cache line exclusive, so no
upgrade is needed.  Adding a PFD 2,0(rI,rBase) before the LG in
sequence 2 may make the two sequences perform similarly.
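
  As an untested sketch only, sequence 2 with that prefetch added would
look something like this (assuming PFD code 2 requests the line for
store access, so it is installed exclusive up front):

         pfd 2,0(rI,rBase)       prefetch array[i] for store
         lg rY,0(rI,rBase)       rY = array[i]
         la rX,1(,rY)            rX = rY + 1
         stg rX,0(rI,rBase)      effect is ++array[i]
* Now do something with rY

  Whether the prefetch pays off depends on the machine and on how often
array[i] actually misses, so measure both variants.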

 Also, in sequence 1, changing lg rY,0(,rX) to
lg rY,0(rI,rBase) may avoid some Address Generation Interlock (AGI)
effects, although various machines have various AGI bypasses for
various instructions.  It may also just transfer some of the AGI effect
from the LG down to the AGSI.
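
 A sketch of that change to sequence 1 (untested; whether it helps
depends on the machine's AGI bypasses):

         la rX,0(rI,rBase)       rX -> array[i]
         lg rY,0(rI,rBase)       rY = array[i], does not wait on rX
         agsi 0(rX),1            ++array[i], still needs rX
* Now do something with rY

 The LA stays because AGSI takes only a base and displacement, with no
index register.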
 
Jim Mulder   z/OS System Test   IBM Corp.  Poughkeepsie,  NY



----------------------------------------------------------------------
For IBM-MAIN subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO IBM-MAIN
