On Tue, 3 Jun 2014 15:27:48 +0000, DASDBILL2 wrote:

>What if you measured the total cpu time consumed by code such as
>the following to execute a truly huge number of only XR instructions
>and then divide by the number of XR instructions executed?  I would
>think that this would be the smallest possible time for one XR; i.e.,
>the maximum possible pipelining with zero stalls.

I would expect the code that you show below to stall the pipeline
on every XR instruction. Why? Because they all use the same register,
so each XR has to wait for the result of the one before it. Also, I
wouldn't consider 128 million instructions to be "truly huge".

>         LAY   R0,1000000
>         LA    R1,LOOP1
>* force alignment here to a 256-byte boundary; i.e., the length of a
>* cache line
>LOOP1    XR    R2,R2    first of 127 such XR instructions
>         XR    R2,R2    second of 127 such XR instructions
>         ...
>         XR    R2,R2    127th and last of 127 such XR instructions
>         BCTR  R0,R1    execute the previous 127 XR instructions one million times
>* at this point, we have filled one cache line with 127 consecutive XR
>* instructions followed by the BCTR, and all 128 of these instructions
>* fit exactly within one cache line.
>         ... end of loop.
>When finished performing the loop, we will have executed 127,000,000
>XR instructions and 1,000,000 BCTR instructions.  Ignore the time used
>by the BCTR instructions.  Divide total CPU time delta by 127,000,000
>to compute the approximate minimum time possible to do one XR
>instruction.
>
>Then do the same thing for an SR, an SLR, and a LR that is loading a
>register from another register that has been previously zeroed.  This
>technique could also be done with 63 consecutive LA Rx,0 instructions.

If you are going to perform this test, I suggest that you run it ten
times for each instruction. I'll bet that you get as much variation
among the XR tests as you do between XR and SR.
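
A minimal sketch of how the repeated measurements could be reduced and
compared, assuming each run yields one CPU-time delta in seconds. The
function names and the "gap must exceed the larger spread" rule are
illustrative assumptions, not an established method, and no measured
figures are implied.

```python
import statistics

# 127 XR instructions per pass, one million passes (from the quoted loop).
INSTRUCTIONS_PER_RUN = 127 * 1_000_000


def per_instruction_ns(cpu_seconds, count=INSTRUCTIONS_PER_RUN):
    """Average CPU time per instruction, in nanoseconds, for one run."""
    return cpu_seconds / count * 1e9


def summarize(cpu_seconds_by_run):
    """Return (mean, spread) in ns per instruction across repeated runs.

    Spread is simply max minus min; a hypothetical, deliberately crude
    measure of run-to-run variation.
    """
    per_insn = [per_instruction_ns(t) for t in cpu_seconds_by_run]
    return statistics.mean(per_insn), max(per_insn) - min(per_insn)


def difference_is_meaningful(runs_a, runs_b):
    """True only if the gap between the two means exceeds the larger spread."""
    mean_a, spread_a = summarize(runs_a)
    mean_b, spread_b = summarize(runs_b)
    return abs(mean_a - mean_b) > max(spread_a, spread_b)
```

Feed it the ten CPU-time deltas for XR and the ten for SR. If
difference_is_meaningful returns False, the variation among the runs
is at least as large as the difference between the two instructions.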

--
Tom Marchant
