Re: Microprocessor Optimization Primer

Jim Mulder Tue, 05 Apr 2016 11:39:12 -0700

> > Can I infer from this that XR/XGR, all else being equal, is to be
> > preferred (slightly) over LHI/LGHI?
> > If so, why might that be?  I would have thought the one that
> > doesn't touch the CC would be "more efficient" than the one that does.
> > Or am I misreading your statement?
> > 
> Instruction fetch bandwidth?  Perhaps "don't care about the CC"
> implies that no instruction testing CC might be in the pipeline
> so lockout is not a concern.


Response from engineering:

It all depends on how complicated you want the scheduling algorithm to be.

If cc being set to 0 is not a problem for scheduling surrounding code,
then XR(XGR) will likely be better because of its shorter instruction
length.

On cc usage: if it is preferable to keep a sequence of
CC-setting instruction .. clear register .. CC-using instruction
intact, then L(G)HI can be used, since it leaves the CC unchanged.
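
To make the cc point concrete, here is a toy Python model (not a real
z/Architecture simulator; the run() helper and the register values are
made up for illustration). It only tracks how CR, XR and LHI treat the
condition code, and shows that clearing R1 with XR between a compare
and its branch overwrites the CC, while LHI leaves it alone.

```python
# Toy model of the condition code (CC); hypothetical helper, illustration only.
def run(program, regs):
    """Execute (opcode, operand1, operand2) tuples and return the final CC."""
    cc = 0
    for op, a, b in program:
        if op == "CR":      # compare registers: CC 0 equal, 1 low, 2 high
            cc = 0 if regs[a] == regs[b] else (1 if regs[a] < regs[b] else 2)
        elif op == "XR":    # exclusive OR: CC 0 if result is zero, else 1
            regs[a] ^= regs[b]
            cc = 0 if regs[a] == 0 else 1
        elif op == "LHI":   # load halfword immediate: CC is left unchanged
            regs[a] = b
    return cc

start = {"R1": 7, "R2": 5, "R3": 9}
# Compare, clear R1, then (conceptually) branch on the CC from the compare.
cc_lhi = run([("CR", "R2", "R3"), ("LHI", "R1", 0)], dict(start))
cc_xr = run([("CR", "R2", "R3"), ("XR", "R1", "R1")], dict(start))
print(cc_lhi, cc_xr)  # 1 0
```

With LHI the branch still sees CC 1 from the compare; with XR it sees
CC 0 from the clear.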
 
In general, a 4-byte instruction is handled quite well, just like a
2-byte one, except perhaps when it crosses a line boundary. Where
optimal grouping is key, as within a hot loop, the 2-byte
instruction can potentially allow better grouping, depending on the
lengths and addresses of the surrounding instructions. A 2-byte
instruction gives peace of mind when scheduling, because it will
always allow maximum grouping.
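
As a rough sketch of the line-crossing remark: assuming "line" means a
cache line, and assuming a 256-byte line size (my assumption, not
stated above), the Python below checks which instruction start
addresses would straddle a boundary. crosses_line() is a hypothetical
helper. Since instructions start on even addresses, a 2-byte
instruction can never straddle a line, while a 4-byte one can.

```python
def crosses_line(address, length, line_size=256):
    """True if an instruction of `length` bytes at `address` straddles a line boundary.

    line_size=256 is an assumed value for illustration.
    """
    return (address % line_size) + length > line_size

# Instructions start on halfword (even) addresses; scan two lines' worth.
two_byte = [crosses_line(a, 2) for a in range(0, 512, 2)]
four_byte = [crosses_line(a, 4) for a in range(0, 512, 2)]
print(any(two_byte), sum(four_byte))  # False 2
```

The two crossing cases for the 4-byte instruction are the start
addresses 2 bytes before each line end (254 and 510 here).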

On the other hand, before we had the "fast path", I believe the source
register was treated as a dependency, so in a hypothetical sequence of:
L   R1,(mem)
AR  R2,R1
XR  R1,R1
the XR would have waited for the L.
However, with the latest fast path, that is not a concern.

So, the complicated algorithm can be:
if there is no cc scheduling conflict and (pre-zEC12) no immediately
preceding setter of the GR, then X(G)R should be used;
otherwise, it might be better to use L(G)HI.
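
Written out as code, the rule might look like the sketch below. The
function and parameter names are hypothetical, and I am reading the
"(pre-zEC12)" parenthetical as meaning the earlier-setter check only
matters on machines older than zEC12; that reading is an assumption.

```python
def pick_clear_instruction(cc_conflict, pre_zec12, recent_setter):
    """Return which register-clearing form the rule above favors.

    cc_conflict   -- setting CC to 0 would disturb surrounding code
    pre_zec12     -- target machine predates zEC12 (assumed: no fast path)
    recent_setter -- an instruction just before this one sets the same GR
    """
    if cc_conflict:
        return "L(G)HI"     # keep the CC intact for a later CC user
    if pre_zec12 and recent_setter:
        return "L(G)HI"     # avoid waiting on the earlier setter of the GR
    return "X(G)R"          # otherwise take the XR/XGR form

print(pick_clear_instruction(False, False, True))   # X(G)R
print(pick_clear_instruction(False, True, True))    # L(G)HI
```

The "might be better" wording above still applies: this only encodes
the rule of thumb, not a measured result.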

Jim Mulder   z/OS System Test   IBM Corp.  Poughkeepsie,  NY
