-----Original Message-----
From: Zhigang Gong [mailto:zhigang.g...@linux.intel.com] 
Sent: Thursday, September 24, 2015 1:12 PM
To: Guo, Yejun
Cc: beignet@lists.freedesktop.org
Subject: Re: [Beignet] [PATCH V2 3/3] add local copy propagation optimization 
for each basic block

On Thu, Sep 24, 2015 at 06:05:31AM +0000, Guo, Yejun wrote:
> 
> 
> -----Original Message-----
> From: Zhigang Gong [mailto:zhigang.g...@linux.intel.com]
> Sent: Thursday, September 24, 2015 12:32 PM
> To: Guo, Yejun
> Cc: beignet@lists.freedesktop.org
> Subject: Re: [Beignet] [PATCH V2 3/3] add local copy propagation 
> optimization for each basic block
> 
> On Thu, Sep 24, 2015 at 02:58:22AM +0000, Guo, Yejun wrote:
> > 
> > > +
> > > +  void SelBasicBlockOptimizer::changeInsideReplaceInfos(const
> > > + SelectionInstruction& insn, GenRegister& var)  {
> > > +    for (ReplaceInfo* info : replaceInfos) {
> > > +      if (info->intermedia.reg() == var.reg()) {
> A new comment here is that the above loop is a little bit too heavy.
> For a large BB which has many MOV instructions, it will iterate too many 
> times for instructions after those MOV instructions. A better way is to 
> change to use a new map type of replaceInfos:
> map <ir::Register, set<ReplaceInfo*>>
> 
> This will be much faster than iterate all infos for each instruction.
> 
> [Yejun] nice, I'll change to map.
> 
> > > +        bool replaceable = false;
> > It's better to add a comment here to describe the cases which can't be 
> > replaced and why.
> > 
> > [yejun] actually, I think the code itself explains something, it is much 
> > complex to explain every detail in human words. I consider it as a 
> > nice-to-have since the basic logic is simple.
> I still think it is not that simple. A case just came into my mind which we 
> can't do replacement is:
> 
> MOV %r0, %r1
> ADD %r1, %r1, 1
> ...
> ADD %r10, %r0, 1
> ...
> 
> I'm afraid that your current code can't deal it correctly, right?
> 
> [yejun] current code does nothing for these instructions. It also relatives 
> to constant, maybe add another optimization path to keep every path clear and 
> simple. Or we can consider current code as a basic, and to extend it when 
> find such optimization is necessary.
If my understanding is correct, current code will replace %r0 to %r1 in the 
second ADD instruction which breaks the code. This is not an optimize 
opportunity, but a bug and must be fixed. The root cause is %r1 has been 
modified after copy its value to another register, then we should not propagate 
it to the destination register after that modification. For example, for all 
instruction after ADD %r1, %r1, 1, We could not do the 's/r0/r1'.

[Yejun] the current code does handle such case.
Firstly, for the terms.  
R0 = r1
R3 = r0 + r2
In the first MOV IR, R0 is named as intermedia, and r1 is named as replacement.
In the second IR, r0 is collected into toBeReplacements.

For the following IR:
1) MOV %r0, %r1
2) MOV %r2, %r0
3) ADD %r1, %r1, 1
 ...
4) ADD %r10, %r0, 1

When the 3) IR is scanned, in SelBasicBlockOptimizer::removeFromReplaceInfos, 
replacementChanged is recorded.

When the 4) IR is scanned, in SelBasicBlockOptimizer::changeInsideReplaceInfos, 
we'll remove the info with the following code because replacementChanged makes 
replaceable false.
        if (!replaceable) {
          replaceInfos.erase(info);
          delete info;
        }

If there is no 4) IR with %r0 as source, at the end of scan, in 
SelBasicBlockOptimizer::cleanReplaceInfos, 1) and 2) IR will be optimized.

> 
> > 
> > > +            uint32_t elements = CalculateElements(var, 
> > > insn.state.execWidth);
> > > +            if (info->elements == elements) {
> > Why not compare each attribute value, such as 
> > vstride_size,hstride_size,.....?
> > 
> > [Yejun] the execWidth is not recorded in the GenRegister, and I once saw 
> > instructions such as:
> > mov(1)  dst (vstride_size or hstride_size is 1/2/... or something 
> > else), src
> > mov(16) anotherdst, dst (with stride as 0)
> > 
> > the dst here is the same, but the strides are different, to handle this 
> > case, I add the function CalculateElements.
> The example you gave here has two different GenRegisters as GenRegister has 
> vstride and hstride members. Right? My previous comment suggests to compare 
> two GenRegistes' attributs and I can't understand your point here.
> 
> [Yejun] Let's use the following SEL IR as an example, %42 is the same 
> register but with two different stride in the two IRs. 
> MOV(1)              %42<2>:D  :       %41<0,1,0>:D
> MOV(16)           %43<1>:D    :       %42<0,1,0>:D
This doesn't address my comment. As the comment suggests to compare the 
GenRegisters' attribute not the virtual register. You can easily found the 
above two GenRegisters are different.

[Yejun] My expectation is that %42 should be considered as the same, and to 
optimize the IR as "mov(16) %43<1>:D, %41<0,1,0>:D". The optimization chance is 
lost if we compare the width/stride of %42 and fount they are not the same.

> 
> 
> Thanks,
> Zhigang Gong.
> 
> > 
> > 
> > 
> > 
> > The other parts LGTM,
> > 
> > Thanks,
> > Zhigang Gong
_______________________________________________
Beignet mailing list
Beignet@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/beignet

Reply via email to