on 2020/6/3 3:07 PM, Richard Biener wrote:
> On Wed, 3 Jun 2020, Kewen.Lin wrote:
> 
>> Hi Richi,
>>
>> on 2020/6/2 7:38 PM, Richard Biener wrote:
>>> On Thu, 28 May 2020, Kewen.Lin wrote:
>>>
>>>> Hi,
>>>>
>>>> This is a repost; the original series is at
>>>> https://gcc.gnu.org/pipermail/gcc-patches/2020-January/538360.html.
>>>>
>>>> As we discussed in the thread
>>>> https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00196.html
>>>> Original: https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00104.html,
>>>> I'm working to teach IVOPTs to consider D-form group access during
>>>> unrolling.  The difference between D-form and other forms during
>>>> unrolling is that we can put the stride into the displacement field
>>>> to avoid an additional step increment, e.g.:
>>>>
>>>> With X-form (uf step increments):
>>>>   ...
>>>>   LD A = baseA, X
>>>>   LD B = baseB, X
>>>>   ST C = baseC, X
>>>>   X = X + stride
>>>>   LD A = baseA, X
>>>>   LD B = baseB, X
>>>>   ST C = baseC, X
>>>>   X = X + stride
>>>>   LD A = baseA, X
>>>>   LD B = baseB, X
>>>>   ST C = baseC, X
>>>>   X = X + stride
>>>>   ...
>>>>
>>>> With D-form (one step increment for each base):
>>>>   ...
>>>>   LD A = baseA, OFF
>>>>   LD B = baseB, OFF
>>>>   ST C = baseC, OFF
>>>>   LD A = baseA, OFF+stride
>>>>   LD B = baseB, OFF+stride
>>>>   ST C = baseC, OFF+stride
>>>>   LD A = baseA, OFF+2*stride
>>>>   LD B = baseB, OFF+2*stride
>>>>   ST C = baseC, OFF+2*stride
>>>>   ...
>>>>   baseA += stride * uf
>>>>   baseB += stride * uf
>>>>   baseC += stride * uf
>>>>
>>>> If the loop gets unrolled 8 times, that is 3 step updates with D-form
>>>> vs. 8 step updates with X-form.  Here we only need to check that the
>>>> stride meets the D-form field requirement, since if OFF doesn't meet
>>>> it, we can construct baseA' as baseA + OFF.
>>>
>>> I'd just mention there are other targets that have the choice between
>>> the above forms.  Since IVOPTs itself does not perform the unrolling
>>> the IL it produces is the same, correct?
>>>
>> Yes.  Before this patch, IVOPTs doesn't consider the impact of unrolling;
>> it only models what it sees.  In effect it assumes that the later RTL
>> unrolling won't happen.
>>
>> With this patch the IV choice may change, so the IL may change as well.
>> The typical difference with this patch is:
>>
>>   vect__1.7_15 = MEM[symbol: x, index: ivtmp.19_22, offset: 0B];
>> vs.
>>   vect__1.7_15 = MEM[base: _29, offset: 0B];
> 
> So we're asking IVOPTS "if we were unrolling this loop would you make
> a different IV choice?" thus I wonder why we need so much complexity
> here?  

I would describe it more like "we are going to unroll this loop with
unroll factor uf in RTL, would you take that into account when modeling?"

In most cases a single iteration is representative of the unrolled
body, so it doesn't matter whether unrolling is considered or not.  But
that is not true for the case here: the expected reg_offset iv cand
reduces the iv cand step cost, and that is where the difference comes
from.
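
To make the counting concrete, here is a rough sketch.  It is not code
from the patch: the function names and the displacement-width parameter
are made up for illustration, and checking the largest in-body offset
stride * (uf - 1) is my assumption about what "stride meets the D-form
field requirement" amounts to.

```c
#include <stdbool.h>

/* Hypothetical helper: step updates per unrolled body with X-form.
   The shared index register is bumped once per unrolled copy.  */
int
xform_step_updates (int uf, int n_bases)
{
  (void) n_bases;
  return uf;
}

/* Hypothetical helper: step updates per unrolled body with D-form.
   Each base pointer is bumped once, by stride * uf, at the end.  */
int
dform_step_updates (int uf, int n_bases)
{
  (void) uf;
  return n_bases;
}

/* Assumption: the displacement is a signed field of BITS bits.  The
   offsets used in the unrolled body are 0, stride, ..., stride * (uf - 1),
   so only the last one needs to be checked.  */
bool
offsets_fit (long stride, int uf, int bits)
{
  long max_off = stride * (uf - 1);
  long limit = 1L << (bits - 1);
  return max_off >= -limit && max_off < limit;
}
```

For the example in this thread (three bases, uf = 8) this gives 8 step
updates with X-form against 3 with D-form.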

> That is, if we can classify the loop as being possibly unrolled
> we could evaluate IVOPTs IV choice (and overall cost) on the original
> loop and in a second run on the original loop with fake IV uses
> added with extra offset.  If the overall IV cost is similar we'll
> take the unroll friendly choice; if the costs are way different
> (I wouldn't expect this to be the case ever?) I'd side with the
> IV choice when not unrolling (and mark the loop as to be not unrolled).
> 

Could you elaborate on this a bit?  I guess it wouldn't estimate the
unroll factor here, just guess whether the loop will be unrolled or not?
The second run with fake IV uses added at extra offsets sounds like
scaling up the iv group cost by uf.
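
To check my understanding, this is roughly how I read the proposed
decision.  It is only a sketch: the type and function names are
hypothetical, and the percentage threshold for "similar" is an
assumption of mine, not something from your mail.

```c
#include <stdbool.h>

enum iv_choice { IV_KEEP_ORIGINAL, IV_UNROLL_FRIENDLY };

struct iv_decision
{
  enum iv_choice choice;
  bool allow_unroll;	/* false means force n_unroll to zero */
};

/* COST_PLAIN: overall IV cost of the run on the original loop.
   COST_FAKE: overall IV cost of the second run, with fake IV uses
   added at extra offsets.
   TOLERANCE_PCT: assumed threshold for calling the two costs similar.  */
struct iv_decision
decide_iv_choice (unsigned cost_plain, unsigned cost_fake,
		  unsigned tolerance_pct)
{
  struct iv_decision d;
  unsigned diff = cost_fake > cost_plain
		  ? cost_fake - cost_plain : cost_plain - cost_fake;
  bool similar = diff * 100 <= cost_plain * tolerance_pct;

  if (similar)
    {
      /* Costs are close: take the unroll friendly choice.  */
      d.choice = IV_UNROLL_FRIENDLY;
      d.allow_unroll = true;
    }
  else
    {
      /* Costs are way different: keep the non-unrolled choice and
	 mark the loop as not to be unrolled.  */
      d.choice = IV_KEEP_ORIGINAL;
      d.allow_unroll = false;
    }
  return d;
}
```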

> Thus I'd err on the side of not unrolling but leave the ultimate choice
> of whether to unroll to RTL unless IV cost makes that prohibitive.
> 
> Even without X- or D- form addressing modes the IV choice may differ
> and I think we don't need extra knobs for the unroller but instead
> can decide to set the existing n_unroll to zero (force not unroll)
> when costs say it would be bad?

Yes, even without X- or D-form addressing the IV choice may differ; the
difference probably comes from the compare type IV use for the loop exit,
and maybe from more cases that I am not aware of.  But I haven't seen
people care about it, probably because the impact is small.

IIUC, what you describe here is using ivopts information for the unroll
factor decision.  I think this is a separate direction.  Do we have cases
of this kind where ivopts costs can foresee the unrolling?

Now the unroll factor estimation can be used by other optimization passes
that want to know the future unroll factor decision.  As discussed,
overriding n_unroll sounds like a good idea, subject to some benchmarking.

BR,
Kewen

> 
> Richard.
> 
>> BR,
>> Kewen
>>
>>> Richard.
>>>
>>
> 
