On Thu, Jun 25, 2015 at 4:43 AM, Richard Biener
<richard.guent...@gmail.com> wrote:
> when the new scheme triggers vectorization cannot succeed on the
> result as we get
>
>   if (cond)
>     *p = val;
>
> if-converted to
>
>   tem = cond ? p : &scratch;
>   *tem = val;

That's correct.

>
> and
>
>    if (cond)
>      val = *p;
>
> if-converted to
>
>   scratch = val;

>   tem = cond ? p : &scratch;
>   val = *tem;

The patch does this slightly differently:

   tem = cond ? p : &scratch;
   val = cond ? *tem : val;

I think I like your version better as it has only one cond_expr.

>
> and thus the stores and loads appear as scatter/gather ones to
> the vectorizer (if-conversion could directly generate masked
> load/stores of course and not use a scratch-pad at all in that case).
>
> So the question is whether we get more non-vectorized if-converted
> code out of this

This is the case.

> (and thus whether we want to use
> --param allow-store-data-races to get the old code back which is
> nicer to less capable CPUs and probably faster than using
> scatter/gather or masked loads/stores).

A flag to allow load and store data-races is an interesting suggestion.

Abe also suggested continuing to optimize the old way in cases
where we know we write to or load from the same location on all branches:

if (c)
  A[i] = ...
else
  A[i] = ...

> scatter support is still not implemented in the vectorizer.

Correct.

> I also wonder if we should really care about load data races
> (not sure your patch does).

The patch does both loads and stores.

> I didn't look at the patch in detail yet - please address Alan's comments
> and re-post an updated patch.
>
> In general I like the idea.

Thanks for your review,
Sebastian

>
> Thanks,
> Richard.
>
>>>
>>>
>>> Re. creation of scratchpads:
>>>     (1) Should the '64' byte size be the result of scanning the
>>> function, for the largest data size to which we store? (ideally,
>>> conditionally store!)
>>
>> I suspect most functions have conditional stores, but far fewer have
>> conditional stores that we'd like to if-convert.  ISTM that if we can lazily
>> allocate the scratchpad that'd be best.   If this were an RTL pass, then I'd
>> say query the backend for the widest mode store insn and use that to size
>> the scratchpad.  We may have something similar we can do in gimple without
>> resorting to querying backend insn capabilities.  Perhaps walking the
>> in-scope addressable variables or some such.
>>
>>
>>>     (2) Allocating only once per function: if we had one scratchpad per
>>> loop, it could/would live inside the test of "gimple_build_call_internal
>>> (IFN_LOOP_VECTORIZED, ...". Otherwise, if we if-convert one or more
>>> loops in the function, but then fail to vectorize them, we'll leave the
>>> scratchpad around for later phases to clean up. Is that OK?
>>
>> If the scratchpad is local to a function, then I'd expect we'd clean it up
>> just like any other unused local.  Once it's a global, then all bets are
>> off.
>>
>> Anyway, I probably should just look at the patch before making more
>> comments.
>>
>> jeff
>>
