On Thu, Jun 25, 2015 at 4:43 AM, Richard Biener <richard.guent...@gmail.com> wrote:
> when the new scheme triggers, vectorization cannot succeed on the
> result as we get
>
>   if (cond)
>     *p = val;
>
> if-converted to
>
>   tem = cond ? p : &scratch;
>   *tem = val;
That's correct.

> and
>
>   if (cond)
>     val = *p;
>
> if-converted to
>
>   scratch = val;
>   tem = cond ? p : &scratch;
>   val = *tem;

The patch does this slightly differently:

  tem = cond ? p : &scratch;
  val = cond ? *tem : val;

I think I like your version better, as it has only one cond_expr.

> and thus the stores and loads appear as scatter/gather ones to
> the vectorizer (if-conversion could directly generate masked
> loads/stores of course and not use a scratch-pad at all in that case).
>
> So the question is whether we get more non-vectorized if-converted
> code out of this (and thus whether we want to use
> --param allow-store-data-races to get the old code back, which is
> nicer to less capable CPUs and probably faster than using
> scatter/gather or masked loads/stores).

A flag to allow load and store data races is an interesting suggestion.

Abe also suggested to continue optimizing the other way in cases where
we know that all branches write to or load from the same location:

  if (c)
    A[i] = ...
  else
    A[i] = ...

> scatter support is still not implemented in the vectorizer.

Correct.

> I also wonder if we should really care about load data races
> (not sure your patch does).

The patch handles both loads and stores.

> I didn't look at the patch in detail yet - please address Alan's comments
> and re-post an updated patch.
>
> In general I like the idea.

Thanks for your review,
Sebastian

> Thanks,
> Richard.
>
>>>
>>> Re. creation of scratchpads:
>>> (1) Should the '64' byte size be the result of scanning the
>>> function, for the largest data size to which we store? (ideally,
>>> conditionally store!)
>>
>> I suspect most functions have conditional stores, but far fewer have
>> conditional stores that we'd like to if-convert. ISTM that if we can
>> lazily allocate the scratchpad, that'd be best. If this were an RTL
>> pass, then I'd say query the backend for the widest-mode store insn
>> and use that to size the scratchpad.
>> We may have something similar we can do in gimple without
>> resorting to querying insn backend capabilities. Perhaps walking the
>> in-scope addressable variables or some such.
>>
>>> (2) Allocating only once per function: if we had one scratchpad per
>>> loop, it could/would live inside the test of "gimple_build_call_internal
>>> (IFN_LOOP_VECTORIZED, ...)". Otherwise, if we if-convert one or more
>>> loops in the function but then fail to vectorize them, we'll leave the
>>> scratchpad around for later phases to clean up. Is that OK?
>>
>> If the scratchpad is local to a function, then I'd expect we'd clean it
>> up just like any other unused local. Once it's a global, then all bets
>> are off.
>>
>> Anyway, I probably should just look at the patch before making more
>> comments.
>>
>> jeff