On Tue, 30 Jul 2019, Andre Vieira (lists) wrote: > > > On 30/07/2019 13:16, Andre Vieira (lists) wrote: > > Hi Richard, > > > > I believe this is in line with what you were explaining to me earlier. The > > one thing I haven't quite done here is the jump around for if there is no > > prolog peeling. Though I think skip_vectors introduces the jumping we need. > > > > The main gist of it is: > > 1) if '--param vect-epilogues-nomask=1' is passed we refrain from adding the > > versioning threshold check when versioning a loop at first, > > 2) later we allow the vectorizer to use skip_vectors to basically check > > niters to be able to skip the vectorized loop on to the right epilogue, > > 3) after vectorizing the epilogue, we check whether this was a versioned > > loop and whether we skipped adding the versioning threshold, if so we add a > > threshold based on the last VF > > > > There is a caveat here, if we don't vectorize the epilogue, because say our > > architecture doesn't know how, we will end up with a regression. > > Let's say we have a loop for which our target can vectorize for 32x but not > > 16x, we get: > > > > if (alias & threshold_for_16x ()) > > { > > if (niters > 31) > > vectorized_31 (); > > else > > scalar (); > > } > > else > > scalar (); > > > > Imagine our iteration count is 18, all we hit is the scalar loop, but now we > > go through 1x alias and 2x niters. Whereas potentially we could have done > > with just 1x niters.
True. Note we should swap the checks to if (threshold_for_16x && alias) > > A mitigation for this could be to only add the threshold check if we > > actually vectorized the last loop, I believe a: > > 'if (LOOP_VINFO_VECTORIZABLE_P (loop_vinfo)) return NULL;' inside that new > > "else" in vect_transform_loop would do the trick. Though then we will still > > have that extra alias check... > > > I am going to hack around in the back-end to "fake" a situation like this > > and see what happens. > > Right should have quickly tried this before sending the email, what happens is > actually vect_transform_loop never gets called because try_vectorize_loop_1 > will recognize it can't vectorize the epilogue. So we end up with the > "mitigation" result I suggested, where we simply don't have a versioning > threshold which might still not be ideal. I think the main issue is how we have implemented epilogue vectorization. Ideally when vectorizing the loop () we'd recognize all VFs we can handle and thus know beforehand. I think that's already done for the sake of openmp simd now so doing this when epilogue vectorization is enabled as well wouldn't be too bad - so then we know, at the time we do the versioning, what and how many vectorized epilogues we create. See vect_analyze_loop and the loop over vector sizes. > > > > Another potential issue arises when the profitability threshold obtained > > from the cost model isn't guaranteed to be either LE or EQ to the versioning > > threshold. In that case there are two separate checks, which now we no > > longer will attempt to fold. > > And I just realized I don't take the cost model threshold into account when > creating the new threshold check, I guess if it is ordered_p we should again > take the max there between the new threshold check and the cost threshold for > the new check. Also see https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01397.html for a patch that computes costs of each possible VF, with that we could compute a better combined estimate for minimum number of iters (also considering the extra jumps to reach the main VF/2 loop). > > > > I am trying to find a way to test and benchmark these changes. Unfortunately > > I am having trouble doing this for AVX512 as I find that the old '--param > > vect_epilogues_nomask=1' already causes wrong codegen in SPEC2017 for the > > gcc and perlbench benchmarks. There's a reason it is not enabled by default. But I'd like to fix bugs it has so can you possibly reduce testcases and file bugs for it? Thanks, Richard. > > > > > > Cheers, > > Andre > > > > On 19/07/2019 13:38, Andre Vieira (lists) wrote: > > > > > > > > > On 19/07/2019 12:35, Richard Biener wrote: > > > > On Fri, 19 Jul 2019, Andre Vieira (lists) wrote: > > > > > > > > > > > > > > > > > > > On 15/07/2019 11:54, Richard Biener wrote: > > > > > > On Mon, 15 Jul 2019, Andre Vieira (lists) wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > On 12/07/2019 11:19, Richard Biener wrote: > > > > > > > > On Thu, 11 Jul 2019, Andre Vieira (lists) wrote: > > > > > > > > > > > > > > > > > > > > > I have code that can split the condition and alias checks in > > > > > > > 'vect_loop_versioning'. For this approach I am considering keeping > > > > > > > that > > > > > > > bit of > > > > > > > code and seeing if I can patch up the checks after vectorizing the > > > > > > > epilogue > > > > > > > further. I think initially I will just go with a "hacked up" way > > > > > > > of > > > > > > > passing > > > > > > > down the bb with the iteration check and split the false edge > > > > > > > every time > > > > > > > we > > > > > > > vectorize it further. Will keep you posted on progress. If you > > > > > > > have any > > > > > > > pointers/tips they are most welc ome! > > > > > > > > > > > > I thought to somehow force the idea that we have a prologue loop > > > > > > to the vectorizer so it creates the number-of-vectorized iterations > > > > > > check and branch around the main (highest VF) vectorized loop. > > > > > > > > > > > > > > > > Hmm I think I may have skimmed over this earlier. I am reading it now > > > > > and am > > > > > not entirely sure what you mean by this. Specifically the "number of > > > > > vectorized iterations" or how a prologue loop plays a role in this. > > > > > > > > When there is no prologue then the versioning condition currently > > > > ensures we always enter the vector loop. Only if there is a prologue > > > > is there a check and branch eventually skipping right to the epilogue. > > > > If the versioning condition (now using a lower VF) doesn't guarantee > > > > this we have to add this jump-around. > > > > > > Right, I haven't looked at the prologue path yet. I had a quick look and > > > can't find where this branch skipping to the epilogue is constructed. I > > > will take a better look after I got my current example to work. > > > > > > > > > > > > > I guess this sheds some light on the comment above. And it definitely > > > > > implies > > > > > we would need to know the lowest VF when creating this condition. > > > > > Which is > > > > > tricky. > > > > > > > > We can simply use the smallest vector size supported by the target to > > > > derive it from the actual VF, no? > > > > > > So I could wait to introduce this check after all epilogue vectorization > > > is done, back track to the last niter check and duplicate that in the > > > outer loop. > > > > > > What I didn't want to do was use the smallest possible vector size for the > > > target because I was under the impression that does not necessarily > > > correspond to the smallest VF we may have for a loop, as the vectorizer > > > may have decided not to vectorize for that vector size because of costs? > > > If it I can assume this never happens, that once it starts to vectorize > > > epilogues that it will keep vectorizing them for any vector size it knows > > > off then yeah I can use that. > > > > > > > > > > I'm not sure I understand - why would you have any check not inside > > > > the outer loop? Yes, we now eventually hoist versioning checks > > > > but the VF checks for the individual variants should be around > > > > the vectorized loop itself (so not really part of the versioning check). > > > > > > Yeah I agree. I was just explaining what I had done wrong now. > > > > > > > > > Cheers, > > > > > Andre > > > > > > > > > > PS: I often find myself having to patch the DOMINATOR information, > > > > > sometimes > > > > > its easy to, but sometimes it can get pretty complicated. I wonder > > > > > whether it > > > > > would make sense to write something that traverses a loop and corrects > > > > > this, > > > > > if it doesn't exist already. > > > > > > > > There's iterate-fix-dominators, but unless you create new edges/blocks > > > > manually rather than doing split-block/redirect-edge which should do > > > > dominator updating for you. > > > > > > Ah I was doing everything manually after having some bad experiences with > > > lv_add_condition_to_bb. I will have a look at those thanks! > > > > > > Cheers, > > > Andre > > > > > > > > > > > Richard. > > > > > > > > > > > > > > > > > > > > > Richard. > > > > > > > > > > > > > > > > -- Richard Biener <rguent...@suse.de> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)