On Tue, 1 Nov 2016, Yuri Rumyantsev wrote:

> Hi All,
> 
> I re-send all patches sent by Ilya earlier for review which support
> vectorization of loop epilogues and loops with low trip count. We
> assume that the only patch - vec-tails-07-combine-tail.patch - was not
> approved by Jeff.
> 
> I did re-base of all patches and performed bootstrapping and
> regression testing that did not show any new failures. Also all
> changes related to new vect_do_peeling algorithm have been changed
> accordingly.
> 
> Is it OK for trunk?

I would have prefered that the series up to -03-nomask-tails would
_only_ contain epilogue loop vectorization changes but unfortunately
the patchset is oddly separated.

I have a comment on that part nevertheless:

@@ -1608,7 +1614,10 @@ vect_enhance_data_refs_alignment (loop_vec_info 
loop_vinfo)
   /* Check if we can possibly peel the loop.  */
   if (!vect_can_advance_ivs_p (loop_vinfo)
       || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
-      || loop->inner)
+      || loop->inner
+      /* Required peeling was performed in prologue and
+        is not required for epilogue.  */
+      || LOOP_VINFO_EPILOGUE_P (loop_vinfo))
     do_peeling = false;

   if (do_peeling
@@ -1888,7 +1897,10 @@ vect_enhance_data_refs_alignment (loop_vec_info 
loop_vinfo)

   do_versioning =
        optimize_loop_nest_for_speed_p (loop)
-       && (!loop->inner); /* FORNOW */
+       && (!loop->inner) /* FORNOW */
+        /* Required versioning was performed for the
+          original loop and is not required for epilogue.  */
+       && !LOOP_VINFO_EPILOGUE_P (loop_vinfo);

   if (do_versioning)
     {

please do that check in the single caller of this function.

Otherwise I still dislike the new ->aux use and I believe that simply
passing down info from the processed parent would be _much_ cleaner.
That is, here (and avoid the FOR_EACH_LOOP change):

@@ -580,12 +586,21 @@ vectorize_loops (void)
            && dump_enabled_p ())
           dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, vect_location,
                            "loop vectorized\n");
-       vect_transform_loop (loop_vinfo);
+       new_loop = vect_transform_loop (loop_vinfo);
        num_vectorized_loops++;
        /* Now that the loop has been vectorized, allow it to be unrolled
           etc.  */
        loop->force_vectorize = false;

+       /* Add new loop to a processing queue.  To make it easier
+          to match loop and its epilogue vectorization in dumps
+          put new loop as the next loop to process.  */
+       if (new_loop)
+         {
+           loops.safe_insert (i + 1, new_loop->num);
+           vect_loops_num = number_of_loops (cfun);
+         }

simply dispatch to a vectorize_epilogue (loop_vinfo, new_loop)
function which will set up stuff properly (and also perform
the if-conversion of the epilogue there).

That said, if we can get in non-masked epilogue vectorization
separately that would be great.

I'm still torn about all the rest of the stuff and question its
usability (esp. merging the epilogue with the main vector loop).
But it has already been approved ... oh well.

Thanks,
Richard.

Reply via email to