https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83008

--- Comment #29 from sergey.shalnov at intel dot com ---
Richard,
Thank you for your latest patch. I would like to clarify 
the multiple_p() function usage in if() clause.

First of all,  I assume that architectures with fixed 
size of HW registers (x86) should go to if(){} branch, 
but arch with unknown registers size should go to else{}.
This is because is_constant() function used.

The problem is in multiple_p() function. In the test case 
we have group_size=16 and const_nunits=8.
So, the multiple_p(16, 8) returns 1 and with 
–march=skylake-avx512 (or –march=znver1) we go to 
else{} branch which is not correct.

I used following change in your patch and test works as I expect.
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1923,7 +1923,7 @@ vect_analyze_slp_cost_1 (slp_instance instance, slp_tree
node,
          unsigned HOST_WIDE_INT const_nunits;
          unsigned nelt_limit;
          if (TYPE_VECTOR_SUBPARTS (vectype).is_constant (&const_nunits)
-             && ! multiple_p (group_size, const_nunits))
+             && multiple_p (group_size, const_nunits))
            {
              num_vects_to_check = ncopies_for_cost * const_nunits /
group_size;
              nelt_limit = const_nunits;
-- 

Other thing here is what we should do if group_size is, for example, 17. 
In this case (after my changes) wrong else{} branch will be taken.

I would propose to change this particular if() in following way.

          if (TYPE_VECTOR_SUBPARTS (vectype).is_constant (&const_nunits))
            {
            If(multiple_p (group_size, const_nunits))
              {
                num_vects_to_check = ncopies_for_cost * const_nunits /
group_size;
                nelt_limit = const_nunits;
              }
            else
              {
                ...not clear what we should have here...
              }
            }
          else
            {
              num_vects_to_check = 1;
              nelt_limit = group_size;
            }

Or perhaps multiple_p() should not be here at all?

Sergey

Reply via email to