On 06/18/2015 08:32 AM, Yuri Rumyantsev wrote:
Richard,

Here is updated patch which does not include your proposal related to
the target hook deletion.
You wrote:
I still don't understand why you need the new target hook.  If we have a masked
load/store then the mask is computed by an assignment with a VEC_COND_EXPR
(in your example) and thus a test for a zero mask is expressible as just

  > if (vect__ifc__41.17_167 == { 0, 0, 0, 0... })

or am I missing something?
Such vector compare produces vector and does not set up cc flags
required for conditional branch (GIMPLE_COND).
If we use such comparison for GIMPLE_COND we got error message, so I
used target hook which does set up cc flags aka ptest instruction and
I left this part.
I think we need to look for something better. I really don't like the idea of using a target hook like this within the gimple optimizers unless it's absolutely necessary.

How are conditionals on vectors usually handled? You should try to mimick that and report, with detail, why that's not working.

I'm also not happy with the mechanisms to determine whether or not we should make this transformation. I'm willing to hash through other details and come back to this issue once we've got some of the other stuff figured out. I guess a new flag with the target adjusting is the fallback if we can't come up with some reasonable way to select this on or off.

The reason I don't like having the target files make this kind of decision is it makes more gimple transformations target dependent. Based on the history with RTL, that ultimately leads to an unwieldy mess.

And yes, I know gimple isn't 100% target independent -- but that doesn't mean we should keep adding more target dependencies. Each one we add needs to be well vetted.


patch.3

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 44a8624..e90de32 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -100,6 +100,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-iterator.h"
 #include "tree-chkp.h"
 #include "rtl-chkp.h"
+#include "stringpool.h"
+#include "tree-ssanames.h"
So ideally we'll figure out why you're unable to generate a suitable conditional at the gimple level and the need for the x86 backend to generate the vector zero test will go away. And if it does go away, we want those #includes to disappear.

Additionally, instead of including stringpool.h & tree-ssanames.h, include "ssa.h" -- as a general rule.


 static rtx legitimize_dllimport_symbol (rtx, bool);
 static rtx legitimize_pe_coff_extern_decl (rtx, bool);
@@ -41100,6 +41102,47 @@ ix86_vectorize_builtin_gather (const_tree mem_vectype,
   return ix86_get_builtin (code);
 }

+/* Returns true if SOURCE type is supported by builtin ptest.
+   NAME is lhs of created ptest call.  All created statements are added
+   to GS.  */
+
+static bool
+ix86_vectorize_build_zero_vector_test (tree source, tree name,

Given the stated goal of not doing this in the target files, I'm not going to actually look at this routine or any of the infrastructure for this target hook.


diff --git a/gcc/params.def b/gcc/params.def
index 3e4ba3a..9e8de11 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1069,6 +1069,12 @@ DEFPARAM (PARAM_MAX_STORES_TO_SINK,
           "Maximum number of conditional store pairs that can be sunk",
           2, 0, 0)

+/* Enable inserion test on zero mask for masked stores if non-zero.  */
s/inserion/insertion/

+DEFPARAM (PARAM_ZERO_TEST_FOR_STORE_MASK,
+         "zero-test-for-store-mask",
+         "Enable insertion of test on zero mask for masked stores",
+         1, 0, 1)
I'm resisting the temptation to bike-shed... I don't like the name, but I don't have a better one yet. Similarly for the short description.


+/* { dg-final { scan-tree-dump-times "Move MASK_STORE to new bb" 2 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
cleanup-tree-dump is no longer needed.


diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 7ba0d8f..e31479b 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2072,6 +2072,7 @@ vectorizable_mask_load_store (gimple stmt, gimple_stmt_iterator *gsi,
     {
       tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE;
       prev_stmt_info = NULL;
+      LOOP_VINFO_HAS_MASK_STORE (loop_vinfo) = true;
       for (i = 0; i < ncopies; i++)
If we need to keep this field, can you initialize it just after the STMT_VINFO_GATHER_P test after the /** Transform **/ comment. It seems kind-of buried and hard to find in this location.

I'm not really sure why Richi has objected to the field. Yea we can re-analyze stuff in the vectorizer, but we'd be repeating a fair number of tests (presumably we'd refactor the start of vectorizable_mask_load_store to avoid duplication). That seems more wasteful than another field.



        {
          unsigned align, misalign;
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index ff46a52..debb351 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -92,7 +92,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "dbgcnt.h"
 #include "gimple-fold.h"
 #include "tree-scalar-evolution.h"
-
+#include "stringpool.h"
+#include "tree-ssanames.h"
In general, rather than including stringpool.h and tree-ssanames.h, just include "ssa.h".



 /* Loop or bb location.  */
 source_location vect_location;
@@ -422,6 +423,184 @@ set_uid_loop_bbs (loop_vec_info loop_vinfo, gimple loop_vectorized_call)
   free (bbs);
 }

+/* Helper for optimize_mask_stores: returns true if STMT sinking to end
+   of BB is valid and false otherwise.  */
+
+static bool
+is_valid_sink (gimple stmt, gimple last_store)
+{
+  tree vdef;
+  imm_use_iterator imm_it;
+  gimple use_stmt;
+  basic_block bb = gimple_bb (stmt);
+
+  if ((vdef = gimple_vdef (stmt)))
+    {
+      int cross = 0;
+      /* Check that ther are no store vuses in current bb.  */
+      FOR_EACH_IMM_USE_STMT (use_stmt, imm_it, vdef)
+       if (gimple_bb (use_stmt) == bb)
+         cross++;
Seems to me that you should get rid of "cross" and just "return false" here. There's no need to keep looking at the immediate uses once you've found one in the same block. That also allows you to simplify this following code ...

+
+      if (cross == 0)
+       return true;
Eliminate the conditional and return true here.


+    }
+  else if (gimple_vuse (stmt) == NULL_TREE)
+    return true;
+  else if (gimple_vuse (stmt) == gimple_vuse (last_store))
+    return true;
+  return false;
+}
+
+/* The code below is trying to perform simple optimization - do not execute
+   masked store statement if its mask is zero mask since loads that follow
+   a masked store can be blocked.
+   It put all masked stores statements with the same mask into the new bb
+   with a check on zero mask.  */
I suspect you need to re-wrap the comment to 80 columns.


+
+  /* Loop has masked stores.  */
+  while (!worklist.is_empty ())
+    {
+      gimple last, def_stmt, last_store;
+      edge e, efalse;
+      tree mask, val;
+      basic_block store_bb, join_bb;
+      gimple_stmt_iterator gsi_to;
+      gimple_seq gs;
+      tree arg3;
+      tree vdef, new_vdef;
+      gphi *phi;
+      bool first_dump;
+
+      last = worklist.pop ();
+      mask = gimple_call_arg (last, 2);
Are there ever cases where the mask is a compile-time constant? I wouldn't think so, but I'm totally unfamiliar with all the tree-vector code.


+      val = make_ssa_name (integer_type_node);
+      gs = NULL;
+      /* Escape if target does not support test on zero mask.  */
+
+      if (!targetm.vectorize.build_zero_vector_test (mask, val, &gs))
+       {
+         if (dump_enabled_p ())
+           dump_printf (MSG_NOTE, "Target does not support ptest!\n");
+         return;
+       }
+      gcc_assert (gs != NULL);
As noted earlier, I think we need to revisit this and look for a way to do this test without calling into bits in the target files. I'd like to see the gimple you tried to generate and any error messages that were spit out when you did so.




+
+      /* Create new bb.  */
+      e = split_block (bb, last);
+      join_bb = e->dest;
+      store_bb = create_empty_bb (bb);
+      add_bb_to_loop (store_bb, loop);
When we split BB via split_block, is the newly created block (JOIN_BB) automatically added to the loop?





+      e->flags = EDGE_TRUE_VALUE;
+      efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
+      /* Don't want to put STORE_BB to unlikely part and use
+        50% probability.  */
+      store_bb->frequency = bb->frequency / 2;
+      efalse->probability = REG_BR_PROB_BASE / 2;
+      make_edge (store_bb, join_bb, EDGE_FALLTHRU);
Like Richi, I'd expect that in practice STORE_BB will be the more likely taken, how much so, I don't know, but 50-50 is probably not right. I'd think this would affect the ultimate block layout to some degree as well. Do we want the straightline path to include STORE_BB (which implies a branch-around STORE_BB and inverting the test from what you're doing now).




+      /* Create new PHI node for vdef of the last masked store:
+        .MEM_2 = VDEF <.MEM_1>
+        will be converted to
+        .MEM.3 = VDEF <.MEM_1>
+        and new PHI node will be created in join bb
+        .MEM_2 = PHI <.MEM_1, .MEM_3>
+      */
+      vdef = gimple_vdef (last);
+      gcc_assert (vdef && TREE_CODE (vdef) == SSA_NAME);
+      new_vdef = make_ssa_name (gimple_vop (cfun), last);
+      phi = create_phi_node (vdef, join_bb);
+ add_phi_arg (phi, new_vdef, EDGE_SUCC (store_bb, 0), UNKNOWN_LOCATION);
+      gimple_set_vdef (last, new_vdef);
+      first_dump = true;
Presumably you don't add the phi arg associated with the edge BB->JOIN_BB because you don't know it until the end of the loop below, right?


Overall I think this is pretty close. I'd like you to focus on getting the test for a zero store mask sorted out.

Jeff


Reply via email to