On 15-11-14 13:14, Tom de Vries wrote:
Hi,

I'm submitting a patch series with initial support for the oacc kernels 
directive.

The patch series uses pass_parallelize_loops to implement parallelization of
loops in the oacc kernels region.

The patch series consists of these 8 patches:
...
     1  Expand oacc kernels after pass_build_ealias
     2  Add pass_oacc_kernels
     3  Add pass_ch_oacc_kernels to pass_oacc_kernels
     4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
     5  Add pass_loop_im to pass_oacc_kernels
     6  Add pass_ccp to pass_oacc_kernels
     7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
     8  Do simple omp lowering for no address taken var
...

This patch moves omp expansion of the oacc kernels directive to after pass_build_ealias.

The rationale is that in order to use pass_parallelize_loops for analysis and transformation of an oacc kernels region, we postpone omp expansion of that region until the earliest point in the pass list where enough information is available to run pass_parallelize_loops, in other words, after pass_build_ealias.

The patch postpones expansion in expand_omp, and ensures expansion by adding pass_expand_omp_ssa:
- after pass_build_ealias, and
- after pass_all_early_optimizations for the case we're not optimizing.
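
The two-stage scheme above can be sketched as a small stand-alone model. This is only an illustration under simplifying assumptions: ToyFunction, RegionKind and the toy_* functions are hypothetical stand-ins, not GCC types or APIs, and the model only tracks which regions remain unexpanded and whether the eomp property is set.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not GCC types.
enum RegionKind { OMP_PARALLEL_REGION, OACC_KERNELS_REGION };

struct ToyFunction {
  std::vector<RegionKind> regions;     // regions that are not yet expanded
  bool in_ssa = false;                 // models gimple_in_ssa_p (cfun)
  bool contains_oacc_kernels = false;  // models the new struct function field
  bool prop_eomp = false;              // models PROP_gimple_eomp
};

// Models expand_omp: a kernels region is skipped while not in SSA form,
// every other region is expanded (here: simply dropped from the list).
static void toy_expand_omp(ToyFunction &fn) {
  std::vector<RegionKind> remaining;
  for (RegionKind kind : fn.regions)
    if (kind == OACC_KERNELS_REGION && !fn.in_ssa)
      remaining.push_back(kind);  // postponed until the SSA variant runs
  fn.regions = remaining;
}

// Models pass_expand_omp::execute: the property is only set when nothing
// was postponed, i.e. when the function has no kernels directive.
static void toy_pass_expand_omp(ToyFunction &fn) {
  toy_expand_omp(fn);
  if (!fn.contains_oacc_kernels)
    fn.prop_eomp = true;
}

// Models the gate and execute of pass_expand_omp_ssa.  Returns true if the
// pass actually ran, false if the gate switched it off.
static bool toy_pass_expand_omp_ssa(ToyFunction &fn) {
  if (fn.prop_eomp)
    return false;       // gate: expansion is already complete
  assert(fn.in_ssa);    // this variant is only scheduled on SSA code
  toy_expand_omp(fn);
  fn.prop_eomp = true;  // models properties_provided
  return true;
}
```

In this model the pass can be scheduled at both places in the pass list: whichever instance runs first sets the property, and the gate then switches the other instance off.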

In order to make sure the oacc kernels region arrives at pass_expand_omp_ssa in the same state in which it left expand_omp, the patch makes pass_ccp and pass_forwprop aware of lowered omp code, so that they handle it conservatively.
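
The detection used for this is name-based. A minimal stand-alone sketch of the idea, assuming a statement is reduced to the list of identifier names of its SSA operands (with an empty string standing for an anonymous SSA_NAME); toy_stmt_omp_lowering_p is a hypothetical helper for illustration, not the function added by the patch:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: return true if any operand name is one of the
// variables created by omp lowering.  An empty string models an
// anonymous SSA_NAME, which can never match.
static bool toy_stmt_omp_lowering_p(const std::vector<std::string> &operand_names) {
  static const char *const lowering_names[] = {
    ".omp_data_i", ".omp_data_arr", ".omp_data_sizes", ".omp_data_kinds"
  };

  for (const std::string &name : operand_names) {
    if (name.empty())
      continue;  // anonymous operand, nothing to compare against
    for (const char *candidate : lowering_names)
      if (name == candidate)
        return true;
  }
  return false;
}
```

A pass that wants to be conservative would then skip any statement for which such a predicate returns true, which is what the ccp and forwprop changes in the patch do with the real predicate.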

The patch contains changes in expand_omp_target to deal with ssa-code, similar to what is already present in expand_omp_taskreg.

Furthermore, the patch forces .omp_data_sizes and .omp_data_kinds to be non-static for oacc kernels. It does this to get some references to .omp_data_sizes and .omp_data_kinds in the ssa code. Without these references, the definitions would be removed. The reference to the variables in GIMPLE_OACC_KERNELS alone is not enough to keep them from being removed. [ In vries/oacc-kernels, I used a BUILT_IN_USE kludge for this purpose ].

Finally, at the end of pass_expand_omp_ssa we're left with SSA_NAMEs in the original function whose definition has been removed (that is, moved to the split-off function). TODO_remove_unused_locals takes care of some of them, but not the anonymous ones. So the patch iterates over all SSA_NAMEs to find these dangling SSA_NAMEs and releases them.
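
The dangling-name check can be illustrated with a stand-alone model. This is a sketch under stated assumptions: ToyStmt, ToyName and release_dangling are hypothetical names, a statement is reduced to the list of SSA versions it defines, and default definitions are not modeled at all.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not GCC types.
struct ToyStmt {
  std::vector<int> defs;    // versions of the SSA names this statement defines
};

struct ToyName {
  int version;              // models SSA_NAME_VERSION
  const ToyStmt *def_stmt;  // models SSA_NAME_DEF_STMT
  bool released;            // set once the name has been released
};

// A name is dangling when its recorded defining statement no longer lists
// it among its definitions.  Marks such names as released and returns
// their versions.
static std::vector<int> release_dangling(std::vector<ToyName> &names) {
  std::vector<int> released;
  for (ToyName &name : names) {
    if (name.released)
      continue;
    assert(name.def_stmt != nullptr);  // default defs are not modeled here
    const std::vector<int> &defs = name.def_stmt->defs;
    bool found = std::find(defs.begin(), defs.end(), name.version) != defs.end();
    if (!found) {
      name.released = true;
      released.push_back(name.version);
    }
  }
  return released;
}
```

In the model, a statement that was moved to the split-off function defines a fresh version there, so the old version in the original function no longer appears among its definitions and gets released.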

OK for trunk?

Thanks,
- Tom

2014-11-14  Tom de Vries  <t...@codesourcery.com>

	* function.h (struct function): Add contains_oacc_kernels field.
	* gimplify.c (gimplify_omp_workshare): Set contains_oacc_kernels.
	* omp-low.c: Include gimple-pretty-print.h.
	(release_first_vuse_in_edge_dest): New function.
	(expand_omp_target): Handle ssa-code.
	(expand_omp): Don't expand GIMPLE_OACC_KERNELS when not in ssa.
	(pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
	properties_provided field.
	(pass_expand_omp::execute): Set PROP_gimple_eomp in
	cfun->curr_properties only if cfun does not contain oacc kernels.
	(pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
	todo_flags_finish field.
	(pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
	execute_expand_omp.
	(lower_omp_target): Add static_arrays variable, init to 1.  Don't use
	static arrays for kernels directive.  Use static_arrays variable.
	Handle case that .omp_data_kinds is not static.
	(gimple_stmt_omp_lowering_p): New function.
	* omp-low.h (gimple_stmt_omp_lowering_p): Declare.
	* passes.def: Add pass_expand_omp_ssa after pass_build_ealias.
	* tree-ssa-ccp.c: Include omp-low.h.
	(surely_varying_stmt_p): Handle omp lowering code conservatively.
	* tree-ssa-forwprop.c: Include omp-low.h.
	(pass_forwprop::execute): Handle omp lowering code conservatively.
---
 gcc/function.h          |   3 +
 gcc/gimplify.c          |   1 +
 gcc/omp-low.c           | 194 +++++++++++++++++++++++++++++++++++++++++++++---
 gcc/omp-low.h           |   1 +
 gcc/passes.def          |   2 +
 gcc/tree-ssa-ccp.c      |   4 +
 gcc/tree-ssa-forwprop.c |   4 +-
 7 files changed, 196 insertions(+), 13 deletions(-)

diff --git a/gcc/function.h b/gcc/function.h
index 08ab761..a72c154 100644
--- a/gcc/function.h
+++ b/gcc/function.h
@@ -664,6 +664,9 @@ struct GTY(()) function {
 
   /* Set when the tail call has been identified.  */
   unsigned int tail_call_marked : 1;
+
+  /* Set when the function contains oacc kernels directives.  */
+  unsigned int contains_oacc_kernels : 1;
 };
 
 /* Add the decl D to the local_decls list of FUN.  */
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 2c8c666..52d7e6d 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -7281,6 +7281,7 @@ gimplify_omp_workshare (tree *expr_p, gimple_seq *pre_p)
       break;
     case OACC_KERNELS:
       stmt = gimple_build_oacc_kernels (body, OACC_KERNELS_CLAUSES (expr));
+      cfun->contains_oacc_kernels = 1;
       break;
     case OACC_PARALLEL:
       stmt = gimple_build_oacc_parallel (body, OACC_PARALLEL_CLAUSES (expr));
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 187167a..6caeae9 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -87,6 +87,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-eh.h"
 #include "cilk.h"
 #include "lto-section-names.h"
+#include "gimple-pretty-print.h"
 
 
 /* Lowering of OpenMP parallel and workshare constructs proceeds in two
@@ -5337,6 +5338,35 @@ expand_omp_build_assign (gimple_stmt_iterator *gsi_p, tree to, tree from)
     }
 }
 
+static void
+release_first_vuse_in_edge_dest (edge e)
+{
+  gimple_stmt_iterator i;
+  basic_block bb = e->dest;
+
+  for (i = gsi_start_phis (bb); !gsi_end_p (i); gsi_next (&i))
+    {
+      gimple phi = gsi_stmt (i);
+      tree arg = PHI_ARG_DEF_FROM_EDGE (phi, e);
+
+      if (!virtual_operand_p (arg))
+	continue;
+
+      mark_virtual_operand_for_renaming (arg);
+      return;
+    }
+
+  for (i = gsi_start_bb (bb); !gsi_end_p (i); gsi_next_nondebug (&i))
+    {
+      gimple stmt = gsi_stmt (i);
+      if (gimple_vuse (stmt) == NULL_TREE)
+	continue;
+
+      mark_virtual_operand_for_renaming (gimple_vuse (stmt));
+      return;
+    }
+}
+
 /* Expand the OpenMP parallel or task directive starting at REGION.  */
 
 static void
@@ -8831,7 +8861,6 @@ expand_omp_target (struct omp_region *region)
   /* Supported by expand_omp_taskreg, but not here.  */
   if (child_cfun != NULL)
     gcc_assert (!child_cfun->cfg);
-  gcc_assert (!gimple_in_ssa_p (cfun));
 
   entry_bb = region->entry;
   exit_bb = region->exit;
@@ -8857,7 +8886,7 @@ expand_omp_target (struct omp_region *region)
 	{
 	  basic_block entry_succ_bb = single_succ (entry_bb);
 	  gimple_stmt_iterator gsi;
-	  tree arg;
+	  tree arg, narg;
 	  gimple tgtcopy_stmt = NULL;
 	  tree sender = TREE_VEC_ELT (gimple_omp_data_arg (entry_stmt), 0);
 
@@ -8887,8 +8916,27 @@ expand_omp_target (struct omp_region *region)
 	  gcc_assert (tgtcopy_stmt != NULL);
 	  arg = DECL_ARGUMENTS (child_fn);
 
-	  gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
-	  gsi_remove (&gsi, true);
+	  if (!gimple_in_ssa_p (cfun))
+	    {
+	      gcc_assert (gimple_assign_lhs (tgtcopy_stmt) == arg);
+	      gsi_remove (&gsi, true);
+	    }
+	  else
+	    {
+	      gcc_assert (SSA_NAME_VAR (gimple_assign_lhs (tgtcopy_stmt))
+			  == arg);
+
+	      /* If we are in ssa form, we must load the value from the default
+		 definition of the argument.  That should not be defined now,
+		 since the argument is not used uninitialized.  */
+	      gcc_assert (ssa_default_def (cfun, arg) == NULL);
+	      narg = make_ssa_name (arg, gimple_build_nop ());
+	      set_ssa_default_def (cfun, arg, narg);
+	      /* ?? Is setting the subcode really necessary ??  */
+	      gimple_omp_set_subcode (tgtcopy_stmt, TREE_CODE (narg));
+	      gimple_assign_set_rhs1 (tgtcopy_stmt, narg);
+	      update_stmt (tgtcopy_stmt);
+	    }
 	}
 
       /* Declare local variables needed in CHILD_CFUN.  */
@@ -8931,11 +8979,23 @@ expand_omp_target (struct omp_region *region)
 	  stmt = gimple_build_return (NULL);
 	  gsi_insert_after (&gsi, stmt, GSI_SAME_STMT);
 	  gsi_remove (&gsi, true);
+
+	  /* A vuse in single_succ (exit_bb) may use a vdef from the region
+	     which is about to be split off.  Mark the vdef for renaming.  */
+	  release_first_vuse_in_edge_dest (single_succ_edge (exit_bb));
 	}
 
       /* Move the offloading region into CHILD_CFUN.  */
 
-      block = gimple_block (entry_stmt);
+      if (gimple_in_ssa_p (cfun))
+	{
+	  init_tree_ssa (child_cfun);
+	  init_ssa_operands (child_cfun);
+	  child_cfun->gimple_df->in_ssa_p = true;
+	  block = NULL_TREE;
+	}
+      else
+	block = gimple_block (entry_stmt);
 
       new_bb = move_sese_region_to_fn (child_cfun, entry_bb, exit_bb, block);
       if (exit_bb)
@@ -8985,6 +9045,8 @@ expand_omp_target (struct omp_region *region)
 	  if (changed)
 	    cleanup_tree_cfg ();
 	}
+      if (gimple_in_ssa_p (cfun))
+	update_ssa (TODO_update_ssa);
       pop_cfun ();
     }
 
@@ -9261,6 +9323,8 @@ expand_omp_target (struct omp_region *region)
       gcc_assert (g && gimple_code (g) == GIMPLE_OMP_RETURN);
       gsi_remove (&gsi, true);
     }
+  if (gimple_in_ssa_p (cfun))
+    update_ssa (TODO_update_ssa_only_virtuals);
 }
 
 
@@ -9331,6 +9395,15 @@ expand_omp (struct omp_region *region)
 	  break;
 
 	case GIMPLE_OACC_KERNELS:
+	  if (!gimple_in_ssa_p (cfun))
+	    /* We're in pass_expand_omp.  Postpone expanding till
+	       pass_expand_omp_ssa.  */
+	    break;
+
+	  /* We're in pass_expand_omp_ssa.  Expand now.  */
+
+	  /* FALLTHRU.  */
+
 	case GIMPLE_OACC_PARALLEL:
 	case GIMPLE_OMP_TARGET:
 	  expand_omp_target (region);
@@ -9503,7 +9576,7 @@ const pass_data pass_data_expand_omp =
   OPTGROUP_NONE, /* optinfo_flags */
   TV_NONE, /* tv_id */
   PROP_gimple_any, /* properties_required */
-  PROP_gimple_eomp, /* properties_provided */
+  0 /* Possibly PROP_gimple_eomp.  */, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
   0, /* todo_flags_finish */
@@ -9517,7 +9590,7 @@ public:
   {}
 
   /* opt_pass methods: */
-  virtual unsigned int execute (function *)
+  virtual unsigned int execute (function *fun)
     {
       bool gate = ((flag_openacc != 0 || flag_openmp != 0
 		    || flag_openmp_simd != 0 || flag_cilkplus != 0)
@@ -9528,7 +9601,12 @@ public:
       if (!gate)
 	return 0;
 
-      return execute_expand_omp ();
+      unsigned int res = execute_expand_omp ();
+
+      if (!cfun->contains_oacc_kernels)
+	fun->curr_properties |= PROP_gimple_eomp;
+
+      return res;
     }
 
 }; // class pass_expand_omp
@@ -9553,7 +9631,8 @@ const pass_data pass_data_expand_omp_ssa =
   PROP_gimple_eomp, /* properties_provided */
   0, /* properties_destroyed */
   0, /* todo_flags_start */
-  TODO_cleanup_cfg | TODO_rebuild_alias, /* todo_flags_finish */
+  TODO_cleanup_cfg | TODO_rebuild_alias
+  | TODO_remove_unused_locals, /* todo_flags_finish */
 };
 
 class pass_expand_omp_ssa : public gimple_opt_pass
@@ -9568,7 +9647,47 @@ public:
     {
       return !(fun->curr_properties & PROP_gimple_eomp);
     }
-  virtual unsigned int execute (function *) { return execute_expand_omp (); }
+  virtual unsigned int execute (function *)
+    {
+      unsigned res = execute_expand_omp ();
+
+      /* After running pass_expand_omp_ssa to expand the oacc kernels
+	 directive, we are left in the original function with anonymous
+	 SSA_NAMEs, with a defining statement that has been deleted.  This
+	 pass finds those SSA_NAMEs and releases them.  */
+      unsigned int i;
+      for (i = 1; i < num_ssa_names; ++i)
+	{
+	  tree name = ssa_name (i);
+	  if (name == NULL_TREE)
+	    continue;
+
+	  gimple stmt = SSA_NAME_DEF_STMT (name);
+	  bool found = false;
+
+	  ssa_op_iter op_iter;
+	  def_operand_p def_p;
+	  FOR_EACH_PHI_OR_STMT_DEF (def_p, stmt, op_iter, SSA_OP_ALL_DEFS)
+	    {
+	      tree def = DEF_FROM_PTR (def_p);
+	      if (def == name)
+		{
+		  found = true;
+		  break;
+		}
+	    }
+
+	  if (!found)
+	    {
+	      if (dump_file)
+		fprintf (dump_file, "Released dangling ssa name %u\n", i);
+	      release_ssa_name (name);
+	    }
+	}
+
+      return res;
+    }
+  opt_pass * clone () { return new pass_expand_omp_ssa (m_ctxt); }
 
 }; // class pass_expand_omp_ssa
 
@@ -11194,6 +11313,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
   unsigned int map_cnt = 0;
   tree (*gimple_omp_clauses) (const_gimple);
   void (*gimple_omp_set_data_arg) (gimple, tree);
+  unsigned int static_arrays = 1;
 
   offloaded = is_gimple_omp_offloaded (stmt);
   data_region = false;
@@ -11202,6 +11322,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
     case GIMPLE_OACC_KERNELS:
       gimple_omp_clauses = gimple_oacc_kernels_clauses;
       gimple_omp_set_data_arg = gimple_oacc_kernels_set_data_arg;
+      static_arrays = 0;
       break;
     case GIMPLE_OACC_PARALLEL:
       gimple_omp_clauses = gimple_oacc_parallel_clauses;
@@ -11368,7 +11489,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 			  ".omp_data_sizes");
       DECL_NAMELESS (TREE_VEC_ELT (t, 1)) = 1;
       TREE_ADDRESSABLE (TREE_VEC_ELT (t, 1)) = 1;
-      TREE_STATIC (TREE_VEC_ELT (t, 1)) = 1;
+      TREE_STATIC (TREE_VEC_ELT (t, 1)) = static_arrays;
       tree tkind_type;
       int talign_shift;
       if (is_gimple_omp_oacc_specifically (stmt))
@@ -11386,7 +11507,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 			  ".omp_data_kinds");
       DECL_NAMELESS (TREE_VEC_ELT (t, 2)) = 1;
       TREE_ADDRESSABLE (TREE_VEC_ELT (t, 2)) = 1;
-      TREE_STATIC (TREE_VEC_ELT (t, 2)) = 1;
+      TREE_STATIC (TREE_VEC_ELT (t, 2)) = static_arrays;
       gimple_omp_set_data_arg (stmt, t);
 
       vec<constructor_elt, va_gc> *vsize;
@@ -11559,6 +11680,22 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 						    clobber));
 	}
 
+      if (!TREE_STATIC (TREE_VEC_ELT (t, 2)))
+	{
+	  gimple_seq initlist = NULL;
+	  force_gimple_operand (build1 (DECL_EXPR, void_type_node,
+					TREE_VEC_ELT (t, 2)),
+				&initlist, true, NULL_TREE);
+	  gimple_seq_add_seq (&ilist, initlist);
+
+	  tree clobber = build_constructor (TREE_TYPE (TREE_VEC_ELT (t, 2)),
+					    NULL);
+	  TREE_THIS_VOLATILE (clobber) = 1;
+	  gimple_seq_add_stmt (&olist,
+			       gimple_build_assign (TREE_VEC_ELT (t, 2),
+						    clobber));
+	}
+
       tree clobber = build_constructor (ctx->record_type, NULL);
       TREE_THIS_VOLATILE (clobber) = 1;
       gimple_seq_add_stmt (&olist, gimple_build_assign (ctx->sender_decl,
@@ -13739,4 +13876,37 @@ omp_finish_file (void)
     }
 }
 
+/* Return true if STMT is omp-lowered code.  */
+
+bool
+gimple_stmt_omp_lowering_p (gimple stmt)
+{
+  tree use;
+  ssa_op_iter iter;
+  const char *s;
+
+  FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE|SSA_OP_DEF)
+    {
+      if (SSA_NAME_IDENTIFIER (use) == NULL_TREE)
+	continue;
+      s = IDENTIFIER_POINTER (SSA_NAME_IDENTIFIER (use));
+
+      if (!(strcmp (".omp_data_i", s) == 0
+	    || strcmp (".omp_data_arr", s) == 0
+	    || strcmp (".omp_data_sizes", s) == 0
+	    || strcmp (".omp_data_kinds", s) == 0))
+	continue;
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Detected omp lowering code\n");
+	  print_gimple_stmt (dump_file, stmt, 0, dump_flags);
+	}
+
+      return true;
+    }
+
+  return false;
+}
+
 #include "gt-omp-low.h"
diff --git a/gcc/omp-low.h b/gcc/omp-low.h
index ac587d0..ff8a956 100644
--- a/gcc/omp-low.h
+++ b/gcc/omp-low.h
@@ -28,6 +28,7 @@ extern void free_omp_regions (void);
 extern tree omp_reduction_init (tree, tree);
 extern bool make_gimple_omp_edges (basic_block, struct omp_region **, int *);
 extern void omp_finish_file (void);
+extern bool gimple_stmt_omp_lowering_p (gimple);
 
 extern GTY(()) vec<tree, va_gc> *offload_funcs;
 extern GTY(()) vec<tree, va_gc> *offload_vars;
diff --git a/gcc/passes.def b/gcc/passes.def
index cfca4f1..bce8591 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -72,6 +72,7 @@ along with GCC; see the file COPYING3.  If not see
 	  /* pass_build_ealias is a dummy pass that ensures that we
 	     execute TODO_rebuild_alias at this point.  */
 	  NEXT_PASS (pass_build_ealias);
+	  NEXT_PASS (pass_expand_omp_ssa);
 	  NEXT_PASS (pass_fre);
 	  NEXT_PASS (pass_merge_phi);
 	  NEXT_PASS (pass_cd_dce);
@@ -86,6 +87,7 @@ along with GCC; see the file COPYING3.  If not see
 	     late.  */
           NEXT_PASS (pass_split_functions);
       POP_INSERT_PASSES ()
+      NEXT_PASS (pass_expand_omp_ssa);
       NEXT_PASS (pass_release_ssa_names);
       NEXT_PASS (pass_rebuild_cgraph_edges);
       NEXT_PASS (pass_inline_parameters);
diff --git a/gcc/tree-ssa-ccp.c b/gcc/tree-ssa-ccp.c
index 7fc5220..8d0d1b8 100644
--- a/gcc/tree-ssa-ccp.c
+++ b/gcc/tree-ssa-ccp.c
@@ -164,6 +164,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "wide-int-print.h"
 #include "builtins.h"
+#include "omp-low.h"
 
 
 /* Possible lattice values.  */
@@ -788,6 +789,9 @@ surely_varying_stmt_p (gimple stmt)
       && gimple_code (stmt) != GIMPLE_CALL)
     return true;
 
+  if (gimple_stmt_omp_lowering_p (stmt))
+    return true;
+
   return false;
 }
 
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index a5283a2..a8f0701 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -67,6 +67,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-cfgcleanup.h"
 #include "tree-into-ssa.h"
 #include "cfganal.h"
+#include "omp-low.h"
 
 /* This pass propagates the RHS of assignment statements into use
    sites of the LHS of the assignment.  It's basically a specialized
@@ -3675,7 +3676,8 @@ pass_forwprop::execute (function *fun)
 	  tree lhs, rhs;
 	  enum tree_code code;
 
-	  if (!is_gimple_assign (stmt))
+	  if (!is_gimple_assign (stmt)
+	      || gimple_stmt_omp_lowering_p (stmt))
 	    {
 	      gsi_next (&gsi);
 	      continue;
-- 
1.9.1




