Hi,

While trying to improve s390 code generation for rotates and shifts I
noticed superfluous subregs on shift count operands.  In our backend we
already have quite cumbersome patterns that would need to be duplicated
(or complicated further by more subst patterns) in order to get rid of
these subregs.
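
For illustration (a made-up example, not from the patch), the kind of
source I mean is

  unsigned long
  shl (unsigned long x, unsigned int n)
  {
    return x << (n & 63);
  }

where the count is computed in SImode and arrives at the DImode shift
pattern wrapped in a subreg (plus an AND), even though the hardware
ignores everything but the low 6 bits anyway.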

I had already finished all the patterns when I realized that
SHIFT_COUNT_TRUNCATED and the target hook shift_truncation_mask already
exist and could achieve what is needed without extra patterns.  Just
defining shift_truncation_mask was not enough, though, since most of
the additional insns are introduced by combine.

Even SHIFT_COUNT_TRUNCATED is not a perfect match for what our hardware
does, because s390 always considers only the low 6 bits of a shift
count, regardless of the mode being shifted.
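
To make the mismatch concrete, consider an SImode shift count of 33:

  33 & (32 - 1) == 1    /* what SHIFT_COUNT_TRUNCATED would assume */
  33 & 63       == 33   /* what the hardware actually uses */

So an SImode logical shift by 33 yields 0 on s390 instead of acting
like a shift by 1.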

Despite all the warnings in other backends, most notably
SHIFT_COUNT_TRUNCATED being called "discouraged" in riscv.h, I wrote
the attached tentative patch.  It is a little ad hoc: it takes the
SHIFT_COUNT_TRUNCATED paths only if shift_truncation_mask () != 0 and,
instead of truncating via & (GET_MODE_BITSIZE (mode) - 1), it applies
the mask returned by shift_truncation_mask.  Doing so effectively
reduces both "methods" to a single one.

I assume the two were originally intended for different purposes, but
without knowing the history the separation seems artificial to me.  A
quick look at other backends showed that at least some (e.g. ARM) do
not use SHIFT_COUNT_TRUNCATED because its blanket behavior is not
fine-grained enough, e.g. when the masks for shift and rotate differ.

While the attached patch will probably work for s390, it most likely
will not for other targets.  Going beyond what my patch does, would it
be useful to unify both truncation methods in a target hook that takes
the operation (shift, rotate, zero_extract, ...) as well as the mode as
arguments?  That way we would let the target decide what to do for each
specific combination of the two.  Maybe this would also allow
distinguishing bit test operations from the rest.

--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -251,7 +251,7 @@ default_unwind_word_mode (void)
 /* The default implementation of TARGET_SHIFT_TRUNCATION_MASK.  */

 unsigned HOST_WIDE_INT
-default_shift_truncation_mask (machine_mode mode)
+default_shift_truncation_mask (enum rtx_code code, machine_mode mode)
 {
   [...]
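
For s390, such a hook could hypothetically look like this (just a
sketch, not part of the attached patch):

/* Hypothetical implementation of the proposed hook variant that also
   receives the rtx code.  */
static unsigned HOST_WIDE_INT
s390_shift_truncation_mask (enum rtx_code code, machine_mode mode)
{
  if (!SCALAR_INT_MODE_P (mode))
    return 0;

  switch (code)
    {
    case ASHIFT:
    case ASHIFTRT:
    case LSHIFTRT:
    case ROTATE:
    case ROTATERT:
      /* All shift and rotate variants only use the low 6 bits.  */
      return 63;
    default:
      /* Claim no truncation for zero_extract and everything else.  */
      return 0;
    }
}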

Of course, when everything is done right in the backend, there will be
no difference in the result, but in my experience it is easy to forget
a subreg or similar somewhere and end up with worse code by accident.
Maybe there is another reason why SHIFT_COUNT_TRUNCATED is discouraged
that I missed entirely?

Regards
 Robin
diff --git a/gcc/combine.c b/gcc/combine.c
index 91e32c88c88..d2a659f929b 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -6445,14 +6445,12 @@ combine_simplify_rtx (rtx x, machine_mode op0_mode, int in_dest,
 	return simplify_shift_const (x, code, mode, XEXP (x, 0),
 				     INTVAL (XEXP (x, 1)));
 
-      else if (SHIFT_COUNT_TRUNCATED && !REG_P (XEXP (x, 1)))
+      else if (SHIFT_COUNT_TRUNCATED
+	  && targetm.shift_truncation_mask (mode)
+	  && !REG_P (XEXP (x, 1)))
 	SUBST (XEXP (x, 1),
 	       force_to_mode (XEXP (x, 1), GET_MODE (XEXP (x, 1)),
-			      (HOST_WIDE_INT_1U
-			       << exact_log2 (GET_MODE_UNIT_BITSIZE
-					      (GET_MODE (x))))
-			      - 1,
-			      0));
+		 targetm.shift_truncation_mask (mode), 0));
       break;
 
     default:
@@ -10594,8 +10592,8 @@ simplify_shift_const_1 (enum rtx_code code, machine_mode result_mode,
   /* Make sure and truncate the "natural" shift on the way in.  We don't
      want to do this inside the loop as it makes it more difficult to
      combine shifts.  */
-  if (SHIFT_COUNT_TRUNCATED)
-    orig_count &= GET_MODE_UNIT_BITSIZE (mode) - 1;
+  if (SHIFT_COUNT_TRUNCATED && targetm.shift_truncation_mask (mode))
+    orig_count &= targetm.shift_truncation_mask (mode);
 
   /* If we were given an invalid count, don't do anything except exactly
      what was requested.  */
@@ -12295,7 +12293,7 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
 	     between the position and the location of the single bit.  */
 	  /* Except we can't if SHIFT_COUNT_TRUNCATED is set, since we might
 	     have already reduced the shift count modulo the word size.  */
-	  if (!SHIFT_COUNT_TRUNCATED
+	  if ((!SHIFT_COUNT_TRUNCATED || !targetm.shift_truncation_mask (mode))
 	      && CONST_INT_P (XEXP (op0, 0))
 	      && XEXP (op0, 1) == const1_rtx
 	      && equality_comparison_p && const_op == 0
diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index ad8eacdf4dc..1d723f29e1e 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -2320,6 +2320,7 @@ s390_single_part (rtx op,
 	    part = i;
 	}
     }
+
   return part == -1 ? -1 : n_parts - 1 - part;
 }
 
@@ -2702,6 +2703,7 @@ s390_logical_operator_ok_p (rtx *operands)
   return true;
 }
 
+
 /* Narrow logical operation CODE of memory operand MEMOP with immediate
    operand IMMOP to switch from SS to SI type instructions.  */
 
@@ -16294,6 +16296,15 @@ s390_case_values_threshold (void)
   return default_case_values_threshold ();
 }
 
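+/* Implement TARGET_SHIFT_TRUNCATION_MASK.  On s390 all shift and
+   rotate instructions only use the low 6 bits of the count.  */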
+static unsigned HOST_WIDE_INT
+s390_shift_truncation_mask (machine_mode mode)
+{
+  return (mode == DImode || mode == SImode
+	  || mode == HImode || mode == QImode) ? 63 : 0;
+}
+
 /* Initialize GCC target structure.  */
 
 #undef  TARGET_ASM_ALIGNED_HI_OP
@@ -16585,6 +16596,9 @@ s390_case_values_threshold (void)
 #undef TARGET_CASE_VALUES_THRESHOLD
 #define TARGET_CASE_VALUES_THRESHOLD s390_case_values_threshold
 
+#undef TARGET_SHIFT_TRUNCATION_MASK
+#define TARGET_SHIFT_TRUNCATION_MASK s390_shift_truncation_mask
+
 /* Use only short displacement, since long displacement is not available for
    the floating point instructions.  */
 #undef TARGET_MAX_ANCHOR_OFFSET
diff --git a/gcc/config/s390/s390.h b/gcc/config/s390/s390.h
index 969f58a2ba0..d85bfcc5e04 100644
--- a/gcc/config/s390/s390.h
+++ b/gcc/config/s390/s390.h
@@ -1188,5 +1188,6 @@ struct GTY(()) machine_function
 
 #define TARGET_INDIRECT_BRANCH_TABLE s390_indirect_branch_table
 
+#define SHIFT_COUNT_TRUNCATED 1
 
 #endif /* S390_H */
diff --git a/gcc/cse.c b/gcc/cse.c
index 6c9cda16a98..f7bf287e9ef 100644
--- a/gcc/cse.c
+++ b/gcc/cse.c
@@ -3649,10 +3649,11 @@ fold_rtx (rtx x, rtx_insn *insn)
 		  && (INTVAL (const_arg1) >= GET_MODE_UNIT_PRECISION (mode)
 		      || INTVAL (const_arg1) < 0))
 		{
-		  if (SHIFT_COUNT_TRUNCATED)
+		  if (SHIFT_COUNT_TRUNCATED
+		      && targetm.shift_truncation_mask (mode))
 		    canon_const_arg1 = gen_int_shift_amount
 		      (mode, (INTVAL (const_arg1)
-			      & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
+			      & targetm.shift_truncation_mask (mode)));
 		  else
 		    break;
 		}
@@ -3698,10 +3699,11 @@ fold_rtx (rtx x, rtx_insn *insn)
 		  && (INTVAL (inner_const) >= GET_MODE_UNIT_PRECISION (mode)
 		      || INTVAL (inner_const) < 0))
 		{
-		  if (SHIFT_COUNT_TRUNCATED)
+		  if (SHIFT_COUNT_TRUNCATED
+		      && targetm.shift_truncation_mask (mode))
 		    inner_const = gen_int_shift_amount
 		      (mode, (INTVAL (inner_const)
-			      & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
+			      & targetm.shift_truncation_mask (mode)));
 		  else
 		    break;
 		}
diff --git a/gcc/expmed.c b/gcc/expmed.c
index d7f8e9a5d76..d8eebac5f08 100644
--- a/gcc/expmed.c
+++ b/gcc/expmed.c
@@ -2476,11 +2476,14 @@ expand_shift_1 (enum tree_code code, machine_mode mode, rtx shifted,
 	      (unsigned HOST_WIDE_INT) GET_MODE_BITSIZE (scalar_mode)))
 	op1 = gen_int_shift_amount (mode,
 				    (unsigned HOST_WIDE_INT) INTVAL (op1)
-				    % GET_MODE_BITSIZE (scalar_mode));
+				    % (targetm.shift_truncation_mask (scalar_mode) + 1));
       else if (GET_CODE (op1) == SUBREG
 	       && subreg_lowpart_p (op1)
 	       && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op1)))
-	       && SCALAR_INT_MODE_P (GET_MODE (op1)))
+	       && SCALAR_INT_MODE_P (GET_MODE (op1))
+	       && (targetm.shift_truncation_mask (mode)
+		   == ((unsigned HOST_WIDE_INT) GET_MODE_BITSIZE (scalar_mode)
+		       - 1)))
 	op1 = SUBREG_REG (op1);
     }
 
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 89a46a933fa..88064e4ac57 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -3525,9 +3525,10 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 	  return lowpart_subreg (int_mode, tmp, inner_mode);
 	}
 
-      if (SHIFT_COUNT_TRUNCATED && CONST_INT_P (op1))
+      if (SHIFT_COUNT_TRUNCATED && targetm.shift_truncation_mask (mode)
+	  && CONST_INT_P (op1))
 	{
-	  val = INTVAL (op1) & (GET_MODE_UNIT_PRECISION (mode) - 1);
+	  val = INTVAL (op1) & targetm.shift_truncation_mask (mode);
 	  if (val != INTVAL (op1))
 	    return simplify_gen_binary (code, mode, op0,
 					gen_int_shift_amount (mode, val));
@@ -4347,8 +4348,9 @@ simplify_const_binary_operation (enum rtx_code code, machine_mode mode,
 	case ASHIFT:
 	  {
 	    wide_int wop1 = pop1;
-	    if (SHIFT_COUNT_TRUNCATED)
-	      wop1 = wi::umod_trunc (wop1, GET_MODE_PRECISION (int_mode));
+	    if (SHIFT_COUNT_TRUNCATED && targetm.shift_truncation_mask (int_mode))
+	      wop1 = wi::umod_trunc (wop1,
+		  targetm.shift_truncation_mask (int_mode) + 1);
 	    else if (wi::geu_p (wop1, GET_MODE_PRECISION (int_mode)))
 	      return NULL_RTX;
 
@@ -4426,8 +4428,9 @@ simplify_const_binary_operation (enum rtx_code code, machine_mode mode,
 	  if (CONST_SCALAR_INT_P (op1))
 	    {
 	      wide_int shift = rtx_mode_t (op1, mode);
-	      if (SHIFT_COUNT_TRUNCATED)
-		shift = wi::umod_trunc (shift, GET_MODE_PRECISION (int_mode));
+	      if (SHIFT_COUNT_TRUNCATED && targetm.shift_truncation_mask (int_mode))
+		shift = wi::umod_trunc (shift,
+		    targetm.shift_truncation_mask (int_mode) + 1);
 	      else if (wi::geu_p (shift, GET_MODE_PRECISION (int_mode)))
 		return NULL_RTX;
 	      result = wi::to_poly_wide (op0, mode) << shift;
