When zero-extracting a single-bit bitfield from bits 16..31 on the H8, we currently generate some pretty bad code.

The fundamental issue is that we can't shift efficiently, and there's no trivial way to extract a single bit out of the high half word of an SImode value.

What usually happens is we use a synthesized right shift to get the single bit into the desired position, then a bit-and to mask off everything we don't care about.

The shifts are expensive, even using tricks like half-word and quarter-word moves to implement shift-by-16 and shift-by-8. Additionally, a logical right shift must clear out the upper bits, which is redundant since we're going to mask the result with &1 later.
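
For reference, the kind of extraction in question looks something like this at the source level (a made-up example; any single-bit test of bits 16..31 of a 32-bit value qualifies):

  /* Return bit 20 of X as 0 or 1.  */
  unsigned int
  bit20 (unsigned long x)
  {
    return (x >> 20) & 1;
  }

Combine sees this as (and:SI (lshiftrt:SI X (const_int 20)) (const_int 1)), which is the form the new patterns below match.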

This patch provides a consistently better sequence for such extractions. The general form moves the high half into the low half, extracts the desired bit into C, clears the destination, then moves C into the destination, with a few special cases for particular bit positions.
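
Concretely, for a middle bit position such as bit 20, the new sequence looks like this (the choice of er0 is purely illustrative; the bld operand and byte vary with the bit position):

  mov.w   e0,r0      ; move the high half word into the low half
  bld     #4,r0l     ; load bit 4 of the low byte (source bit 20) into C
  xor.l   er0,er0    ; clear the destination (doesn't disturb C)
  rotxl.l er0        ; rotate C into bit 0

Bit 31 (and bit 30 on the H8/S) instead gets a rotate followed by and.l #1, and bit 16 needs just the half word move followed by and.l #1.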

This also avoids all the shenanigans for the H8/SX, which has a much more capable shifter (not single cycle, but reasonably efficient).

This has been regression tested on the H8 without issues. Pushing to the trunk momentarily.

jeff

ps. Yes, zero extraction of multi-bit fields could likely be improved as well. But I've already spent more time on this than I can reasonably justify.

commit 57dbc02d261bb833f6ef287187eb144321dd595c
Author: Jeff Law <j...@ventanamicro.com>
Date:   Thu Nov 9 17:34:01 2023 -0700

    [committed] Improve single bit zero extraction on H8.
    
    When zero extracting a single bit bitfield from bits 16..31 on the H8 we
    currently generate some pretty bad code.
    
    The fundamental issue is we can't shift efficiently and there's no trivial way
    to extract a single bit out of the high half word of an SImode value.
    
    What usually happens is we use a synthesized right shift to get the single bit
    into the desired position, then a bit-and to mask off everything we don't care
    about.
    
    The shifts are expensive, even using tricks like half and quarter word moves to
    implement shift-by-16 and shift-by-8.  Additionally a logical right shift must
    clear out the upper bits which is redundant since we're going to mask things
    with &1 later.
    
    This patch provides a consistently better sequence for such extractions.  The
    general form moves the high half into the low half, a bit extraction into C,
    clear the destination, then move C into the destination with a few special
    cases.
    
    This also avoids all the shenanigans for H8/SX which has a much more capable
    shifter.  It's not single cycle, but it is reasonably efficient.
    
    This has been regression tested on the H8 without issues.  Pushing to the trunk
    momentarily.
    
    jeff
    
    ps.  Yes, supporting zero extraction of multi-bit fields might be improvable as
    well.  But I've already spent more time on this than I can reasonably justify.
    
    gcc/
            * config/h8300/combiner.md (single bit sign_extract): Avoid recently
            added patterns for H8/SX.
            (single bit zero_extract): New patterns.

diff --git a/gcc/config/h8300/combiner.md b/gcc/config/h8300/combiner.md
index 2f7faf77c93..e1179b5fea6 100644
--- a/gcc/config/h8300/combiner.md
+++ b/gcc/config/h8300/combiner.md
@@ -1278,7 +1278,7 @@ (define_insn_and_split ""
        (sign_extract:SI (match_operand:QHSI 1 "register_operand" "0")
                         (const_int 1)
                         (match_operand 2 "immediate_operand")))]
-  ""
+  "!TARGET_H8300SX"
   "#"
   "&& reload_completed"
   [(parallel [(set (match_dup 0)
@@ -1291,7 +1291,7 @@ (define_insn ""
                         (const_int 1)
                         (match_operand 2 "immediate_operand")))
    (clobber (reg:CC CC_REG))]
-  ""
+  "!TARGET_H8300SX"
 {
   int position = INTVAL (operands[2]);
 
@@ -1359,3 +1359,69 @@ (define_insn ""
   return "subx\t%s0,%s0\;exts.w %T0\;exts.l %0";
 }
   [(set_attr "length" "10")])
+
+;; For shift counts >= 16 we can always do better than the
+;; generic sequences.  Other patterns handle smaller counts.
+(define_insn_and_split ""
+  [(set (match_operand:SI 0 "register_operand" "=r")
+       (and:SI (lshiftrt:SI (match_operand:SI 1 "register_operand" "0")
+                            (match_operand 2 "immediate_operand" "n"))
+               (const_int 1)))]
+  "!TARGET_H8300SX && INTVAL (operands[2]) >= 16"
+  "#"
+  "&& reload_completed"
+  [(parallel [(set (match_dup 0) (and:SI (lshiftrt:SI (match_dup 0) (match_dup 2))
+                                        (const_int 1)))
+             (clobber (reg:CC CC_REG))])])
+
+(define_insn ""
+  [(set (match_operand:SI 0 "register_operand" "=r")
+       (and:SI (lshiftrt:SI (match_operand:SI 1 "register_operand" "0")
+                            (match_operand 2 "immediate_operand" "n"))
+               (const_int 1)))
+   (clobber (reg:CC CC_REG))]
+  "!TARGET_H8300SX && INTVAL (operands[2]) >= 16"
+{
+  int position = INTVAL (operands[2]);
+
+  /* If the bit we want is the highest bit we can just rotate it into position
+     and mask off everything else.  */
+  if (position == 31)
+    {
+      output_asm_insn ("rotl.l\t%0", operands);
+      return "and.l\t#1,%0";
+    }
+
+  /* Special case for H8/S.  Similar to bit 31.  */
+  if (position == 30 && TARGET_H8300S)
+    return "rotl.l\t#2,%0\;and.l\t#1,%0";
+
+  if (position <= 30 && position >= 17)
+    {
+      /* Shift 16 bits, without worrying about extensions.  */
+      output_asm_insn ("mov.w\t%e1,%f0", operands);
+
+      /* Get the bit we want into C.  */
+      operands[2] = GEN_INT (position % 8);
+      if (position >= 24)
+       output_asm_insn ("bld\t%2,%t0", operands);
+      else
+       output_asm_insn ("bld\t%2,%s0", operands);
+
+      /* xor + rotate to clear the destination, then rotate
+        the C into position.  */
+      return "xor.l\t%0,%0\;rotxl.l\t%0";
+    }
+
+  if (position == 16)
+    {
+      /* Shift 16 bits, without worrying about extensions.  */
+      output_asm_insn ("mov.w\t%e1,%f0", operands);
+
+      /* And finally, mask out everything we don't want.  */
+      return "and.l\t#1,%0";
+    }
+
+  gcc_unreachable ();
+}
+  [(set_attr "length" "10")])
