Although GCC should understand the limited range of clz/ctz/cls results,
the combine pass sometimes behaves oddly and duplicates the ctz to remove
a sign extension.  Avoid this by adding an explicit AND with 127 to the
patterns, so the limited result range is explicit in the RTL.  Deepsjeng
performance improves by ~0.6%.
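
As a rough illustration of the kind of source that involves such a sign
extension (a sketch only; the function name and the array are hypothetical
and not taken from Deepsjeng or from this patch), consider an int ctz result
used as a 64-bit array index:

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.  The builtin returns an
   int, so using it as an index into a 64-bit address computation
   requires a sign extension unless the compiler knows the value is
   small and non-negative.  */
int64_t
lowest_bit_weight (const int64_t *weights, uint64_t mask)
{
  /* mask must be nonzero: __builtin_ctzll (0) is undefined.  */
  int idx = __builtin_ctzll (mask);   /* value is in 0..63 here */
  return weights[idx];                /* int index widened to 64 bits */
}
```

Since the result always fits in 7 bits, an AND with 127 does not change it;
the assumption here is that stating this in the pattern lets the extension
be dropped without duplicating the ctz.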

Bootstrap OK.

ChangeLog:
2020-02-03  Wilco Dijkstra  <wdijk...@arm.com>

        * config/aarch64/aarch64.md (clz<mode>2): Mask the clz result.
        (clrsb<mode>2): Likewise.
        (ctz<mode>2): Likewise.
--

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5edc76ee14b55b2b4323530e10bd22b3ffca483e..7ff0536aac42957dbb7a15be766d35cc6725ac40 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4794,7 +4794,8 @@ (define_insn "*and_one_cmpl_<SHIFT:optab><mode>3_compare0_no_reuse"
 
 (define_insn "clz<mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
-       (clz:GPI (match_operand:GPI 1 "register_operand" "r")))]
+       (and:GPI (clz:GPI (match_operand:GPI 1 "register_operand" "r"))
+                (const_int 127)))]
   ""
   "clz\\t%<w>0, %<w>1"
   [(set_attr "type" "clz")]
@@ -4848,7 +4849,8 @@ (define_expand "popcount<mode>2"
 
 (define_insn "clrsb<mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
-        (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
+       (and:GPI (clrsb:GPI (match_operand:GPI 1 "register_operand" "r"))
+                (const_int 127)))]
   ""
   "cls\\t%<w>0, %<w>1"
   [(set_attr "type" "clz")]
@@ -4869,7 +4871,8 @@ (define_insn "rbit<mode>2"
 
 (define_insn_and_split "ctz<mode>2"
  [(set (match_operand:GPI           0 "register_operand" "=r")
-       (ctz:GPI (match_operand:GPI  1 "register_operand" "r")))]
+       (and:GPI (ctz:GPI (match_operand:GPI  1 "register_operand" "r"))
+               (const_int 127)))]
   ""
   "#"
   "reload_completed"

