Although GCC should understand the limited range of clz/ctz/cls results, the combine pass sometimes behaves oddly and duplicates ctz to remove a sign extension. Avoid this by adding an explicit AND with 127 in the patterns. The AND never changes the value, since the results are at most 64; it only makes the range explicit. Deepsjeng performance improves by ~0.6%.
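
As a rough illustration of why the mask is safe, here is a minimal C sketch. It is not part of the patch: the helper names (ctz64, clz64, cls64, check_count_range, widen_ctz) are hypothetical, and it assumes the GCC/Clang builtins __builtin_ctzll, __builtin_clzll and __builtin_clrsbll. The zero-input cases are defined as 64 here, mirroring what the AArch64 CLZ and RBIT+CLZ sequences produce, because the C builtins leave them undefined.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not taken from the patch. */

/* Count trailing zeros; zero input is defined as 64 here. */
static inline int ctz64(uint64_t x) { return x ? __builtin_ctzll(x) : 64; }

/* Count leading zeros; zero input is defined as 64 here. */
static inline int clz64(uint64_t x) { return x ? __builtin_clzll(x) : 64; }

/* Count leading redundant sign bits (what CLS computes); range 0..63. */
static inline int cls64(uint64_t x) { return __builtin_clrsbll((long long)x); }

/* Every count fits in 7 bits, so ANDing with 127 leaves it unchanged. */
static inline void check_count_range(uint64_t x)
{
    assert((ctz64(x) & 127) == ctz64(x));
    assert((clz64(x) & 127) == clz64(x));
    assert((cls64(x) & 127) == cls64(x));
}

/* Assumed shape of code where a sign extension can appear: an int
   count widened to 64 bits.  The count is never negative, so sign
   extension and zero extension give the same result. */
static inline int64_t widen_ctz(uint64_t x)
{
    int n = ctz64(x);
    return (int64_t)n;
}
```

Calling check_count_range on any input should pass. The explicit mask in the patterns states the same fact in RTL, so that the sign extension in something like widen_ctz can be recognized as redundant without duplicating the ctz.
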
Bootstrap OK.

ChangeLog:
2020-02-03  Wilco Dijkstra  <wdijk...@arm.com>

	* config/aarch64/aarch64.md (clz<mode>2): Mask the clz result.
	(clrsb<mode>2): Likewise.
	(ctz<mode>2): Likewise.

--
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5edc76ee14b55b2b4323530e10bd22b3ffca483e..7ff0536aac42957dbb7a15be766d35cc6725ac40 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4794,7 +4794,8 @@ (define_insn "*and_one_cmpl_<SHIFT:optab><mode>3_compare0_no_reuse"
 
 (define_insn "clz<mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
-	(clz:GPI (match_operand:GPI 1 "register_operand" "r")))]
+	(and:GPI (clz:GPI (match_operand:GPI 1 "register_operand" "r"))
+		 (const_int 127)))]
   ""
   "clz\\t%<w>0, %<w>1"
   [(set_attr "type" "clz")]
@@ -4848,7 +4849,8 @@ (define_expand "popcount<mode>2"
 
 (define_insn "clrsb<mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
-	(clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
+	(and:GPI (clrsb:GPI (match_operand:GPI 1 "register_operand" "r"))
+		 (const_int 127)))]
   ""
   "cls\\t%<w>0, %<w>1"
   [(set_attr "type" "clz")]
@@ -4869,7 +4871,8 @@ (define_insn "rbit<mode>2"
 
 (define_insn_and_split "ctz<mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
-	(ctz:GPI (match_operand:GPI 1 "register_operand" "r")))]
+	(and:GPI (ctz:GPI (match_operand:GPI 1 "register_operand" "r"))
+		 (const_int 127)))]
   ""
   "#"
   "reload_completed"