On 2023/01/08 6:53, Max Filippov wrote: > On Fri, Jan 6, 2023 at 6:55 PM Takayuki 'January June' Suwa > <jjsuwa_sys3...@yahoo.co.jp> wrote: >> >> This patch optimizes the operation of cutting and splicing two register >> values at a specified bit position, in other words, combining (bitwise >> ORing) bits 0 through (C-1) of the register with bits C through 31 >> of the other, where C is the specified immediate integer 1 through 31. >> >> This typically applies to signedness copy of floating point number or >> __builtin_return_address() if the windowed register ABI, and saves one >> instruction compared to four shifts and a bitwise OR by the RTL >> generation pass. > > While I indeed see this kind of change, e.g.: > - extui a3, a3, 27, 5 > - slli a2, a2, 5 > - srli a2, a2, 5 > - slli a3, a3, 27 > - or a2, a2, a3 > + slli a2, a2, 5 > + extui a3, a3, 27, 5 > + ssai 5 > + src a2, a3, a2 > > I also see the following: > - movi.n a6, -4 > - and a5, a5, a6 > - extui a3, a3, 0, 2 > - or a3, a3, a5 > + srli a5, a5, 2 > + slli a3, a3, 30 > + ssai 30 > + src a3, a5, a3 > > i.e. after the split there's the same number of instructions, > but the new sequence is one byte longer than the original one > because of the movi.n. > > Looking at a bunch of linux builds I observe a slight code size > growth in call0 kernels and a slight code size reduction in > windowed kernels. > >> gcc/ChangeLog: >> >> * config/xtensa/xtensa.md (*splice_bits): >> New insn_and_split pattern. >> --- >> gcc/config/xtensa/xtensa.md | 47 +++++++++++++++++++++++++++++++++++++ >> 1 file changed, 47 insertions(+) >> >> diff --git a/gcc/config/xtensa/xtensa.md b/gcc/config/xtensa/xtensa.md >> index 0a26d3dccf4..36ec1b1918e 100644 >> --- a/gcc/config/xtensa/xtensa.md >> +++ b/gcc/config/xtensa/xtensa.md >> @@ -746,6 +746,53 @@ >> (set_attr "mode" "SI") >> (set_attr "length" "3")]) >> >> +(define_insn_and_split "*splice_bits" >> + [(set (match_operand:SI 0 "register_operand" "=a") >> + (ior:SI (and:SI (match_operand:SI 1 "register_operand" "r") >> + (match_operand:SI 3 "const_int_operand" "i")) >> + (and:SI (match_operand:SI 2 "register_operand" "r") >> + (match_operand:SI 4 "const_int_operand" "i"))))] >> + >> + "!optimize_debug && optimize >> + && INTVAL (operands[3]) + INTVAL (operands[4]) == -1 >> + && (exact_log2 (INTVAL (operands[3]) + 1) > 0 >> + || exact_log2 (INTVAL (operands[4]) + 1) > 0)" >> + "#" >> + "&& can_create_pseudo_p ()" >> + [(set (match_dup 5) >> + (ashift:SI (match_dup 1) >> + (match_dup 4))) >> + (set (match_dup 6) >> + (lshiftrt:SI (match_dup 2) >> + (match_dup 3))) >> + (set (match_dup 0) >> + (ior:SI (lshiftrt:SI (match_dup 5) >> + (match_dup 4)) >> + (ashift:SI (match_dup 6) >> + (match_dup 3))))] >> +{ >> + int shift; >> + if (INTVAL (operands[3]) < 0) >> + { >> + rtx x; >> + x = operands[1], operands[1] = operands[2], operands[2] = x; >> + x = operands[3], operands[3] = operands[4], operands[4] = x; >> + } >> + shift = floor_log2 (INTVAL (operands[3]) + 1); >> + operands[3] = GEN_INT (shift); >> + operands[4] = GEN_INT (32 - shift); >> + operands[5] = gen_reg_rtx (SImode); >> + operands[6] = gen_reg_rtx (SImode); >> +} >> + [(set_attr "type" "arith") >> + (set_attr "mode" "SI") >> + (set (attr "length") >> + (if_then_else (match_test "TARGET_DENSITY >> + && (INTVAL (operands[3]) == 0x7FFFFFFF >> + || INTVAL (operands[4]) == >> 0x7FFFFFFF)") >> + (const_int 11) >> + (const_int 12)))]) > > I wonder how the length could be 11 here? I always see 4 3-byte > instructions generated by this pattern. >
Sorry, I should have carried out a systematic test beforehand: #define TEST(c) \ unsigned int test_ ## c (unsigned int a, unsigned int b) { \ return (a & (-1U >> c)) | (b & ~(-1U >> c)); \ } TEST(1) TEST(2) ... TEST(30) TEST(31) Without this patch, compiling the above if c is: a. between 1 and 15, slli (or add.n) + extui + slli + srli + or b. 16 then extui + slli + extui + or c. between 17 and 20, srli + slli + extui + or d. between 21 and 31, movi(.n) + and + extui + or Clearly, the patch should be restricted to apply only to case a.