在 2025/5/30 1:46, Jeff Law 写道:


On 5/28/25 9:05 PM, Jiawei wrote:


This seems like it would be much better as a combine pattern.   In fact, I'm a bit surprised that combine didn't simplify this series of operations into a IOR. So I'd really like to see the .combine dump with and without this hunk for the relevant testcase.

Here is the dump log, using trunk(7fca794e0199baff8f07140a950ba3374c6aa634), more details please see https://godbolt.org/z/3hfzdz3Ks

===========================================================================================

~/rv/bin/riscv64-unknown-linux-gnu-g++ -march=rv64gc_zba_zbb_zbs -O2  -S -fdump-rtl-all redundant-bitmap-2.C

before combine in .ext_dce
Thanks!   That was helpful.

I was looking for the full .combine dump -- the full dump includes information about patterns that were tried and failed.  That often will point the way to a better solution.

In particular in the .combine dump we have this nugget:

Trying 15 -> 18:
   15: r151:DI=0xfffffffffffffffe<-<r156:DI#0&r149:DI
      REG_DEAD r149:DI
   18: r154:DI=0x1<<r156:DI#0^r151:DI
      REG_DEAD r156:DI
      REG_DEAD r151:DI
Failed to match this instruction:
(set (reg:DI 154)
    (xor:DI (and:DI (rotate:DI (const_int -2 [0xfffffffffffffffe])
                (subreg:QI (reg:DI 156 [ b ]) 0))
            (reg:DI 149 [ *a_10(D) ]))
        (ashift:DI (const_int 1 [0x1])
            (subreg:QI (reg:DI 156 [ b ]) 0))))

I think combine really should have simplified that before querying the target.  That really should have been simpified to a bit insertion idiom or perhaps an simpler ior.

More generally, the question we should first ask is whether or not the source should have simplified independent of the target.  I think the answer is yes in this case, which means we should try to fix that problem first since it'll improve every target rather than just RISC-V.

When we do find ourselves needing to write new target patterns, a define_insn will generally be preferable to a define_peephole.

The define_insn will match when there's a data dependency within a basic block.  A define_peephole requires the insns to be consecutive in the IL.  Thus the define_insn will tend to match more often and is those preferable to a define_peephole.

Anyway, to recap, I think the better solution is to improve simplify_binary_operation or one of its children or perhaps simplify_compound_operation its related functions.

jeff h


Hi Jeff,


Thanks for your suggestions, I found that the ior is successful generated in combine pass without using the zbs extension, and in other architecture it also work fine.

(note1513162NOTE_INSN_DELETED)
(insn1615192(set(reg:DI149[ *a_10(D) ])
        (zero_extend:DI(mem:QI(reg/v/f:DI142[ a ]) [0*a_10(D)+0S1A8]))) "/app/example.cpp":3:7126{*zero_extendqidi2_internal}
     (nil))
(note1916222NOTE_INSN_DELETED)
(insn2219232(set(reg:DI154)
        (ior:DI(reg:DI149[ *a_10(D) ])
            (reg:DI134[ _1 ]))) "/app/example.cpp":4:7109{*iordi3}
     (expr_list:REG_DEAD(reg:DI149[ *a_10(D) ])
        (expr_list:REG_DEAD(reg:DI134[ _1 ])
            (nil))))

And I also tried some tweaks in simplify-rtx.cc, but seem they do not affected the generation of bclr and binv, so I think it's just related to the zbs part.

When I enabled zbs extension, the bclr and binv instruction seems to have a higher priority. They generated before bset since they have the same cost.

When I tried adjust bclr cost into 2, the combine pass worked fine and generate the bset instruction.


Looking forward to your comments,

Jiawei

Reply via email to