在 2025/5/30 1:46, Jeff Law 写道:
On 5/28/25 9:05 PM, Jiawei wrote:
This seems like it would be much better as a combine pattern. In
fact, I'm a bit surprised that combine didn't simplify this series
of operations into a IOR. So I'd really like to see the .combine
dump with and without this hunk for the relevant testcase.
Here is the dump log, using
trunk(7fca794e0199baff8f07140a950ba3374c6aa634), more details please
see https://godbolt.org/z/3hfzdz3Ks
===========================================================================================
~/rv/bin/riscv64-unknown-linux-gnu-g++ -march=rv64gc_zba_zbb_zbs -O2
-S -fdump-rtl-all redundant-bitmap-2.C
before combine in .ext_dce
Thanks! That was helpful.
I was looking for the full .combine dump -- the full dump includes
information about patterns that were tried and failed. That often
will point the way to a better solution.
In particular in the .combine dump we have this nugget:
Trying 15 -> 18:
15: r151:DI=0xfffffffffffffffe<-<r156:DI#0&r149:DI
REG_DEAD r149:DI
18: r154:DI=0x1<<r156:DI#0^r151:DI
REG_DEAD r156:DI
REG_DEAD r151:DI
Failed to match this instruction:
(set (reg:DI 154)
(xor:DI (and:DI (rotate:DI (const_int -2 [0xfffffffffffffffe])
(subreg:QI (reg:DI 156 [ b ]) 0))
(reg:DI 149 [ *a_10(D) ]))
(ashift:DI (const_int 1 [0x1])
(subreg:QI (reg:DI 156 [ b ]) 0))))
I think combine really should have simplified that before querying the
target. That really should have been simpified to a bit insertion
idiom or perhaps an simpler ior.
More generally, the question we should first ask is whether or not the
source should have simplified independent of the target. I think the
answer is yes in this case, which means we should try to fix that
problem first since it'll improve every target rather than just RISC-V.
When we do find ourselves needing to write new target patterns, a
define_insn will generally be preferable to a define_peephole.
The define_insn will match when there's a data dependency within a
basic block. A define_peephole requires the insns to be consecutive
in the IL. Thus the define_insn will tend to match more often and is
those preferable to a define_peephole.
Anyway, to recap, I think the better solution is to improve
simplify_binary_operation or one of its children or perhaps
simplify_compound_operation its related functions.
jeff h
Hi Jeff,
Thanks for your suggestions, I found that the ior is successful
generated in combine pass without using the zbs extension, and in other
architecture it also work fine.
(note1513162NOTE_INSN_DELETED)
(insn1615192(set(reg:DI149[ *a_10(D) ])
(zero_extend:DI(mem:QI(reg/v/f:DI142[ a ]) [0*a_10(D)+0S1A8])))
"/app/example.cpp":3:7126{*zero_extendqidi2_internal}
(nil))
(note1916222NOTE_INSN_DELETED)
(insn2219232(set(reg:DI154)
(ior:DI(reg:DI149[ *a_10(D) ])
(reg:DI134[ _1 ]))) "/app/example.cpp":4:7109{*iordi3}
(expr_list:REG_DEAD(reg:DI149[ *a_10(D) ])
(expr_list:REG_DEAD(reg:DI134[ _1 ])
(nil))))
And I also tried some tweaks in simplify-rtx.cc, but seem they do not
affected the generation of bclr and binv, so I think it's just related
to the zbs part.
When I enabled zbs extension, the bclr and binv instruction seems to
have a higher priority. They generated before bset since they have the
same cost.
When I tried adjust bclr cost into 2, the combine pass worked fine and
generate the bset instruction.
Looking forward to your comments,
Jiawei