On 2025/12/4 22:23, Richard Biener wrote:
On Thu, Dec 4, 2025 at 9:56 AM Dongyan Chen
<[email protected]> wrote:
Hi,
Following the previous discussion, I have implemented this in expr.cc.
This location allows access to tree-level type information while still
enabling queries to target-specific costs. However, I have some concerns
regarding the cost comparison logic. I am currently comparing the cost
of the multiplication directly against the summed cost of the decomposed
logical operations.
Does this cost heuristic seem reasonable to you?
Thanks and regards,
Dongyan
This patch implements an optimization to transform (a * b) == 0 to
(a == 0) || (b == 0) and (a * b != 0) to (a != 0) && (b != 0)
for signed and unsigned integers.
PR target/122935
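For illustration only, the two rewrites described above can be written out in plain C. This is a minimal sketch, not code from the patch: the helper names are hypothetical, and the product is computed in 64 bits so that it cannot wrap. That is an assumption of the sketch; a wrapping product (for example 0x10000 * 0x10000 in 32-bit unsigned arithmetic) is zero even though neither operand is, so the equivalence only holds when the multiplication does not wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Original form: multiply, then compare against zero.
   The 64-bit product cannot wrap for 32-bit inputs (sketch assumption). */
static bool mul_is_zero(int32_t a, int32_t b)
{
  return (int64_t) a * (int64_t) b == 0;
}

/* Decomposed form: (a * b) == 0  ->  (a == 0) || (b == 0).  */
static bool mul_is_zero_decomposed(int32_t a, int32_t b)
{
  return a == 0 || b == 0;
}

/* Original form of the inequality test.  */
static bool mul_is_nonzero(int32_t a, int32_t b)
{
  return (int64_t) a * (int64_t) b != 0;
}

/* Decomposed form: (a * b) != 0  ->  (a != 0) && (b != 0).  */
static bool mul_is_nonzero_decomposed(int32_t a, int32_t b)
{
  return a != 0 && b != 0;
}
```

Under the no-wrap assumption, each pair of helpers returns the same value for every input, which is what makes the rewrite a candidate whenever the multiply is expensive.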
+ machine_mode mode = TYPE_MODE (type);
+ rtx reg = gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1);
+ rtx mult_rtx = gen_rtx_MULT (mode, reg, reg);
+ int mult_cost = set_src_cost (mult_rtx, mode, speed_p);
+
+ int logic_cost = 0;
+ int cmp_cost = 0;
+ int logic_op_cost = 0;
+
+ if (comp_code == EQ_EXPR)
+ {
+ rtx eq_rtx = gen_rtx_EQ (mode, reg, const0_rtx);
+ cmp_cost = set_src_cost (eq_rtx, mode, speed_p);
+ rtx ior_rtx = gen_rtx_IOR (mode, reg, reg);
+ logic_op_cost = set_src_cost (ior_rtx, mode, speed_p);
+ logic_cost = 2 * cmp_cost + logic_op_cost;
+ }
+ else /* NE_EXPR */
+ {
+ rtx ne_rtx = gen_rtx_NE (mode, reg, const0_rtx);
+ cmp_cost = set_src_cost (ne_rtx, mode, speed_p);
+ rtx and_rtx = gen_rtx_AND (mode, reg, reg);
+ logic_op_cost = set_src_cost (and_rtx, mode, speed_p);
+ logic_cost = 2 * cmp_cost + logic_op_cost;
+ }
Can you check what AVR does for the above? Esp. when
mode is bigger than word_mode.
I tested the patch on AVR as requested.
With -O3, the optimization triggers, replacing the slow __mulsi3 library
call with the faster logical check sequence. With -Oz, the optimization
is correctly rejected to prioritize code size.
This indicates that the cost model query via set_src_cost is working as
intended.
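As a rough model of the comparison being discussed, the decision can be sketched outside of GCC as follows. Everything here is hypothetical: the struct, the function names, and the strict "cheaper than" rule are assumptions for illustration, and in the patch the individual costs come from set_src_cost rather than from fixed numbers.

```c
#include <stdbool.h>

/* Hypothetical per-operation costs for one mode and one speed/size setting.
   In the patch these values are obtained from set_src_cost.  */
struct op_costs
{
  int mult;      /* cost of the multiplication */
  int cmp;       /* cost of one comparison against zero */
  int logic_op;  /* cost of the IOR (for ==) or AND (for !=) */
};

/* Cost of the decomposed sequence: two compares plus one logical op,
   matching logic_cost = 2 * cmp_cost + logic_op_cost in the patch.  */
static int decomposed_cost(const struct op_costs *c)
{
  return 2 * c->cmp + c->logic_op;
}

/* Assumed decision rule: decompose only when it is strictly cheaper.  */
static bool prefer_decomposed(const struct op_costs *c)
{
  return decomposed_cost(c) < c->mult;
}
```

With made-up costs such as `{ .mult = 40, .cmp = 4, .logic_op = 4 }` the decomposed form wins (12 < 40), while `{ .mult = 4, .cmp = 4, .logic_op = 4 }` keeps the multiplication. The actual numbers a target reports for speed and for size may differ considerably from these.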
I am currently reviewing the other comments and will address them in the
next patch. It might take a little time.
```c
bool foo1(int32_t a, int32_t b) { return ((int32_t)a * b) == 0; }
```
``` -O3
foo1:
push r28
push r29
/* prologue: function */
/* frame size = 0 */
/* stack size = 2 */
.L__stack_usage = 2
mov r28,r22
mov r29,r23
mov r30,r24
mov r31,r25
ldi r25,lo8(1)
or r28,r29
or r28,r30
or r28,r31
breq .L2
ldi r25,0
.L2:
ldi r24,lo8(1)
or r18,r19
or r18,r20
or r18,r21
breq .L3
ldi r24,0
.L3:
or r24,r25
/* epilogue start */
pop r29
pop r28
ret
.size foo1, .-foo1
```
```-Oz
foo1:
push r28
push r29
/* prologue: function */
/* frame size = 0 */
/* stack size = 2 */
.L__stack_usage = 2
mov r28,r22
mov r29,r23
mov r30,r24
mov r31,r25
mov r22,r18
mov r23,r19
mov r24,r20
mov r25,r21
mov r18,r28
mov r19,r29
mov r20,r30
mov r21,r31
rcall __mulsi3
mov r20,r22
mov r21,r23
mov r22,r24
mov r23,r25
ldi r24,lo8(1)
or r20,r21
or r20,r22
or r20,r23
breq .L2
ldi r24,0
.L2:
/* epilogue start */
pop r29
pop r28
ret
.size foo1, .-foo1
```
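For readers less familiar with AVR assembly, the -O3 listing above can be modelled in C roughly as below. This is my reading of the listing, not compiler output: it assumes that a arrives in r22..r25 and b in r18..r21, and the function names are hypothetical. Each operand is tested for zero by OR-ing its four bytes together, and the two boolean results are then combined with a single OR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the "or rX,rY" chain: OR the four bytes of a 32-bit value
   together and report whether the result is zero.  */
static bool is_zero_bytewise(uint32_t x)
{
  uint8_t acc = (uint8_t) x;
  acc |= (uint8_t) (x >> 8);
  acc |= (uint8_t) (x >> 16);
  acc |= (uint8_t) (x >> 24);
  return acc == 0;
}

/* Mirrors the final "or r24,r25": the result is true when either
   operand is zero, and no multiplication is performed.  */
static bool foo1_model(int32_t a, int32_t b)
{
  bool a_is_zero = is_zero_bytewise((uint32_t) a);
  bool b_is_zero = is_zero_bytewise((uint32_t) b);
  return a_is_zero | b_is_zero;
}
```

The -Oz listing, by contrast, keeps the call to __mulsi3 and applies the same byte-wise zero test once, to the 32-bit product.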
Thanks,
Dongyan