On Mon, Dec 15, 2025 at 8:30 AM Kishan Parmar <[email protected]> wrote:
>
> Hi Andrew,
>
> Thanks for the feedback.
>
> Here is a reduced testcase derived from a bitfield insertion scenario.
>
> static inline unsigned
> rot_insert (unsigned x, unsigned y, unsigned n, unsigned mb, unsigned me)
> {
>   if (n)
>     x = (x << n) | (x >> (32 - n));
>
>   unsigned s = -1;
>   if (n)
>     s = (s << n) | (s >> (32 - n));
>
>   unsigned mask = 0;
>   mask += 1U << (31 - mb);
>   mask += 1U << (31 - mb);
>   mask -= 1U << (31 - me);
>   mask -= (mb > me);
>
>   if (mask & ~s)
>     return 12345 * y;
>
>   return (x & mask) | (y & ~mask);
> }
>
> unsigned foo (unsigned x, unsigned y)
> {
>   return rot_insert (x, y, 1, 31, 31);
> }
>
> I’ve tested on Power10 and on my local Apple M3 pro, which i assume has Arm 
> v8.6-A ISA.
> If we allow constants (remove INTEGER_CST check), we regress on both.
>
> Assembly Comparison
> ————————————
>
> I we include INTEGER_CST check and don’t allow constants, we will have XOR 
> AND XOR form.. which will produce below asms.

Note you have the same issue with the following C example before this
patch (just after you get with the original form too):
```
unsigned foo0 (unsigned x, unsigned y)
{
  unsigned x1 = x >> 31;
  unsigned yy = y & ~1;
  return x1 | yy;
}
```

That was something I was trying to point out.
And why having the check for INTEGER_CST is just exchanging one issue
for another. It does not solve the new issue here.

Wait, I think we are losing/not using the SUBREG_PROMOTED_UNSIGNED_P
fact here which is causing the issues really.
We lost it when we did the original 2->7 combine (for foo0 above):
```
Trying 2 -> 7:
    2: r121:DI=r128:DI
      REG_DEAD r128:DI
      REG_EQUIV [ap:DI+0x30]
    7: r124:SI=r121:DI#4 0>>0x1f
      REG_DEAD r121:DI
Failed to match this instruction:
(set (subreg:DI (reg:SI 124 [ x1_2 ]) 0)
    (zero_extract:DI (reg:DI 128 [ x+-4 ])
        (const_int 32 [0x20])
        (const_int 1 [0x1])))
Successfully matched this instruction:
(set (subreg:DI (reg:SI 124 [ x1_2 ]) 0)
    (and:DI (lshiftrt:DI (reg:DI 128 [ x+-4 ])
            (const_int 31 [0x1f]))
        (const_int 4294967295 [0xffffffff])))
```

insn 7 was:
```
(insn 7 4 9 2 (set (reg:SI 124 [ x1_2 ])
        (lshiftrt:SI (subreg/s/v:SI (reg/v:DI 121 [ x+-4 ]) 4)
            (const_int 31 [0x1f]))) "/app/example.cpp":30:12 292 {lshrsi3}
     (expr_list:REG_DEAD (reg/v:DI 121 [ x+-4 ])
        (nil)))
```
insn 2 was just move.
Note /s/v on the subreg there.
If we didn't lose the SUBREG_PROMOTED_UNSIGNED_P, then things would just work.
So you need to figure out how to get

So again the INTEGER_CST is just working around an issue down the line
and not really a good idea; in the case of ppc, the losing of the
SUBREG_PROMOTED_UNSIGNED_P.

Thanks,
Andrew Pinski

>
> Power10
>
> foo:
> .LFB1:
>     .cfi_startproc
>     .localentry    foo,1
>     rlwimi 4,3,32-31,31,31
>     rldicl 3,4,0,32
>     blr
>
> Aarch64
>
> foo:
> LFB1:
>     bfxil    w1, w0, 31, 1
>     mov    w0, w1
>     ret
>
> And if we remove INTEGER_CST check and allow constants, we will have IOR AND 
> ANDN form., which produces below asm.
>
> Power10
> +2 extra instructions
> foo:
> .LFB1:
>     .cfi_startproc
>     .localentry    foo,1
>     rlwinm 4,4,0,0,30
>     srwi 3,3,31
>     or 3,3,4
>     rldicl 3,3,0,32
>     blr
>
> Aarch64:
> Does not match Bitfield extract and insert semantics.
> foo:
> LFB1:
>     and    w1, w1, -2
>     orr    w0, w1, w0, lsr 31
>     ret
>
>
> For Aarch64, i noticed combine generating below pattern failed to match,
>
> (set (reg/i:SI 0 x0)
>     (ior:SI (and:SI (reg:SI 112 [ y ])
>             (const_int -2 [0xfffffffffffffffe]))
>         (lshiftrt:SI (reg:SI 111 [ x ])
>             (const_int 31 [0x1f]))))
>
> For which we can write a pattern for bfxil which can emit “bfxil w0, w112, 
> #31, #1” maybe?
>
> But for RS6000, We hit another issue.
> We try to merge 3 instructions,
>
> i1 : (set (reg:SI 126 [ _8 ])
>         (lshiftrt:SI (subreg:SI (reg:DI 130 [ x ]) 0)
>             (const_int 31 [0x1f])))
>
> i2 : (set (reg:SI 127)
>         (and:SI (subreg:SI (reg:DI 131 [ y ]) 0)
>             (const_int -2 [0xfffffffffffffffe])))
>
> to,
> i3 : (set (reg:SI 124 [ _7 ])
>           (ior:SI (reg:SI 126 [ _8 ])
>               (reg:SI 127)))
>
> This could have matched just fine, if and only if i1 was the same it was.
> But here source of i1,
>
> (lshiftrt:SI (subreg:SI (reg:DI 130 [ x ]) 0)
>             (const_int 31 [0x1f]))
>
> gets transformed to
>
> (subreg:SI (lshiftrt:DI (reg:DI 130 [ x ])
>         (const_int 31 [0x1f])) 0).
>
> Because reason being combine performs shift in wider mode.
> And later simplify-rtx tries to simplify further and the rtx we endup having,
> (set (reg:SI 124 [ _7 ])
>     (ior:SI (and:SI (subreg:SI (reg:DI 131 [ y ]) 0)
>             (const_int -2 [0xfffffffffffffffe]))
>         (subreg:SI (zero_extract:DI (reg:DI 130 [ x ])
>                 (const_int 32 [0x20])
>                 (const_int 1 [0x1])) 0)))
>
> zero_extract, is not allowed in rs6000 backend,
> After we try to revert/replace compound zero_extract to normal basic rtx 
> operations,
> we endup having non-matchable code.
> (set (reg:SI 124 [ _7 ])
>     (ior:SI (and:SI (subreg:SI (reg:DI 131 [ y ]) 0)
>             (const_int -2 [0xfffffffffffffffe]))
>         (subreg:SI (and:DI (lshiftrt:DI (reg:DI 130 [ x ])
>                     (const_int 31 [0x1f]))
>                 (const_int 4294967295 [0xffffffff])) 0))).
>
> Because the shift logic is now wrapped in (subreg:SI (and:DI (lshiftrt:DI 
> ...))),
> the backend patterns for rlwimi (which expect clear SImode operations) no 
> longer match.
>
> >> This is the part I am not getting. Specifically the whole `+` part and
> >> which instructions are being combined.
> >>
> >> Is it that we originally had:
> >> A = RRotate(P, N)
> >> T = (A & C)
> >> T1 = (B & ~C)
> >> R = T | T1
> >>
> >> And then we get:
> >> T = lshiftrt (P, N)
> >> T1 = B & ~C
> >> R = T | T1
> >>
> >> Which then is not recognized as inserting P into B starting at N for
> >> the bitmask ?
> >> But can't that show up even without the andn pattern?
> >> Which means there might be a missing rs6000 pattern for this case?
> >>
> >> Thanks,
> >> Andrew
> We do have a pattern for that,
>  4359 (define_insn "*rotlsi3_insert_4"
>  4360   [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
>  4361         (ior:SI (and:SI (match_operand:SI 3 "gpc_reg_operand" "0")
>  4362                         (match_operand:SI 4 "const_int_operand" "n"))
>  4363                 (lshiftrt:SI (match_operand:SI 1 "gpc_reg_operand" "r")
>  4364                              (match_operand:SI 2 "const_int_operand" 
> "n"))))]
>  4365   "INTVAL (operands[2]) + exact_log2 (-UINTVAL (operands[4])) == 32"
>  4366   "rlwimi %0,%1,32-%h2,%h2,31"
>  4367   [(set_attr "type" "insert")])
>
> If we have a form XOR-AND-XOR, which gets transformed to above pattern just 
> fine via simplify-rtx.cc.
> 4052       /* If we have (xor (and (xor A B) C) A) with C a constant we can 
> instead
> 4053          do (ior (and A ~C) (and B C)) which is a machine instruction on 
> some
> 4054          machines, and also has shorter instruction path length.  */
>
>
> Since simplify-rtx handles the XOR-form correctly, restricting this GIMPLE 
> patch to variables seemed the safest path to avoid these specific Combine 
> side-effects.
>
> I have attached the combine dumps for reference. I will also work on adding 
> the GIMPLE testcases and target-supports entries as you requested.
>
> Thanks,
> Kishan
>
>

Reply via email to