On Mon, Dec 8, 2025 at 12:06 AM Jeff Law <[email protected]> wrote:
>
>
>
> On 12/7/25 1:44 PM, Florian Weimer wrote:
> > * Jeff Law:
> >
> >> This is Shreya's work except for the SH testcase which I added after
> >> realizing her work would also fix the testcases for that port.  I
> >> bootstrapped and regression tested this on sh4-linux-gnu, x86_64 &
> >> risc-v.  It also was tested across all the embedded targets in my tester
> >> without regressions.
> >>
> >> --
> >>
> >>
> >> We are extracting two single-bit bitfields from a structure and
> >> determining whether they both have the value 0 or if at least one bit is
> >> set. This has been generating poor code:
> >>
> >>   >         lw      a5,0(a0)
> >>   >         bexti   a0,a5,1
> >>   >         bexti   a5,a5,2
> >>   >         or      a0,a0,a5
> >>   >         ret
> >>
> >> We address this as a simplification problem and optimize this using an
> >> andi of the original value and a mask with just the desired bits set,
> >> followed by a snez. This results in a 1 if any of those bits are set or
> >>    0 if none.
> >>
> >> For cases where we want to extract three or more single-bit bitfields,
> >> we build on the previous case. We take the result of the 2-bitfield
> >> case, extract the mask, update it to include the new single-bit
> >> bitfield, and again perform an andi + snez.
> >>
> >> In our new testfile, we scan to ensure we do not see a bexti or an or
> >> instruction, and that we have the correct assembly for both two and
> >> three single-bit bitfield cases: lw + andi + snez + ret.
> >
> > We still have horrible code generation for this on x86-64.
> >
> >  From the bug report:
> >
> [ ... ]
> I know.  That's part of why the bug is staying open.
>
>
>
> >
> > Is there no generic infrastructure that could handle this?
> That was the hope of doing it in simplify-rtx since that's a common low
> level simplifier module.  But it's not necessarily sufficient.
>
> match.pd isn't great for this as we'd need to see a single load covering
> the two fields within the structure, that's not trivially exposed until RTL.
>
> And the actual formation in simplify-rtx can look different on different
> targets and needs to match something the x86 port defines.  It looks
> like x86 needs a pattern like this:
>
> > Failed to match this instruction:
> > (parallel [
> >         (set (reg:QI 102 [ _5 ])
> >             (ne:QI (and:QI (reg:QI 106 [ *s_4(D) ])
> >                     (const_int 6 [0x6]))
> >                 (const_int 0 [0])))
> >         (clobber (reg:CC 17 flags))
> >     ])

This would work, but the same pattern would also be matched by combine for:

--cut here--
char foo (char a)
{
 return (a & 6) != 0;
}
--cut here--

Trying 7 -> 8:
    7: flags:CCZ=cmp(r105:QI&0x6,0)
      REG_DEAD r105:QI
    8: r104:QI=flags:CCZ!=0
      REG_DEAD flags:CCZ
Failed to match this instruction:
(set (reg:QI 104 [ _1 ])
    (ne:QI (and:QI (reg:QI 105 [ a ])
            (const_int 6 [0x6]))
        (const_int 0 [0])))

which we don't want, because we'd have to split it back into a CC-setting
insn and a CC-using insn, like:

(define_insn_and_split "*test_setne_<mode>"
  [(set (match_operand:SWI 0 "nonimmediate_operand")
        (ne:SWI (and:SWI (match_operand:SWI 1 "nonimmediate_operand")
                        (match_operand:SWI 2 "<general_szext_operand>"))
            (const_int 0)))
   (clobber (reg:CC FLAGS_REG))]
  "!(MEM_P (operands[1]) && MEM_P (operands[2]))
   && ix86_pre_reload_split ()"
  "#"
  "&& 1"
  [(set (reg:CCZ FLAGS_REG)
        (compare:CCZ
          (and:SWI (match_dup 1) (match_dup 2))
          (const_int 0)))
   (set (match_dup 0)
        (ne:SWI (reg:CCZ FLAGS_REG) (const_int 0)))])
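
For reference, here is a minimal C sketch of the equivalence that the
simplification relies on. The helper names are made up for illustration
and are not part of the patch. Bit positions 1 and 2 are taken from the
bexti operands and the 0x6 mask quoted above, and the word is handled as
a plain integer so the sketch does not depend on any bitfield layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the unoptimized form, one single-bit extract
   per field (bits 1 and 2), then an OR of the two results.  */
static int any_set_extract (uint32_t w)
{
  return ((w >> 1) & 1) | ((w >> 2) & 1);
}

/* Hypothetical helper: the simplified form, one AND with the combined
   mask followed by a compare against zero.  */
static int any_set_mask (uint32_t w)
{
  return (w & 6) != 0;
}

/* Assert that both forms agree for every value of the low byte.  */
static void check_forms_agree (void)
{
  for (uint32_t w = 0; w < 256; w++)
    assert (any_set_extract (w) == any_set_mask (w));
}
```

Calling check_forms_agree () from a test driver exercises all 256 values
of the low byte. Adding a third field just means OR-ing one more bit into
the mask constant.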

Uros.
