Karl Meakin <karl.mea...@arm.com> writes: > On 07/05/2025 14:32, Richard Sandiford wrote: >> Karl Meakin <karl.mea...@arm.com> writes: >>> Add rules for lowering `cbranch<mode>4` to CBB/CBH/CB when CMPBR >>> extension is enabled. >>> >>> gcc/ChangeLog: >>> >>> * config/aarch64/aarch64.md (cbranch<mode>4): emit CMPBR >>> instructions if possible. >>> (cbranch<SHORT:mode>4): new expand rule. >>> (aarch64_cb<GPI:mode>): likewise. >>> (aarch64_cb<SHORT:mode>): likewise. >>> * config/aarch64/iterators.md (cmpbr_suffix): new mode attr. >>> * config/aarch64/predicates.md (const_0_to_63_operand): new >>> predicate. >>> (aarch64_cb_immediate): likewise. >>> (aarch64_cb_operand): likewise. >>> >>> gcc/testsuite/ChangeLog: >>> >>> * gcc.target/aarch64/cmpbr.c: update tests. >> In addition to Kyrill's comments (which I agree with): >> >>> @@ -720,18 +720,41 @@ (define_constants >>> ;; Conditional jumps >>> ;; ------------------------------------------------------------------- >>> >>> -(define_expand "cbranch<mode>4" >>> +(define_expand "cbranch<GPI:mode>4" >>> [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator" >>> [(match_operand:GPI 1 "register_operand") >>> (match_operand:GPI 2 "aarch64_plus_operand")]) >>> (label_ref (match_operand 3)) >>> (pc)))] >>> "" >>> - " >>> - operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), >>> operands[1], >>> - operands[2]); >>> - operands[2] = const0_rtx; >>> - " >>> + { >>> + if (TARGET_CMPBR && aarch64_cb_operand (operands[2], <MODE>mode)) >>> + { >>> + emit_jump_insn (gen_aarch64_cb<mode> (operands[0], operands[1], >>> + operands[2], operands[3])); >>> + DONE; >>> + } >> There is an implicit choice here to use a separate CMP + Bcc if the >> immediate is out of range, rather than force out-of-range immediates into >> a temporary register. That can be the right choice for immediates in the >> range of CMP, but whether it is or not depends on global information that >> we don't have. If the immediate is needed for multiple branches, it would >> be better (sizewise) to load the immediate into a temporary register and >> use it for each branch, provided that there's a call-clobbered register >> free and that the branches are in the 1KiB range. In other situations, >> what the patch does is best. > > Do you mean replacing code like > ``` > cmp x1, 100 > beq .L1 > cmp x2 100 > beq .L2 > ``` > > with > ``` > mov x0 100 > cbeq x0, x1, .L1 > cbeq x0, x2, .L2 > ``` > > That would be preferable, but as you say we would need to know whether > .L1 and .L2 are in range, and if x0 is free. > I don't think that is something we can easily determine in the middle of > RTL generation.
Right, exactly. >> But perhaps it would be worth forcing values that are outside the >> range of CMP into a register and using the new form, rather than >> emitting an immediate move, a CMP, and a branch. > > That already happens thanks to the ordering of the rules (the rule for > MOV+CB<cond> comes before the rule for MOV+CMP+B<cond>). Ah, right, I missed that the predicate on operand 2 was still restricted to the CMP range. > I will add some tests to record this behaviour. Thanks. Like I say, a comment acknowledging/explaining the choice would be good too. Richard >> Either way, I think it's worth a comment saying what we do with >> out-of-range immediates.