Karl Meakin <karl.mea...@arm.com> writes:
> On 07/05/2025 14:32, Richard Sandiford wrote:
>> Karl Meakin <karl.mea...@arm.com> writes:
>>> Add rules for lowering `cbranch<mode>4` to CBB/CBH/CB when CMPBR
>>> extension is enabled.
>>>
>>> gcc/ChangeLog:
>>>
>>>     * config/aarch64/aarch64.md (cbranch<mode>4): emit CMPBR
>>>     instructions if possible.
>>>     (cbranch<SHORT:mode>4): new expand rule.
>>>     (aarch64_cb<GPI:mode>): likewise.
>>>     (aarch64_cb<SHORT:mode>): likewise.
>>>     * config/aarch64/iterators.md (cmpbr_suffix): new mode attr.
>>>     * config/aarch64/predicates.md (const_0_to_63_operand): new
>>>     predicate.
>>>     (aarch64_cb_immediate): likewise.
>>>     (aarch64_cb_operand): likewise.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>>     * gcc.target/aarch64/cmpbr.c: update tests.
>> In addition to Kyrill's comments (which I agree with):
>>
>>> @@ -720,18 +720,41 @@ (define_constants
>>>   ;; Conditional jumps
>>>   ;; -------------------------------------------------------------------
>>>   
>>> -(define_expand "cbranch<mode>4"
>>> +(define_expand "cbranch<GPI:mode>4"
>>>     [(set (pc) (if_then_else (match_operator 0 "aarch64_comparison_operator"
>>>                         [(match_operand:GPI 1 "register_operand")
>>>                          (match_operand:GPI 2 "aarch64_plus_operand")])
>>>                        (label_ref (match_operand 3))
>>>                        (pc)))]
>>>     ""
>>> -  "
>>> -  operands[1] = aarch64_gen_compare_reg (GET_CODE (operands[0]), 
>>> operands[1],
>>> -                                    operands[2]);
>>> -  operands[2] = const0_rtx;
>>> -  "
>>> +  {
>>> +  if (TARGET_CMPBR && aarch64_cb_operand (operands[2], <MODE>mode))
>>> +    {
>>> +      emit_jump_insn (gen_aarch64_cb<mode> (operands[0], operands[1],
>>> +                                       operands[2], operands[3]));
>>> +      DONE;
>>> +    }
>> There is an implicit choice here to use a separate CMP + Bcc if the
>> immediate is out of range, rather than force out-of-range immediates into
>> a temporary register.  That can be the right choice for immediates in the
>> range of CMP, but whether it is or not depends on global information that
>> we don't have.  If the immediate is needed for multiple branches, it would
>> be better (sizewise) to load the immediate into a temporary register and
>> use it for each branch, provided that there's a call-clobbered register
>> free and that the branches are in the 1KiB range.  In other situations,
>> what the patch does is best.
>
> Do you mean replacing code like
> ```
> cmp x1, 100
> beq .L1
> cmp x2 100
> beq .L2
> ```
>
> with
> ```
> mov x0 100
> cbeq x0, x1, .L1
> cbeq x0, x2, .L2
> ```
>
> That would be preferable, but as you say we would need to know whether 
> .L1 and .L2 are in range, and if x0 is free.
> I don't think that is something we can easily determine in the middle of 
> RTL generation.

Right, exactly.

>> But perhaps it would be worth forcing values that are outside the
>> range of CMP into a register and using the new form, rather than
>> emitting an immediate move, a CMP, and a branch.
>
> That already happens thanks to the ordering of the rules (the rule for 
> MOV+CB<cond> comes before the rule for MOV+CMP+B<cond>).

Ah, right, I missed that the predicate on operand 2 was still
restricted to the CMP range.

> I will add some tests to record this behaviour.

Thanks.  Like I say, a comment acknowledging/explaining the choice would
be good too.

Richard

>> Either way, I think it's worth a comment saying what we do with
>> out-of-range immediates.

Reply via email to