Hello,

We recently enabled LDMIA/STMIA instructions for Thumb-1 (Cortex-M0+) by
modifying ARM_AUTOINC_VALID_FOR_MODE_P to allow auto-increment addressing
for THUMB1 targets. However, we've discovered that IVOPTs generates
suboptimal code for simple loops due to incorrect addressing mode selection.

Consider this test case:

void test(int *a, int *b, int size)
{
    for (int i = 0; i < size; i++)
    {
        a[i] = b[i] * a[i];
    }
}

GCC currently generates:

    ldmia   r0!, {r4}
    ldmia   r1!, {r6}
    subs    r5, r0, #4
    ...
    str     r4, [r5, #0]

The issue arises because the lowest-cost candidate that IVOPTs selects has
the following structure:

Candidate xxx:
  Incr POS: after use 0
  IV struct:
    Type:       unsigned int
    Base:       (unsigned int) a_13(D)
    Step:       4

This results in the following loop structure:

loop-preheader:
    r0 = a
    jump loop-exiting

loop-header:
    load-from  [r0]
    increment  r0
    store-to   [r0, #-4]

loop-exiting:
    jump loop-header
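For readers less familiar with IVOPTs dumps, the shape chosen for candidate xxx can be written back as C. This is only a hand-written sketch of the pointer arithmetic implied by the pseudo-code above, not GCC output. The function name `test_xxx_shape` is made up for illustration.

```c
/* Hand-written C rendering of the candidate xxx loop shape (illustrative only). */
void test_xxx_shape(int *a, const int *b, int size)
{
    int *r0 = a;             /* IV: Base = a, Step = 4 bytes (one int) */
    const int *r1 = b;

    for (int i = 0; i < size; i++) {
        int x = *r0;         /* load-from [r0]: uses the IV before the increment */
        int y = *r1++;       /* b[i], read with post-increment (ldmia r1!) */
        r0++;                /* increment r0 by 4 bytes */
        r0[-1] = y * x;      /* store-to [r0, #-4]: uses the IV after the increment */
    }
}
```

The load and the store both touch `a[i]`, but the load sees the IV before the increment and the store sees it after, which is why the store needs the `#-4` displacement.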

**Issue 1:** IVOPTs treats both of the following patterns as valid
post-increment accesses with offset zero:
  - "load-from [r0]; increment r0" → recognized as post-inc from offset 0
  - "increment r0; store-to [r0, #-4]" → also recognized as post-inc from
    offset 0

The code in tree-ssa-loop-ivopts.cc:get_address_cost() applies the adjustment:

    if (stmt_after_increment (data->current_loop, cand, use->stmt))
        ainc_offset += ainc_step;
    cost = get_address_cost_ainc (ainc_step, ainc_offset,
                                  addr_mode, mem_mode, as, speed);

However, Thumb-1 does not support negative immediate offsets in addressing
modes. The pattern "increment r0; store-to [r0, #-4]" can never be realized
as a post-increment store on Thumb-1, yet IVOPTs assigns it a low cost.
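A simplified model of the arithmetic makes the problem concrete. The sketch below mirrors the `ainc_offset += ainc_step` adjustment quoted above and assumes a post-increment test of the form "adjusted offset is 0 and the step equals the access size". It is a standalone model, not GCC's code, and the helper names (`adjusted_ainc_offset`, `looks_like_post_inc`, `thumb1_word_offset_ok`) are hypothetical. The Thumb-1 check models the 16-bit word `LDR`/`STR Rt, [Rn, #imm]` form, whose immediate is an unsigned multiple of 4 from 0 to 124.

```c
#include <stdbool.h>

/* Offset as seen by the auto-increment check: if the use comes after the
   increment, the IV already includes one step, so the step is added back
   (mirrors "ainc_offset += ainc_step"). */
long adjusted_ainc_offset(long raw_offset, long step, bool use_after_incr)
{
    return use_after_incr ? raw_offset + step : raw_offset;
}

/* Simplified post-increment test (assumed): adjusted offset 0 and the
   step equals the access size in bytes. */
bool looks_like_post_inc(long adjusted_offset, long step, long access_size)
{
    return adjusted_offset == 0 && step == access_size;
}

/* 16-bit Thumb LDR/STR (word) immediate: 0..124, multiple of 4. */
bool thumb1_word_offset_ok(long offset)
{
    return offset >= 0 && offset <= 124 && offset % 4 == 0;
}
```

With a step of 4 and 4-byte accesses, the load `[r0]` before the increment has adjusted offset 0, and the store `[r0, #-4]` after the increment has adjusted offset -4 + 4 = 0, so both pass `looks_like_post_inc`. Meanwhile `thumb1_word_offset_ok(-4)` is false for the store as actually written.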

**Question 1:** Should get_address_cost() verify that an addressing mode is
actually valid on the target before assigning auto-increment cost? Currently,
it appears to assume validity without checking target constraints.
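One possible shape for such a check, sketched as a standalone model. When the use sits after the increment, the pair is only costed as auto-increment if the form as written (`[r0, #-step]`) is encodable on the target; otherwise a higher cost is returned. Everything here is hypothetical: the cost constants, the `offset_ok` callback standing in for a target legitimacy query, and the function names are illustrations, not existing GCC interfaces.

```c
#include <stdbool.h>

/* Hypothetical cost values, for illustration only. */
enum { COST_AINC = 1, COST_REG_OFFSET = 2, COST_EXTRA_ADD = 4 };

/* Stand-in for a target query: can [reg, #offset] be encoded? */
typedef bool (*offset_ok_fn)(long offset);

/* Thumb-1 model: 16-bit word LDR/STR immediate is 0..124, multiple of 4. */
static bool thumb1_word_offset_ok(long offset)
{
    return offset >= 0 && offset <= 124 && offset % 4 == 0;
}

/* Sketch of a guarded address cost.  raw_offset is the offset relative to
   the IV value the use actually sees. */
int guarded_address_cost(long raw_offset, long step, long access_size,
                         bool use_after_incr, offset_ok_fn offset_ok)
{
    long adjusted = use_after_incr ? raw_offset + step : raw_offset;

    if (adjusted == 0 && step == access_size) {
        /* Before the increment, [reg] plus write-back is a plain post-inc.
           After it, only accept if the as-written form is encodable. */
        if (!use_after_incr || offset_ok(raw_offset))
            return COST_AINC;
    }
    /* Fallback: reg+offset if encodable, else pay for a separate add. */
    return offset_ok(raw_offset) ? COST_REG_OFFSET
                                 : COST_REG_OFFSET + COST_EXTRA_ADD;
}
```

Under this model the load `[r0]` keeps the auto-increment cost, while the store `[r0, #-4]` after the increment is charged as an address computation plus a plain store.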

**Issue 2:** IVOPTs also assigns low cost to another candidate:

Candidate yyy:
  Incr POS: before exit test
  IV struct:
    Type:       unsigned int
    Base:       (unsigned int) a_13(D)
    Step:       4

This produces:

loop-preheader:
    r0 = &a[0]
    jump loop-exiting

loop-header:
    load-from  [r0, #-4]
    store-to   [r0]

loop-exiting:
    increment  r0
    jump loop-header

IVOPTs considers that the increment in the loop-exiting block can be paired
with "load-from [r0, #-4]" in the loop-header block, despite them being in
different basic blocks.

**Question 2:** Should get_address_cost() verify that the candidate increment
and use->stmt are in the same basic block when cand->pos == IP_NORMAL?
Cross-block pairing seems problematic for post-increment addressing mode
costing.
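A minimal model of the restriction proposed in Question 2: for an `IP_NORMAL` candidate, auto-increment pairing is only considered when the increment and the memory use live in the same basic block. The enum copies the names of GCC's `iv_position` values for readability ("before exit test" in the dump corresponds to `IP_NORMAL`, "after use" to `IP_AFTER_USE`). However, the structs, the integer basic-block ids and `ainc_pairing_allowed` are hypothetical and do not reflect how IVOPTs represents statements internally.

```c
#include <stdbool.h>

/* Local copy of the iv_position names, for this model only. */
enum iv_position { IP_NORMAL, IP_END, IP_BEFORE_USE, IP_AFTER_USE, IP_ORIGINAL };

/* Hypothetical, simplified views of a candidate and a use. */
struct model_cand { enum iv_position pos; int incr_bb; };
struct model_use  { int bb; };

/* Only let an IP_NORMAL increment pair with a memory use for auto-increment
   costing when both are in the same basic block.  Other positions are left
   to the existing logic (reported as allowed here). */
bool ainc_pairing_allowed(const struct model_cand *cand,
                          const struct model_use *use)
{
    if (cand->pos == IP_NORMAL)
        return cand->incr_bb == use->bb;
    return true;
}
```

For candidate yyy, the increment sits in loop-exiting and the load in loop-header, so this check would reject the pairing.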

Both issues suggest that IVOPTs may need additional validation to ensure:
1. The selected addressing mode is actually supported by the target
2. The increment and memory operation are properly co-located for IP_NORMAL
   candidates

Best regards,
Ciprian Arbone
