On Wed, Jun 17, 2026 at 10:21 AM Ciprian Arbone <[email protected]> wrote:
>
> On Tue, 16 Jun 2026 at 16:48, Richard Biener <[email protected]>
> wrote:
> >
> > On Fri, Jun 12, 2026 at 3:31 PM Ciprian Arbone via Gcc <[email protected]>
> > wrote:
> > >
> > > Hello,
> > >
> > > We recently enabled LDMIA/STMIA instructions for Thumb-1 (Cortex-M0+) by
> > > modifying ARM_AUTOINC_VALID_FOR_MODE_P to allow auto-increment addressing
> > > for THUMB1 targets. However, we've discovered that IVOPTs generates
> > > suboptimal code for simple loops due to incorrect addressing mode
> > > selection.
> > >
> > > Consider this test case:
> > >
> > > void test(int *a, int *b, int size)
> > > {
> > >   for (int i = 0; i < size; i++)
> > >   {
> > >     a[i] = b[i] * a[i];
> > >   }
> > > }
> > >
> > > GCC currently generates:
> > >
> > >     ldmia r0!, {r4}
> > >     ldmia r1!, {r6}
> > >     subs r5, r0, #4
> > >     ...
> > >     str r4, [r5, #0]
> > >
> > > The issue occurs because IVOPTs selects a candidate with the lowest cost
> > > that has the following structure:
> > >
> > > Candidate xxx:
> > >   Incr POS: after use 0
> > >   IV struct:
> > >     Type: unsigned int
> > >     Base: (unsigned int) a_13(D)
> > >     Step: 4
> > >
> > > This results in the following loop structure:
> > >
> > > loop-preheader:
> > >   r0 = a
> > >   jump loop-exiting
> > >
> > > loop-header:
> > >   load-from [r0]
> > >   increment r0
> > >   store-to [r0, #-4]
> > >
> > > loop-exiting:
> > >   jump loop-header
> > >
> > > **Issue 1:** IVOPTs recognizes both patterns as valid post-increment with
> > > offset zero:
> > > - "load-from [r0]; increment r0" → recognized as post-inc from offset 0
> > > - "increment r0; store-to [r0, #-4]" → also recognized as post-inc from
> > >   offset 0
> > >
> > > The code in tree-ssa-loop-ivopts.cc:get_address_cost() applies the
> > > adjustment:
> > >
> > >   if (stmt_after_increment (data->current_loop, cand, use->stmt))
> > >     ainc_offset += ainc_step;
> > >   cost = get_address_cost_ainc (ainc_step, ainc_offset,
> > >                                 addr_mode, mem_mode, as, speed);
> > >
> > > However, Thumb-1 does not support negative immediate offsets in addressing
> > > modes. The pattern "increment r0; store-to [r0, #-4]" can never be
> > > realized as a post-increment store on Thumb-1, yet IVOPTs assigns it a
> > > low cost.
> > >
> > > **Question 1:** Should get_address_cost() verify that an addressing mode
> > > is actually valid on the target before assigning auto-increment cost?
> > > Currently, it appears to assume validity without checking target
> > > constraints.
> >
> > It should end up calling the legitimate_address_p target hook to verify
> > validity.
>
> The legitimate_address_p target hook is invoked, but relatively
> late, during the GIMPLE-to-RTL conversion.
> At that point, after ivopts, the GIMPLE already contains code that may
> not be optimal.

IVOPTs checks valid_mem_ref_p which calls memory_address_addr_space_p
with artificially generated RTL and that calls legitimate_address_p.

Richard.

>
>   _24 = (void *) ivtmp.15_20;
>   _6 = MEM[(int *)_24];
>   ivtmp.15_21 = ivtmp.15_20 + 4;
>
>   . . .
>
>   _25 = (void *) ivtmp.15_21;
>   MEM[(int *)_25 + 4294967292 * 1] = _7; <------
>
> Would it make sense to check whether the addressing mode is valid at this
> point
> (https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-ssa-loop-ivopts.cc#L4732),
> similar to the validation performed later
> (https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-ssa-loop-ivopts.cc#L4753)?
>
> > >
> > > **Issue 2:** IVOPTs also assigns low cost to another candidate:
> > >
> > > Candidate yyy:
> > >   Incr POS: before exit test
> > >   IV struct:
> > >     Type: unsigned int
> > >     Base: (unsigned int) a_13(D)
> > >     Step: 4
> > >
> > > This produces:
> > >
> > > loop-preheader:
> > >   r0 = &a[0]
> > >   jump loop-exiting
> > >
> > > loop-header:
> > >   load-from [r0, #-4]
> > >   store-to [r0]
> > >
> > > loop-exiting:
> > >   increment r0
> > >   jump loop-header
> > >
> > > IVOPTs considers that the increment in the loop-exiting block can be
> > > paired with "load-from [r0, #-4]" in the loop-header block, despite
> > > them being in different basic blocks.
> > >
> > > **Question 2:** Should get_address_cost() verify that the candidate
> > > increment and use->stmt are in the same basic block when
> > > cand->pos == IP_NORMAL? Cross-block pairing seems problematic for
> > > post-increment addressing mode costing.
> >
> > If the pairing is wrong it should be fixed, where's that pairing done?
> >
>
> The pairing takes place at this point
> (https://github.com/gcc-mirror/gcc/blob/master/gcc/tree-ssa-loop-ivopts.cc#L4739),
> where the candidate (the `increment r0` statement from the
> loop-exiting basic block) is combined with the use
> (the `load-from [r0, #-4]` statement from the loop-header basic
> block), resulting in can_autoinc being set to true.
>
> loop-header:
>   load-from [r0, #-4]
>   store-to [r0]
>
> loop-exiting:
>   increment r0
>   bcc loop-header
>
> > >
> > > Both issues suggest that IVOPTs may need additional validation to ensure:
> > > 1. The selected addressing mode is actually supported by the target
> > > 2. The increment and memory operation are properly co-located for
> > >    IP_NORMAL candidates
> > >
> > > Best regards,
> > > Ciprian Arbone
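
The two checks asked about in Question 1 and Question 2 can be sketched as a small standalone C model. This is only an illustration under stated assumptions, not GCC code: every function and type name below is hypothetical, the raw offset of -4 is taken from the "[r0, #-4]" accesses in the thread, and the offset range assumes the 16-bit Thumb-1 word LDR/STR immediate form, which encodes imm5 * 4 (0 to 124).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model for illustration only; none of these names exist in GCC.  */

/* Thumb-1 word LDR/STR with an immediate offset encodes imm5 * 4,
   so only the offsets 0, 4, ..., 124 are representable.  */
bool thumb1_word_imm_offset_ok(int32_t offset)
{
    return offset >= 0 && offset <= 124 && offset % 4 == 0;
}

/* Mirrors the adjustment quoted from get_address_cost(): a use placed
   after the increment has the step added back before the
   auto-increment cost is looked up.  */
int32_t adjusted_ainc_offset(int32_t raw_offset, int32_t step,
                             bool use_after_increment)
{
    assert(step != 0);
    return use_after_increment ? raw_offset + step : raw_offset;
}

/* How a negative offset appears when printed as a 32-bit unsigned
   value, as in the GIMPLE dump.  */
uint32_t wrapped_offset(int32_t offset)
{
    return (uint32_t) offset;
}

/* Hypothetical statement position: basic-block id plus the index of
   the statement inside that block.  */
struct stmt_pos {
    int bb;
    int index;
};

/* Stricter pairing rule along the lines of Questions 1 and 2: the use
   must address [base, #0] as written, and the increment must follow
   it inside the same basic block.  */
bool can_pair_post_inc(struct stmt_pos use, struct stmt_pos inc,
                       int32_t raw_offset)
{
    return raw_offset == 0
           && use.bb == inc.bb
           && use.index < inc.index;
}
```

In this model the load "[r0]" before the increment and the store "[r0, #-4]" after it both reach an adjusted offset of 0, which matches the behaviour described under Issue 1, while thumb1_word_imm_offset_ok(-4) rejects the store as written. The cross-block case from Issue 2 is rejected by can_pair_post_inc because the use and the increment carry different block ids.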
