On Wed, Feb 19, 2025 at 2:10 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > On Wed, Feb 19, 2025 at 8:16 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > > > On Wed, Feb 19, 2025 at 12:53 PM H.J. Lu <hjl.to...@gmail.com> wrote: > > > > > > Since stack can only be accessed by GPR, check GENERAL_REG_P, instead of > > > REG_P, in ix86_find_all_reg_use_1. > > > > > > gcc/ > > > > > > PR target/118936 > > > * config/i386/i386.cc (ix86_find_all_reg_use_1): Replace REG_P > > > with GENERAL_REG_P. > > > > > > gcc/testsuite/ > > > > > > PR target/118936 > > > * gcc.target/i386/pr118936.c: New test. > > > > > > OK for master? > > > > As said in the PR, this change "fixes" only this particular problem, > > where we have: > > > > 97: ax:DI='g_1680' > > 170: xmm0:DI=ax:DI > > 98: xmm0:V2DI=vec_duplicate(xmm0:DI) > > REG_EQUIV vec_duplicate(`g_1680') > > 101: [`g_1679']=xmm0:V2DI > > 103: [const(`g_1679'+0x10)]=xmm0:V2DI > > 105: [const(`g_1679'+0x20)]=xmm0:V2DI > > 107: [const(`g_1679'+0x30)]=xmm0:V2DI > > > > But does not fix the fact that your algorithm considers these stores > > even when they are in no way connected to stack or frame pointer. > > These do not even use registers in their address. > > > > Previous approach did this: > > - if (check_stack_slot) > > - { > > - /* Find the maximum stack alignment. */ > > - subrtx_iterator::array_type array; > > - FOR_EACH_SUBRTX (iter, array, PATTERN (insn), ALL) > > - if (MEM_P (*iter) > > - && (reg_mentioned_p (stack_pointer_rtx, > > - *iter) > > - || reg_mentioned_p (frame_pointer_rtx, > > - *iter))) > > - { > > - unsigned int alignment = MEM_ALIGN (*iter); > > - if (alignment > stack_alignment) > > - stack_alignment = alignment; > > - } > > - } > > > > So, anywhere in the insn SP or FP was mentioned (either memory store > > or load), the alignment was increased as requested by MEM. The issue > > here is that another register that is calculated from SP or FP can be > > used to access the stack slot. I'm not sure I know how your algorithm > > works id detail, it detects e.g.: > > > > 103: [const(`g_1679'+0x10)]=xmm0:V2DI > > > > as a candidate to increase the alignment, but its address doesn't even > > use registers, let alone SP or FP. Also, > > > > + note_stores (insn, ix86_update_stack_alignment, > > + &stack_alignment); > > > > in your algorithm does not consider that loads from the stack also > > increase alignment requirements. > > > > Can you please explain the approach in your original patch in some > > more detail? I'd expect a variant of the original approach, where the > > compiler keeps an up-to-date list of registers that depend on SP or > > FP, and instead of a naive: > > > > reg_mentioned_p (stack_pointer_rtx, *iter) > > > > it goes through a list, formed of SP, FP and their dependent registers > > and updates stack alignment if the insn memory address uses the > > register on the list. > > > > Uros. > > My algorithm keeps a list of registers which can access the stack > starting with SP and FP. If any registers are derived from the list, > add them to the list until all used registers are exhausted. If > any stores use the register on the list, update the stack alignment. > The missing part is that it doesn't check if the register is actually > used for memory access. > > Here is the v2 patch to also check if the register is used for memory > access.
@@ -8473,16 +8482,15 @@ static void ix86_update_stack_alignment (rtx, const_rtx pat, void *data) { /* This insn may reference stack slot. Update the maximum stack slot - alignment. */ + alignment if the memory is reference by the specified register. */ referenced + stack_access_data *p = (stack_access_data *) data; subrtx_iterator::array_type array; FOR_EACH_SUBRTX (iter, array, pat, ALL) - if (MEM_P (*iter)) + if (MEM_P (*iter) && reg_mentioned_p (p->reg, *iter)) @@ -8494,7 +8502,7 @@ ix86_find_all_reg_use_1 (rtx set, HARD_REG_SET &stack_slot_access, auto_bitmap &worklist) { rtx dest = SET_DEST (set); - if (!REG_P (dest)) + if (!GENERAL_REG_P (dest)) return; The above change is not needed now that the register in the memory reference is checked. The compiler can generate GPR->XMM->GPR sequence. @@ -8630,8 +8638,8 @@ ix86_find_max_used_stack_alignment (unsigned int &stack_alignment, if (!NONJUMP_INSN_P (insn)) continue; - note_stores (insn, ix86_update_stack_alignment, - &stack_alignment); + stack_access_data data = { DF_REF_REG (ref), &stack_alignment }; + note_stores (insn, ix86_update_stack_alignment, &data); Do we need FOR_EACH_SUBRTX here instead of note_stores to also process reads from stack? Reads should also be aligned. Uros.