On Sun, Jun 22, 2025 at 1:32 PM Jan Hubicka <hubi...@ucw.cz> wrote:
>
> > Since there is
> >
> > /* X86_TUNE_SPLIT_LONG_MOVES: Avoid instructions moving immediates
> >    directly to memory.  */
> > DEF_TUNE (X86_TUNE_SPLIT_LONG_MOVES, "split_long_moves", m_PPRO)
>
> If I recall correctly, this tune was added for PentiumPro, which had
> problems decoding moves with long immediates, and it is a performance
> optimization rather than a code-size one.
>
> As you discuss in the PR, i686 prefers
>         xorl    %eax, %eax
>         movl    %eax, aligned_heap_area
> over
>         movl $0, aligned_heap_area
> which has too long an encoding.
> >
> > to avoid long immediate store instructions, like
> >
> > c7 02 00 00 00 00    movl   $0x0,(%rdx)
> > c7 02 ff ff ff ff    movl   $0xffffffff,(%rdx)
> >
> > add TARGET_USE_AND0_ORM1_STORE and enable *mov<mode>_(and|or) for
> > TARGET_USE_AND0_ORM1_STORE, which is true for TARGET_SPLIT_LONG_MOVES or
> > -Oz, to also generate:
> >
> > 83 22 00              andl   $0x0,(%rdx)
> > 83 0a ff              orl    $0xffffffff,(%rdx)
>
> I think this will not work well on PPro hardware since it will do it as
> read-modify-write, while

Since read-modify-write is enabled for PentiumPro:

/* X86_TUNE_READ_MODIFY_WRITE: Enable use of read modify write instructions
   such as "add $1, mem".  */
DEF_TUNE (X86_TUNE_READ_MODIFY_WRITE, "read_modify_write",
          ~(m_PENT | m_LAKEMONT))

should this

/* Generate "and $0,mem" and "or $-1,mem", instead of "mov $0,mem" and
   "mov $-1,mem" with shorter encoding for TARGET_SPLIT_LONG_MOVES with
   TARGET_READ_MODIFY_WRITE or -Oz.  */
#define TARGET_USE_AND0_ORM1_STORE \
  ((TARGET_SPLIT_LONG_MOVES && TARGET_READ_MODIFY_WRITE) \
   || (optimize_insn_for_size_p () && optimize_size > 1))

work?
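
To make the intent of the condition concrete, here is a minimal stand-alone
sketch that models it with plain booleans. It is not GCC code: the struct and
function names are hypothetical, and the fields only stand in for the macros
and functions named in the comments. It also assumes optimize_size is 2 for
-Oz, which is what the "optimize_size > 1" test relies on.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GCC tuning flags and size settings;
   none of these names exist in GCC itself.  */
struct tune_model
{
  bool split_long_moves;   /* models TARGET_SPLIT_LONG_MOVES */
  bool read_modify_write;  /* models TARGET_READ_MODIFY_WRITE */
  bool insn_for_size;      /* models optimize_insn_for_size_p () */
  int size_level;          /* models optimize_size; assumed 2 for -Oz */
};

/* Mirrors the proposed TARGET_USE_AND0_ORM1_STORE condition: use the
   short "and $0,mem" / "or $-1,mem" forms either when long moves are
   split and read-modify-write is allowed, or when optimizing at -Oz.  */
static bool
use_and0_orm1_store (const struct tune_model *t)
{
  return (t->split_long_moves && t->read_modify_write)
         || (t->insn_for_size && t->size_level > 1);
}
```

In this model a tuning with both split_long_moves and read_modify_write set
selects the short forms, as does -Oz on any tuning, while split_long_moves
without read_modify_write (or plain -Os) keeps the mov form.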

>         xorl %eax, %eax
>         movl %eax, (%rdx)
> is executed as a plain store, saving the useless load.
>
> Honza



-- 
H.J.
