On Sun, Jun 22, 2025 at 2:12 PM Jan Hubicka <hubi...@ucw.cz> wrote:
>
> >
> > Since read-modify-write is enabled for PentiumPro:
> >
> > /* X86_TUNE_READ_MODIFY_WRITE: Enable use of read modify write instructions
> >    such as "add $1, mem".  */
> > DEF_TUNE (X86_TUNE_READ_MODIFY_WRITE, "read_modify_write",
> >           ~(m_PENT | m_LAKEMONT))
> >
> > should this
> >
> > /* Generate "and $0,mem" and "or $-1,mem", instead of "mov $0,mem" and
> >    "mov $-1,mem" with shorter encoding for TARGET_SPLIT_LONG_MOVES with
> >    TARGET_READ_MODIFY_WRITE or -Oz.  */
> > #define TARGET_USE_AND0_ORM1_STORE \
> >   ((TARGET_SPLIT_LONG_MOVES && TARGET_READ_MODIFY_WRITE) \
> >    || (optimize_insn_for_size_p () && optimize_size > 1))
>
> I really think we are mixing performance and code size optimizations.
> I may be misremembering, but I believe that on PPro
>
>   movl $0, (%edx)
>
> is slower than
>
>   xorl %eax, %eax
>   movl $0, (%edx)
>
> due to hardware limitations on decoding instructions with long encoding.
> However
>
>   andl $0, (%edx)
>
> is even slower than both above since it is a read-modify-write instruction

This contradicts

/* X86_TUNE_READ_MODIFY_WRITE: Enable use of read modify write instructions
   such as "add $1, mem".  */
DEF_TUNE (X86_TUNE_READ_MODIFY_WRITE, "read_modify_write",
          ~(m_PENT | m_LAKEMONT))

which enables "andl $0, (%edx)" for PentiumPro.  "andl $0, (%edx)"
works well on PentiumPro.

> while both variants above does only write.  I do not think hardware
> special cases this.
>
> Situation is different when you actually do read-modify-write
>
> If read_modify_write is set we produce:
>
>   andl $1, (%edx)
>
> While if it is unset we will do:
>
>   movl (%edx), %eax
>   andl $0, %eax
>   movl %eax,(%edx)
>
> which scheduled better on original Pentium provided extra register is
> available.
>
> Honza

-- 
H.J.