> This contradicts
>
> /* X86_TUNE_READ_MODIFY_WRITE: Enable use of read modify write instructions
>    such as "add $1, mem".  */
> DEF_TUNE (X86_TUNE_READ_MODIFY_WRITE, "read_modify_write",
>           ~(m_PENT | m_LAKEMONT))
>
> which enables "andl $0, (%edx)" for PentiumPro.  "andl $0, (%edx)" works
> well on PentiumPro.

It is also enabled for Zen, but that does not mean that "andl $0, (%edx)" is a
good way of clearing memory when optimizing for speed.

jan@padlo:/tmp> cat t.c
int mem;
int main()
{
  for (int i = 0; i < 1000000000; i++)
#ifdef AND
    asm volatile ("andl $0, %0":"=m"(mem));
#else
#ifdef SPLIT
    asm volatile ("xorl %%eax, %%eax; movl $0, %0":"=m"(mem)::"eax");
#else
    asm volatile ("movl $0, %0":"=m"(mem));
#endif
#endif
  return 0;
}
jan@padlo:/tmp> gcc -O2 t.c ; time ./a.out

real    0m0.405s
user    0m0.403s
sys     0m0.002s
jan@padlo:/tmp> gcc -O2 -DSPLIT t.c ; time ./a.out

real    0m0.406s
user    0m0.404s
sys     0m0.001s
jan@padlo:/tmp> gcc -O2 -DAND t.c ; time ./a.out

real    0m2.824s
user    0m2.822s
sys     0m0.001s

andl is slower than movl because it introduces an unnecessary memory read.

I don't have a PentiumPro to test on, but there the -DSPLIT variant should be
a bit better, since the instruction exceeds 7 bytes.

Looking into the history of that knob, it was added by me
https://gcc.gnu.org/pipermail/gcc-patches/1999-July/014219.html
to control the behaviour of the splitter that split the move if it was longer
than 7 bytes, which implemented the following recommendation of the Intel
optimization manual:

"Avoid instructions that contain four or more micro-ops or instructions
that are more than seven bytes long. If possible, use instructions that
require one micro-op"

So the comment on SPLIT_LONG_MOVES is a bit incorrect: it does not mention
that the move needs to exceed the long_insn threshold.

I am not sure how much we need to care about PPro performance these days,
though.

Honza