Hi all,

I'm trying, without success, to disable loop unrolling when compiling a program with -O3 with gcc (4.4, but I see the same problem with 4.3).

The program is the following one:
volatile int v;

void func()
{
    int i;

    for( i=0; i<8; ++i ) {
        v=0;
    }
}
I compile it with the following command line:

gcc -c -O3 test.c

An "objdump -S test.o" gives:
test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <func>:
   0:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # a <func+0xa>
   7:   00 00 00
   a:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # 14 <func+0x14>
  11:   00 00 00
  14:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # 1e <func+0x1e>
  1b:   00 00 00
  1e:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # 28 <func+0x28>
  25:   00 00 00
  28:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # 32 <func+0x32>
  2f:   00 00 00
  32:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # 3c <func+0x3c>
  39:   00 00 00
  3c:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # 46 <func+0x46>
  43:   00 00 00
  46:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # 50 <func+0x50>
  4d:   00 00 00
  50:   c3                      retq
If I compile with -O2, the results are:
test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <func>:
   0:   31 c0                   xor    %eax,%eax
   2:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
   8:   83 c0 01                add    $0x1,%eax
   b:   c7 05 00 00 00 00 00    movl   $0x0,0x0(%rip)        # 15 <func+0x15>
  12:   00 00 00
  15:   83 f8 08                cmp    $0x8,%eax
  18:   75 ee                   jne    8 <func+0x8>
  1a:   f3 c3                   repz retq
The worrying part comes when I try to disable the unrolling. Neither "-fno-unroll-loops" nor "-fno-peel-loops" has any effect, and adjusting the --param values (max-unrolled-insns, max-unroll-times, max-peel-times) makes no noticeable difference either.

Even more worryingly, the documentation seems to be wrong. It claims (http://gcc.gnu.org/onlinedocs/gcc-4.4.2/gcc/Optimize-Options.html#index-O3-632) that -O3 is equal to -O2 plus -finline-functions, -funswitch-loops, -fpredictive-commoning, -fgcse-after-reload and -ftree-vectorize. Compiling with -O2 plus those five options, however, does not unroll the loop, which suggests that -O3 differs from -O2 in some other way as well.
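
One possible way to narrow this down inside a single object file is sketched below. It relies on the per-function optimize attribute, a GCC extension available from 4.4 onwards, and it assumes that the attribute affects the unrolling decision the same way the command-line level does, which I have not verified. The names func_default and func_o2 are made up for this sketch and are not part of the original test case.

```c
/* Sketch only: func_default and func_o2 are hypothetical names.
   The optimize attribute is a GCC extension (4.4 and later);
   other compilers may ignore it, possibly with a warning. */
volatile int v;

/* Compiled at whatever level the command line requests, e.g. -O3. */
void func_default(void)
{
    int i;

    for (i = 0; i < 8; ++i) {
        v = 0;
    }
}

/* Asks GCC to treat this one function as if -O2 had been given.
   Whether this keeps the loop rolled is the assumption under test. */
__attribute__((optimize("O2")))
void func_o2(void)
{
    int i;

    for (i = 0; i < 8; ++i) {
        v = 0;
    }
}
```

Compiling this with "gcc -c -O3" and running "objdump -S" on the result would put both versions side by side. If func_o2 keeps its loop while func_default is unrolled, the difference would seem to follow the optimization level itself and not any individual -f option.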

Help?

Shachar

--
Shachar Shemesh
Lingnu Open Source Consulting Ltd.
http://www.lingnu.com

_______________________________________________
Linux-il mailing list
Linux-il@cs.huji.ac.il
http://mailman.cs.huji.ac.il/mailman/listinfo/linux-il
