https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86680

            Bug ID: 86680
           Summary: possible gcc optimization
           Product: gcc
           Version: 8.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: florian.laroche at googlemail dot com
  Target Milestone: ---

Created attachment 44444
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=44444&action=edit
testcase

I can see this on x86_64 and aarch64. The first function is compiled to
much larger code than the second. It seems that the alignment to 8 bytes,
and thus the fact that the length is a multiple of 8, is forgotten in some
optimization step.

best regards,

Florian La Roche

$ aarch64-linux-gnu-gcc-8 -O2 -c test.c
$ aarch64-linux-gnu-objdump -d test.o

test.o:     file format elf64-littleaarch64

Disassembly of section .text:

0000000000000000 <clear_bss1>:
   0:   90000001    adrp    x1, 0 <__bss_start1>
   4:   90000000    adrp    x0, 0 <__bss_end1>
   8:   f9400022    ldr     x2, [x1]
   c:   f9400000    ldr     x0, [x0]
  10:   eb00005f    cmp     x2, x0
  14:   54000142    b.cs    3c <clear_bss1+0x3c>  // b.hs, b.nlast
  18:   d1000401    sub     x1, x0, #0x1
  1c:   aa0203e0    mov     x0, x2
  20:   cb020021    sub     x1, x1, x2
  24:   927df021    and     x1, x1, #0xfffffffffffffff8
  28:   91002021    add     x1, x1, #0x8
  2c:   8b020021    add     x1, x1, x2
  30:   f800841f    str     xzr, [x0], #8
  34:   eb01001f    cmp     x0, x1
  38:   54ffffc1    b.ne    30 <clear_bss1+0x30>  // b.any
  3c:   d65f03c0    ret

0000000000000040 <clear_bss2>:
  40:   90000000    adrp    x0, 0 <__bss_start2>
  44:   90000001    adrp    x1, 0 <__bss_end2>
  48:   f9400000    ldr     x0, [x0]
  4c:   f9400021    ldr     x1, [x1]
  50:   f9400000    ldr     x0, [x0]
  54:   f9400021    ldr     x1, [x1]
  58:   eb01001f    cmp     x0, x1
  5c:   54000082    b.cs    6c <clear_bss2+0x2c>  // b.hs, b.nlast
  60:   f800841f    str     xzr, [x0], #8
  64:   eb01001f    cmp     x0, x1
  68:   54ffffc3    b.cc    60 <clear_bss2+0x20>  // b.lo, b.ul, b.last
  6c:   d65f03c0    ret

Please note how much smaller the code for the second function is. In the
first function, the instructions from "18" to "2c" should basically be
optimized away.

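To make the point about "18" to "2c" concrete, here is a minimal C sketch of the value those instructions appear to compute, read directly off the listing (the x86_64 sequence is equivalent). The function name `loop_end_as_compiled` is hypothetical and is not taken from the attached testcase, which is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring offsets 0x18..0x2c of the -O2 clear_bss1
 * listing: the loop's exit pointer is recomputed as
 *     start + ((end - 1 - start) & ~7) + 8
 * so that the loop can terminate on "!=" instead of "<". */
uintptr_t loop_end_as_compiled(uintptr_t start, uintptr_t end)
{
    assert(start < end);             /* guaranteed by the cmp/b.cs guard at 0x10/0x14 */
    uintptr_t span = end - 1 - start; /* sub x1, x0, #1 ; sub x1, x1, x2 */
    span &= ~(uintptr_t)7;            /* and x1, x1, #0xfffffffffffffff8 */
    span += 8;                        /* add x1, x1, #0x8 */
    return start + span;              /* add x1, x1, x2 */
}
```

When `end - start` is a multiple of 8, the result is simply `end`, so the six instructions only matter if the compiler cannot prove that the two addresses are 8-byte aligned relative to each other. For an unaligned `end`, the result is `end` rounded up to the next multiple of 8 past `start`.
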
Compiling with -Os also gives much better code:

$ aarch64-linux-gnu-gcc-8 -Os -c test.c
$ aarch64-linux-gnu-objdump -d test.o

test.o:     file format elf64-littleaarch64

Disassembly of section .text:

0000000000000000 <clear_bss1>:
   0:   90000000    adrp    x0, 0 <__bss_start1>
   4:   90000001    adrp    x1, 0 <__bss_end1>
   8:   f9400000    ldr     x0, [x0]
   c:   f9400021    ldr     x1, [x1]
  10:   eb01001f    cmp     x0, x1
  14:   54000043    b.cc    1c <clear_bss1+0x1c>  // b.lo, b.ul, b.last
  18:   d65f03c0    ret
  1c:   f800841f    str     xzr, [x0], #8
  20:   17fffffc    b       10 <clear_bss1+0x10>

0000000000000024 <clear_bss2>:
  24:   90000000    adrp    x0, 0 <__bss_start2>
  28:   90000001    adrp    x1, 0 <__bss_end2>
  2c:   f9400000    ldr     x0, [x0]
  30:   f9400021    ldr     x1, [x1]
  34:   f9400000    ldr     x0, [x0]
  38:   f9400021    ldr     x1, [x1]
  3c:   eb00003f    cmp     x1, x0
  40:   54000048    b.hi    48 <clear_bss2+0x24>  // b.pmore
  44:   d65f03c0    ret
  48:   f800841f    str     xzr, [x0], #8
  4c:   17fffffc    b       3c <clear_bss2+0x18>

The problem also shows up on x86_64, from "13" to "22":

$ gcc -O2 -c test.c
$ objdump -d test.o

test.o:     file format elf64-x86-64

Disassembly of section .text:

0000000000000000 <clear_bss1>:
   0:   48 8d 05 00 00 00 00    lea    0x0(%rip),%rax        # 7 <clear_bss1+0x7>
   7:   48 8d 15 00 00 00 00    lea    0x0(%rip),%rdx        # e <clear_bss1+0xe>
   e:   48 39 d0                cmp    %rdx,%rax
  11:   73 25                   jae    38 <clear_bss1+0x38>
  13:   48 8d 48 08             lea    0x8(%rax),%rcx
  17:   48 83 c2 07             add    $0x7,%rdx
  1b:   48 29 ca                sub    %rcx,%rdx
  1e:   48 83 e2 f8             and    $0xfffffffffffffff8,%rdx
  22:   48 01 ca                add    %rcx,%rdx
  25:   0f 1f 00                nopl   (%rax)
  28:   48 c7 00 00 00 00 00    movq   $0x0,(%rax)
  2f:   48 83 c0 08             add    $0x8,%rax
  33:   48 39 d0                cmp    %rdx,%rax
  36:   75 f0                   jne    28 <clear_bss1+0x28>
  38:   f3 c3                   repz retq
  3a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

0000000000000040 <clear_bss2>:
  40:   48 8b 05 00 00 00 00    mov    0x0(%rip),%rax        # 47 <clear_bss2+0x7>
  47:   48 8b 15 00 00 00 00    mov    0x0(%rip),%rdx        # 4e <clear_bss2+0xe>
  4e:   48 39 d0                cmp    %rdx,%rax
  51:   73 16                   jae    69 <clear_bss2+0x29>
  53:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  58:   48 83 c0 08             add    $0x8,%rax
  5c:   48 c7 40 f8 00 00 00    movq   $0x0,-0x8(%rax)
  63:   00
  64:   48 39 d0                cmp    %rdx,%rax
  67:   72 ef                   jb     58 <clear_bss2+0x18>
  69:   f3 c3                   repz retq