https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81461
Bug ID: 81461
Summary: Optimization for removing same variable comparisons in
loop: while(it != end1 && it != end2)
Product: gcc
Version: 7.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: antoshkka at gmail dot com
Target Milestone: ---
Simple iteration over std::deque elements produces suboptimal code. For example:
#include <deque>

unsigned sum(std::deque<unsigned> cont) {
    unsigned sum = 0;
    for (unsigned v : cont)
        sum += v;
    return sum;
}
produces the following loop:
.L2:
        cmp     rcx, rdx
        je      .L1
        add     eax, DWORD PTR [rdx]
        add     rdx, 4
        cmp     rdx, rsi
        jne     .L2
        mov     rdx, QWORD PTR [r8+8]
        add     r8, 8
        lea     rsi, [rdx+512]
        jmp     .L2
The loop performs two comparisons per iteration and behaves like the following C code:
unsigned sum_like0(unsigned** chunks, unsigned* end) {
    unsigned sum = 0;
    for (unsigned* it = *chunks; it != end; it = *(++chunks)) {
        for (; it != end && it != *chunks + 128; ++it) {
            sum += *it;
        }
    }
    return sum;
}
Note the `it != end && it != *chunks + 128` condition. It could be simplified:
if `end` belongs to `[it, *chunks + 128]`, change the condition to `it != end`;
otherwise use the condition `it != *chunks + 128`. Such an optimization
removes one cmp from the loop and produces a much faster loop:
.L15:
        add     eax, DWORD PTR [rdx]
        add     rdx, 4
        cmp     rdx, rcx
        jne     .L15
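For illustration, the proposed transformation can be written by hand at the
source level. The following is only a sketch under stated assumptions, not
what GCC emits or should emit: the name `sum_like1` and the constant `kChunk`
are hypothetical, every chunk is assumed to hold 128 elements, and `end` is
assumed to point inside one of the chunks (half-open range, so a chunk's
one-past-the-end address is not treated as a match). The bound is chosen once
per chunk, leaving a single comparison in the inner loop:

```cpp
#include <cstddef>
#include <functional>

// Hypothetical constant: elements per chunk, matching the 128 in sum_like0.
constexpr std::size_t kChunk = 128;

// Hand-written version of the proposed optimization (illustrative only).
// Assumes `end` points into one of the chunks reachable from `chunks`.
unsigned sum_like1(unsigned** chunks, unsigned* end) {
    // std::less gives a total order even for pointers into different arrays,
    // where the built-in < would be unspecified.
    std::less<unsigned*> before;
    unsigned sum = 0;
    for (;;) {
        unsigned* it = *chunks;
        unsigned* chunk_end = it + kChunk;
        // Decide once per chunk whether `end` lies in [it, chunk_end).
        bool end_here = !before(end, it) && before(end, chunk_end);
        unsigned* stop = end_here ? end : chunk_end;
        // Inner loop now carries a single comparison.
        for (; it != stop; ++it) {
            sum += *it;
        }
        if (end_here) {
            return sum;
        }
        ++chunks;  // move on to the next chunk
    }
}
```

With three chunks of 128 elements all equal to 1 and `end` pointing 5
elements into the third chunk, this version sums 128 + 128 + 5 elements.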
Synthetic tests show up to a 2x speedup. Assembly outputs:
https://godbolt.org/g/L7Mr4M