https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122101
Bug ID: 122101
Summary: Large performance difference between GCC and Clang
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: arash at partow dot net
Target Milestone: ---
Created attachment 62475
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62475&action=edit
Code demonstrating issue
I recently encountered a large difference in execution time between executables
built from the same source with GCC and with Clang.
Some details:
At -O1 and -O2: GCC and Clang produce executables with similar execution times.
At -O3: the Clang-built executable runs roughly 3x faster than the GCC-built one.
Observed with the following versions:
GCC: 11..15
Clang: 14..22
Note 1: The C++ standard was left at each compiler's default.
Note 2: Repeating the runs with -march=native gave similar results on both i7
and i9 machines.
---- code ----
#include <chrono>
#include <cmath>
#include <cstdio>
#include <cstdint>

// Trial-division primality test: checks odd divisors j up to ceil(sqrt(i)).
inline bool is_prime(std::uint32_t i)
{
    if (i == 2) return true;
    if ((i == 1) || (i % 2 == 0)) return false;

    const auto c = static_cast<std::uint32_t>(std::ceil(std::sqrt(i)));
    //const auto c = (i / 2) + 1;

    // j is 64-bit, so i % j is evaluated as a 64-bit unsigned modulo.
    for (std::uint64_t j = 3; j <= c; j += 2)
    {
        if (0 == (i % j)) return false;
    }

    return true;
}

int main()
{
    constexpr std::uint32_t search_begin = 1;
    constexpr std::uint32_t search_end = 400000;

    std::uint64_t prime_count = 0;

    const auto start = std::chrono::steady_clock::now();

    for (std::uint32_t i = search_begin; i <= search_end; i++)
    {
        if (is_prime(i))
        {
            ++prime_count;
        }
    }

    const auto end = std::chrono::steady_clock::now();

    // Casts keep the format specifiers correct on every platform.
    std::printf("total time: %lldms\n",
                static_cast<long long>(
                    std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()));
    std::printf("number of primes: %llu\n", static_cast<unsigned long long>(prime_count));

    return 0;
}
---- code ----
Note 3: Suspecting that sqrt/ceil might be involved, I replaced the upper-bound
calculation with a simple halving of the range (the commented-out line in
is_prime). There was no change: the relative difference remained.
Given how simple the code is, I did not expect such a large performance
difference between the two compilers.