https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122101

            Bug ID: 122101
           Summary: Large performance difference between GCC and Clang
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: arash at partow dot net
  Target Milestone: ---

Created attachment 62475
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62475&action=edit
Code demonstrating issue

I recently encountered a noticeable difference in execution time between
executables built from the same source with GCC and with Clang.

Some details:

-O1 and -O2: GCC and Clang produce similar results in terms of execution
time.

-O3: the Clang build runs roughly 3x faster than the GCC build.

Observed in the following versions:

GCC 11..15
Clang: 14..22

Note1: The C++ standard was left at each compiler's default.
Note2: Repeated the same runs with -march=native; the results were similar on
i7 and i9 machines.


---- code ----

#include <chrono>
#include <cmath>
#include <cstdio>
#include <cstdint>

inline bool is_prime(std::uint32_t i)
{
    if (i == 2) return true;
    if ((i == 1) || (i % 2 == 0)) return false;

    const auto c = static_cast<std::uint32_t>(std::ceil(std::sqrt(i)));
    //const auto c = (i / 2) + 1;

    for (std::uint64_t j = 3; j <= c; j += 2)
    {
        if (0 == (i % j)) return false;
    }

    return true;
}

int main()
{
    constexpr std::uint32_t search_begin = 1;
    constexpr std::uint32_t search_end   = 400000;

    std::uint64_t prime_count = 0;

    const auto start = std::chrono::steady_clock::now();

    for (std::uint32_t i = search_begin; i <= search_end; i++)
    {
        if (is_prime(i))
        {
            ++prime_count;
        }
    }

    const auto end = std::chrono::steady_clock::now();

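    // Cast explicitly so the format specifiers match on every platform.
    printf("total time: %lldms\n",
           static_cast<long long>(
               std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count()));
    printf("number of primes: %llu\n",
           static_cast<unsigned long long>(prime_count));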

    return 0;
}

---- code ----


Note3: Tried replacing the upper-bound calculation with a simple halving of
the range (the commented-out line in is_prime), thinking the difference could
have something to do with sqrt/ceil, but the relative difference remained.
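
For reference, the two bounds mentioned in Note3 can be put side by side as in
the sketch below. It is only an illustration for checking that both variants
count the same primes. The helper names (bound_sqrt, bound_half, is_prime_with,
count_primes) are hypothetical and do not appear in the attachment. Passing the
bound as a function argument may also change inlining, so for timing runs the
bound should be substituted directly in is_prime, as in the original code.

```cpp
#include <cmath>
#include <cstdint>

// Bound used in the report: ceil(sqrt(i)).
inline std::uint32_t bound_sqrt(std::uint32_t i)
{
    return static_cast<std::uint32_t>(std::ceil(std::sqrt(i)));
}

// Bound from Note3: half the range, no floating point involved.
inline std::uint32_t bound_half(std::uint32_t i)
{
    return (i / 2) + 1;
}

// Same trial-division loop as in the report, parameterised on the bound.
template <typename Bound>
inline bool is_prime_with(std::uint32_t i, Bound bound)
{
    if (i == 2) return true;
    if ((i == 1) || (i % 2 == 0)) return false;

    const std::uint32_t c = bound(i);

    for (std::uint64_t j = 3; j <= c; j += 2)
    {
        if (0 == (i % j)) return false;
    }

    return true;
}

// Counts primes in the closed range [begin, end] using the given bound.
template <typename Bound>
inline std::uint64_t count_primes(std::uint32_t begin, std::uint32_t end, Bound bound)
{
    std::uint64_t prime_count = 0;

    for (std::uint32_t i = begin; i <= end; i++)
    {
        if (is_prime_with(i, bound))
        {
            ++prime_count;
        }
    }

    return prime_count;
}
```

Calling count_primes(1, 400000, bound_sqrt) and
count_primes(1, 400000, bound_half) should return the same count, since the
halved bound only adds redundant trial divisions.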

Given the simple nature of the code, I did not expect there to be such a large
difference in performance between the two compilers.
