https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122682

            Bug ID: 122682
           Summary: Comparing perf between gcc and clang for a 4x4 matrix
                    multiply
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: peter0x44 at disroot dot org
  Target Milestone: ---

Created attachment 62798
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62798&action=edit
Benchmark of a 4x4 float matrix multiply.

I have attached a benchmark script of a 4x4 matrix multiply function. It
compiles this function with both gcc and clang, and then proceeds to run both
of the versions through a test harness.
Download the attachment, and run it with sh:
sh benchmark_4x4_matmul.c
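
The attachment itself is not reproduced in this report. For readers who want a rough idea of the kind of function being compared, here is a minimal sketch of a 4x4 float matrix multiply. The `mat4` type, the row-major layout, and the name `mat4_mul` are assumptions made for illustration and may not match the attached code:

```c
/* Hypothetical layout: a 4x4 float matrix stored row-major in 16 floats.
 * This is an illustrative assumption, not the type used in the attachment. */
typedef struct { float m[16]; } mat4;

/* Plain triple loop computing out = a * b. */
mat4 mat4_mul(const mat4 *a, const mat4 *b)
{
    mat4 out;
    for (int row = 0; row < 4; row++) {
        for (int col = 0; col < 4; col++) {
            float sum = 0.0f;
            /* Dot product of row `row` of a with column `col` of b. */
            for (int k = 0; k < 4; k++)
                sum += a->m[row * 4 + k] * b->m[k * 4 + col];
            out.m[row * 4 + col] = sum;
        }
    }
    return out;
}
```

A function this small is usually fully unrolled and vectorized, so the differences between compilers most likely come down to instruction selection and scheduling rather than loop structure.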

Here are the discrepancies I saw on the hardware I tested myself, along with
results that friends and online strangers sent me.

Intel i7 9700:
```
GCC version: gcc (GCC) 15.2.1 20250813
Clang version: clang version 21.1.5

=== Results ===
GCC version:
  Total time: 26.138 seconds
  Time per operation: 2.61 ns

Clang version:
  Total time: 23.925 seconds
  Time per operation: 2.39 ns

Clang is 1.09x faster (9.2% faster)
```
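
The summary line appears to be derived from the two total times. Here is a minimal sketch of that arithmetic, assuming the script divides the slower total by the faster one. The function names are hypothetical and are not taken from the attachment:

```c
#include <assert.h>

/* Ratio of the slower total time to the faster one,
 * e.g. about 1.09 for "1.09x faster". */
double speedup_ratio(double slow_seconds, double fast_seconds)
{
    assert(fast_seconds > 0.0);
    return slow_seconds / fast_seconds;
}

/* The same ratio expressed as a percentage,
 * e.g. about 9.2 for "9.2% faster". */
double percent_faster(double slow_seconds, double fast_seconds)
{
    return (speedup_ratio(slow_seconds, fast_seconds) - 1.0) * 100.0;
}
```

With the i7 9700 totals, `speedup_ratio(26.138, 23.925)` is roughly 1.0925, which is consistent with the reported "1.09x faster (9.2% faster)".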

Intel i7 10700k:
```
GCC version: gcc (GCC) 15.2.1 20250813
Clang version: clang version 20.1.8

=== Results ===
GCC version:
  Total time: 25.402 seconds
  Time per operation: 2.54 ns

Clang version:
  Total time: 23.176 seconds
  Time per operation: 2.32 ns

Clang is 1.10x faster (9.6% faster)
```

Intel(R) Celeron(R) N4120:
```
GCC version: gcc (GCC) 15.2.1 20250813
Clang version: clang version 21.1.5
Clang is 1.14x faster (14.3% faster)
```


On the Intel CPUs I tested, gcc was consistently a little slower than
recent clang versions. However, there were also CPUs where gcc outperformed
clang by a huge margin. This seemed to be the case mostly on AMD:

Ryzen 9 9950x:
```
GCC version: gcc (GCC) 15.2.1 20250813
Clang version: clang version 21.1.4

GCC is 1.43x faster (42.7% faster)
```

Ryzen 7 8840U:
```
GCC version: gcc (GCC) 15.2.1 20251112
Clang version: clang version 21.1.5

GCC is 1.90x faster (89.7% faster)
```

However, there were also examples where gcc and clang were almost identical:

Ryzen 9 5950x:
```
GCC version: gcc (GCC) 15.2.1 20250813
Clang version: clang version 21.1.5

GCC is 1.01x faster (0.5% faster)
```


I also have results from older gcc and clang versions, but I have left them
out because they are less relevant today. I will add them in a separate
attachment for the curious.

Please run this benchmark for yourself and provide results too.
