https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122682
Bug ID: 122682
Summary: Comparing perf between gcc and clang for a 4x4 matrix multiply
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: peter0x44 at disroot dot org
Target Milestone: ---
Created attachment 62798
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=62798&action=edit
Benchmark of a 4x4 float matrix multiply.
I have attached a benchmark script for a 4x4 matrix multiply function. It
compiles the function with both gcc and clang, then runs both versions
through a test harness.
Download the attachment, and run it with sh:
sh benchmark_4x4_matmul.c
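For readers who have not opened the attachment, the kind of function being measured looks roughly like the one below. This is a minimal sketch, not the attached code: the name `matmul4x4`, the row-major layout, and the flat `float[16]` signature are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical kernel: out = a * b for 4x4 row-major float matrices.
   The attached benchmark may use a different layout or signature.
   out must not alias a or b. */
void matmul4x4(const float a[16], const float b[16], float out[16])
{
    for (size_t row = 0; row < 4; row++) {
        for (size_t col = 0; col < 4; col++) {
            float sum = 0.0f;
            /* Dot product of row `row` of a with column `col` of b. */
            for (size_t k = 0; k < 4; k++) {
                sum += a[row * 4 + k] * b[k * 4 + col];
            }
            out[row * 4 + col] = sum;
        }
    }
}
```

A loop nest this small is usually fully unrolled and vectorized, so timing differences between compilers tend to come from how each one schedules the shuffles, broadcasts and multiply-adds.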
Below are the discrepancies measured on hardware I tested myself, or for which
friends and online strangers sent me results.
Intel i7 9700:
```
GCC version: gcc (GCC) 15.2.1 20250813
Clang version: clang version 21.1.5
=== Results ===
GCC version:
Total time: 26.138 seconds
Time per operation: 2.61 ns
Clang version:
Total time: 23.925 seconds
Time per operation: 2.39 ns
Clang is 1.09x faster (9.2% faster)
```
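The summary line in each result follows from the two total times. Here is a minimal sketch of that arithmetic, assuming the script divides the slower total by the faster one. The helper names are hypothetical and the attached script may compute this differently.

```c
/* Hypothetical helpers, not taken from the attached script. */

/* Ratio of the slower total time to the faster one, e.g. 1.09 for "1.09x". */
double speedup_ratio(double slower_seconds, double faster_seconds)
{
    return slower_seconds / faster_seconds;
}

/* The same ratio expressed as "N% faster". */
double percent_faster(double slower_seconds, double faster_seconds)
{
    return (speedup_ratio(slower_seconds, faster_seconds) - 1.0) * 100.0;
}
```

With the i7 9700 totals above, 26.138 / 23.925 is about 1.092, which matches the reported 1.09x and 9.2%.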
Intel i7 10700k:
```
GCC version: gcc (GCC) 15.2.1 20250813
Clang version: clang version 20.1.8
=== Results ===
GCC version:
Total time: 25.402 seconds
Time per operation: 2.54 ns
Clang version:
Total time: 23.176 seconds
Time per operation: 2.32 ns
Clang is 1.10x faster (9.6% faster)
```
Intel(R) Celeron(R) N4120:
```
GCC version: gcc (GCC) 15.2.1 20250813
Clang version: clang version 21.1.5
Clang is 1.14x faster (14.3% faster)
```
For the Intel CPUs I tested, gcc was consistently a little slower than recent
clang versions. However, there were also CPUs where gcc outperformed clang by
a huge margin. This mostly seemed to be the case for AMD:
Ryzen 9 9950x:
```
GCC version: gcc (GCC) 15.2.1 20250813
Clang version: clang version 21.1.4
GCC is 1.43x faster (42.7% faster)
```
Ryzen 7 8840U:
```
GCC version: gcc (GCC) 15.2.1 20251112
Clang version: clang version 21.1.5
GCC is 1.90x faster (89.7% faster)
```
However, there were also examples where gcc and clang were almost identical:
Ryzen 9 5950x:
```
GCC version: gcc (GCC) 15.2.1 20250813
Clang version: clang version 21.1.5
GCC is 1.01x faster (0.5% faster)
```
I also have results with older gcc and clang versions, but I have left them
out because they are less relevant today. I will add them as a separate
attachment for the curious.
Please run this benchmark for yourself and provide results too.