https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124001
Bug ID: 124001
Summary: wrong code generated with alignas() on armv8
Product: gcc
Version: 15.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: Joost.VandeVondele at mat dot ethz.ch
Target Milestone: ---
We have observed incorrect results with our software package Stockfish when
compiled for armv8 and gcc 15.2. The same target but previous gcc version (at
least gcc 10 - gcc 14, as well as all versions of clang) result in the expected
results.
To reproduce the wrong result the following works:
git clone https://github.com/official-stockfish/Stockfish.git
cd Stockfish/src/
git checkout sf_18
make clean && make -j build ARCH=armv8 COMP=gcc COMPCXX=g++-15
for i in `seq 1 3`; do ./stockfish bench 2>&1 | grep 'Nodes searched' ; done
A sample wrong output of this last line is (changes randomly).
Nodes searched : 2710667
Nodes searched : 2420746
Nodes searched : 2464947
While correct e.g. with g++-12
Nodes searched : 2050811
Nodes searched : 2050811
Nodes searched : 2050811
Which suggests some uninitialized memory is accessed. Note that older version
of gcc warn on this target, but do not produce wrong results:
In member function ‘propagate’,
inlined from ‘propagate’ at nnue/nnue_architecture.h:124:23,
inlined from ‘evaluate’ at nnue/network.cpp:187:54:
nnue/layers/affine_transform_sparse_input.h:342:39: warning:
‘transformedFeatures’ may be used uninitialized [-Wmaybe-uninitialized]
342 | const invec_t in = vec_set_32(input32[i]);
| ^
nnue/network.cpp: In member function ‘evaluate’:
nnue/network.cpp:180:30: note: ‘transformedFeatures’ declared here
180 | TransformedFeatureType
transformedFeatures[FeatureTransformer<FTDimensions>::BufferSize];
| ^
We can fix this in the Stockfish code with the following diff:
diff --git a/src/nnue/network.cpp b/src/nnue/network.cpp
index d1f2b14c..115773d7 100644
--- a/src/nnue/network.cpp
+++ b/src/nnue/network.cpp
@@ -35,6 +35,7 @@
#include "nnue_architecture.h"
#include "nnue_common.h"
#include "nnue_misc.h"
+#include "../memory.h"
// Macro to embed the default efficiently updatable neural network (NNUE) file
// data in the engine binary (using incbin.h, by Dale Weiler).
@@ -176,8 +177,11 @@ Network<Arch, Transformer>::evaluate(const Position&
pos
constexpr uint64_t alignment = CacheLineSize;
- alignas(alignment)
- TransformedFeatureType
transformedFeatures[FeatureTransformer<FTDimensions>::BufferSize];
+ TransformedFeatureType
+
transformedFeaturesUnaligned[FeatureTransformer<FTDimensions>::BufferSize
+ + alignment /
sizeof(TransformedFeatureType)];
+
+ auto* transformedFeatures =
align_ptr_up<alignment>(&transformedFeaturesUnaligned[0]);
ASSERT_ALIGNED(transformedFeatures, alignment);
I.e. where we explicitly replace the alignas(alignment) with align_ptr_up.
We note here that we align to CacheLineSize which we defined as 64, i.e. more
than __BIGGEST_ALIGNMENT__ (which is 16 on this target).
However, we believe this is common practice for SIMD code, and the
implementation of gcc seemed to honor this so far.