https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124001

            Bug ID: 124001
           Summary: wrong code generated with alignas() on armv8
           Product: gcc
           Version: 15.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: Joost.VandeVondele at mat dot ethz.ch
  Target Milestone: ---

We have observed incorrect results with our software package Stockfish when
compiled for armv8 and gcc 15.2. The same target but previous gcc version (at
least gcc 10 - gcc 14, as well as all versions of clang) result in the expected
results.

To reproduce the wrong result the following works:

git clone https://github.com/official-stockfish/Stockfish.git
cd Stockfish/src/
git checkout sf_18
make clean && make -j build ARCH=armv8 COMP=gcc COMPCXX=g++-15
for i in `seq 1 3`; do ./stockfish bench 2>&1 | grep 'Nodes searched' ; done

A sample wrong output of this last line is (changes randomly).
Nodes searched  : 2710667
Nodes searched  : 2420746
Nodes searched  : 2464947

While correct e.g. with g++-12
Nodes searched  : 2050811
Nodes searched  : 2050811
Nodes searched  : 2050811

Which suggests some uninitialized memory is accessed. Note that older version
of gcc warn on this target, but do not produce wrong results:

In member function ‘propagate’,
    inlined from ‘propagate’ at nnue/nnue_architecture.h:124:23,
    inlined from ‘evaluate’ at nnue/network.cpp:187:54:
nnue/layers/affine_transform_sparse_input.h:342:39: warning:
‘transformedFeatures’ may be used uninitialized [-Wmaybe-uninitialized]
  342 |             const invec_t        in = vec_set_32(input32[i]);
      |                                       ^
nnue/network.cpp: In member function ‘evaluate’:
nnue/network.cpp:180:30: note: ‘transformedFeatures’ declared here
  180 |       TransformedFeatureType
transformedFeatures[FeatureTransformer<FTDimensions>::BufferSize];
      |                              ^

We can fix this in the Stockfish code with the following diff:

diff --git a/src/nnue/network.cpp b/src/nnue/network.cpp
index d1f2b14c..115773d7 100644
--- a/src/nnue/network.cpp
+++ b/src/nnue/network.cpp
@@ -35,6 +35,7 @@
 #include "nnue_architecture.h"
 #include "nnue_common.h"
 #include "nnue_misc.h"
+#include "../memory.h"

 // Macro to embed the default efficiently updatable neural network (NNUE) file
 // data in the engine binary (using incbin.h, by Dale Weiler).
@@ -176,8 +177,11 @@ Network<Arch, Transformer>::evaluate(const Position&      
                  pos

     constexpr uint64_t alignment = CacheLineSize;

-    alignas(alignment)
-      TransformedFeatureType
transformedFeatures[FeatureTransformer<FTDimensions>::BufferSize];
+    TransformedFeatureType
+     
transformedFeaturesUnaligned[FeatureTransformer<FTDimensions>::BufferSize
+                                   + alignment /
sizeof(TransformedFeatureType)];
+
+    auto* transformedFeatures =
align_ptr_up<alignment>(&transformedFeaturesUnaligned[0]);

     ASSERT_ALIGNED(transformedFeatures, alignment);

I.e. where we explicitly replace the alignas(alignment) with align_ptr_up.

We note here that we align to CacheLineSize which we defined as 64, i.e. more
than __BIGGEST_ALIGNMENT__ (which is 16 on this target).

However, we believe this is common practice for SIMD code, and the
implementation of gcc seemed to honor this so far.

Reply via email to