https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85253
--- Comment #3 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
Yep, looking at the code, it seems that in this special case, we need
one more row in the temporary buffer.

This seems to cure it.

Index: m4/matmul_internal.m4
===================================================================
--- m4/matmul_internal.m4	(Revision 259152)
+++ m4/matmul_internal.m4	(Arbeitskopie)
@@ -234,7 +234,7 @@ sinclude(`matmul_asm_'rtype_code`.m4')dnl
       /* Adjust size of t1 to what is needed.  */
       index_type t1_dim;
-      t1_dim = (a_dim1-1) * 256 + b_dim1;
+      t1_dim = (a_dim1- (ycount > 1)) * 256 + b_dim1;
       if (t1_dim > 65536)
         t1_dim = 65536;
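
To make the effect of the one-line change easier to see, here is a minimal standalone sketch of the arithmetic only. The function names t1_dim_old and t1_dim_new and the index_type typedef are made up for illustration and are not part of libgfortran; the input values in the usage note are arbitrary. The patched expression differs from the old one only when ycount is not greater than 1, in which case the result grows by one block of 256 elements (before the 65536 cap).

```c
#include <stddef.h>

/* Hypothetical stand-in for the library's index_type. */
typedef ptrdiff_t index_type;

/* Buffer size as computed before the patch. */
index_type t1_dim_old(index_type a_dim1, index_type b_dim1)
{
  index_type t1_dim = (a_dim1 - 1) * 256 + b_dim1;
  if (t1_dim > 65536)
    t1_dim = 65536;
  return t1_dim;
}

/* Buffer size as computed after the patch: (ycount > 1) evaluates to
   1 or 0, so one row is subtracted only when ycount exceeds 1.  */
index_type t1_dim_new(index_type a_dim1, index_type b_dim1,
                      index_type ycount)
{
  index_type t1_dim = (a_dim1 - (ycount > 1)) * 256 + b_dim1;
  if (t1_dim > 65536)
    t1_dim = 65536;
  return t1_dim;
}
```

For example, with a_dim1 = 1 and b_dim1 = 4, the old formula yields 4, while the patched formula yields 260 when ycount is 1 and 4 when ycount is 2. Both versions are still capped at 65536.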