https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70855

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Created attachment 38468
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=38468&action=edit
gcc7-pr70855.patch

Untested simple fix (that is backportable too).

If we want to parallelize this, I'd say the right thing would be still to
disable the inlining during frontend passes when in omp workshare, make the
inline_matmul_assign function no longer static and during omp workshare
translation call that with some special arguments that would arrange for it
to be properly parallelized. We'd need to ensure that the c = 0 clearing is
split to threads the same way as the following loop, and that each entry in
the c array is only set and modified in the same thread.
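
As a rough illustration of the last constraint only (this is not taken from the attached patch and is not what gfortran emits), the C sketch below shows one way to keep the c = 0 clearing and the accumulation loop on the same thread for every entry of c. The function name matmul_workshare_sketch, its parameters, the column-major layout, and the use of its own parallel region are all assumptions made for the example. It relies on the OpenMP rule that two loops in the same parallel region with schedule(static), the same iteration count, and no chunk size get the same iteration-to-thread assignment.

```c
#include <stddef.h>

/* Hypothetical helper (not part of GCC): computes c = a * b for
   column-major matrices, a is n x m, b is m x p, c is n x p. */
void matmul_workshare_sketch(size_t n, size_t m, size_t p,
                             const double *a, const double *b, double *c)
{
    /* Assumption: the sketch opens its own parallel region so that it is
       self-contained; inside a real workshare the region already exists. */
    #pragma omp parallel
    {
        /* Clearing loop: split over columns j with a static schedule.
           nowait is acceptable here because the next loop uses the same
           schedule and bounds, so column j stays with the same thread. */
        #pragma omp for schedule(static) nowait
        for (size_t j = 0; j < p; j++)
            for (size_t i = 0; i < n; i++)
                c[i + j * n] = 0.0;

        /* Accumulation loop: same bounds and schedule as above, so each
           entry of c is set and modified by one thread only. */
        #pragma omp for schedule(static)
        for (size_t j = 0; j < p; j++)
            for (size_t k = 0; k < m; k++)
                for (size_t i = 0; i < n; i++)
                    c[i + j * n] += a[i + k * n] * b[k + j * m];
    }
}
```

Built without -fopenmp the pragmas are ignored and the function runs serially with the same result. Splitting only the outermost column loop is the design choice that makes the per-entry ownership hold without atomics or reductions.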