https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103548
Bug ID: 103548
Summary: Identical MMA assemble quads are incorrectly combined
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: bergner at gcc dot gnu.org
Target Milestone: ---

We incorrectly combine multiple identical build/assemble quads/accs, leading
to incorrect assembly being generated:

typedef unsigned char vec_t __attribute__((vector_size(16)));

void
foo (__vector_quad *dst, vec_t *src)
{
  __vector_quad quad0, quad1;
  /* Adjacent loads should be combined into two lxvp instructions,
     and identical build accs should not be combined.  */
  __builtin_mma_build_acc (&quad0, src[0], src[1], src[2], src[3]);
  __builtin_mma_build_acc (&quad1, src[0], src[1], src[2], src[3]);
  dst[0] = quad0;
  dst[2] = quad1;
}

...gives:

        lxv 3,0(4)
        lxv 2,16(4)
        lxv 1,32(4)
        lxv 0,48(4)
        xxmtacc 0
        xxmfacc 0
        stxvp 2,0(3)
        stxvp 0,32(3)
        xxmfacc 0
        stxvp 2,128(3)
        stxvp 0,160(3)
        blr

Notice we only have 4 loads and 1 xxmtacc, but 2 xxmfacc.  This is incorrect.
I have a patch I'm testing.

Note that for build/assemble pair, we are allowed to combine identical
calls...and we do.
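
The instruction counts quoted above can be double-checked mechanically. The following is only a minimal sketch in portable C, not part of the patch or the GCC testsuite: the `listing` array is the assembly from this report copied by hand, and `count_mnemonic` is a hypothetical helper introduced here for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Assembly from the report, one instruction per entry (copied by hand).  */
static const char *const listing[] = {
  "lxv 3,0(4)", "lxv 2,16(4)", "lxv 1,32(4)", "lxv 0,48(4)",
  "xxmtacc 0",
  "xxmfacc 0", "stxvp 2,0(3)", "stxvp 0,32(3)",
  "xxmfacc 0", "stxvp 2,128(3)", "stxvp 0,160(3)",
  "blr",
};

/* Hypothetical helper: count the entries whose mnemonic (first token)
   is exactly NAME, so that "lxv" does not also match "lxvp".  */
static size_t
count_mnemonic (const char *name)
{
  size_t n = 0;
  size_t len = strlen (name);

  for (size_t i = 0; i < sizeof listing / sizeof listing[0]; i++)
    {
      const char *insn = listing[i];
      /* The prefix must match and be followed by a space or end of string.  */
      if (strncmp (insn, name, len) == 0
          && (insn[len] == ' ' || insn[len] == '\0'))
        n++;
    }
  return n;
}
```

Comparing `count_mnemonic ("xxmtacc")` with `count_mnemonic ("xxmfacc")` on this listing exposes the mismatch described in the report: one accumulator prime for two deprimes.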