Hi all,

As mentioned in the previous ACEv1 thread, we have implemented it in a different way. Instead of inline assembly, we use internal patterns that treat tmm as a fake register. Users still need to manage register allocation for now by passing the register number, but this lets the compiler know the dependencies between instructions, and it may be more convenient for potential future tmm register allocation, since all the patterns are already there for reference. Since in ACE the tmms are accumulator units rather than calculation units, this is acceptable.
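
To illustrate the register-number interface described above, here is a toy C model. It is not the actual ACE or AMX intrinsics: every name in it (toy_tmm, toy_tile_zero, toy_tile_dp, toy_tile_store, toy_demo) is hypothetical, and the tile dimensions are arbitrary. It only shows the caller picking a tile by number and the tile acting as an accumulator across calls.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only: names and sizes are hypothetical, not the real intrinsics. */
enum { TOY_NUM_TILES = 8, TOY_ROWS = 2, TOY_COLS = 2, TOY_K = 4 };

/* The modelled "register file": each tile is an int32 accumulator. */
static int32_t toy_tmm[TOY_NUM_TILES][TOY_ROWS][TOY_COLS];

/* Zero the accumulator tile selected by the caller-supplied number. */
static void toy_tile_zero(int dst)
{
    assert(dst >= 0 && dst < TOY_NUM_TILES);
    memset(toy_tmm[dst], 0, sizeof toy_tmm[dst]);
}

/* Accumulate the int8 product a * b into tile dst (dst += a * b). */
static void toy_tile_dp(int dst,
                        const int8_t a[TOY_ROWS][TOY_K],
                        const int8_t b[TOY_K][TOY_COLS])
{
    assert(dst >= 0 && dst < TOY_NUM_TILES);
    for (int i = 0; i < TOY_ROWS; i++)
        for (int j = 0; j < TOY_COLS; j++)
            for (int k = 0; k < TOY_K; k++)
                toy_tmm[dst][i][j] += (int32_t)a[i][k] * (int32_t)b[k][j];
}

/* Copy tile src out to memory. */
static void toy_tile_store(int32_t out[TOY_ROWS][TOY_COLS], int src)
{
    assert(src >= 0 && src < TOY_NUM_TILES);
    memcpy(out, toy_tmm[src], sizeof toy_tmm[src]);
}

/* Usage: the caller "allocates" tile 0 by hand and accumulates into it twice. */
static int32_t toy_demo(void)
{
    static const int8_t a[TOY_ROWS][TOY_K] = { {1, 2, 3, 4}, {5, 6, 7, 8} };
    static const int8_t b[TOY_K][TOY_COLS] = { {1, 0}, {0, 1}, {1, 0}, {0, 1} };
    int32_t out[TOY_ROWS][TOY_COLS];

    toy_tile_zero(0);
    toy_tile_dp(0, a, b);
    toy_tile_dp(0, a, b);   /* second call adds onto the first result */
    toy_tile_store(out, 0);
    return out[0][0] + out[1][1];
}
```

In this model the tile number is an ordinary argument, so the dependency between the two toy_tile_dp calls on tile 0 is visible in the source. That is loosely the property the internal patterns are meant to give the compiler.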

Also, since ACE and legacy AMX should not be used together, we use different intrinsic names for the instructions they share. See the detailed description in each patch for how we chose those names.

Bootstrapped and regtested on x86_64-pc-linux-gnu. Discussion of this patch series is welcome.

Thx,
Haochen
