On Mon, Mar 29, 2021 at 2:16 PM Andy Lutomirski <l...@amacapital.net> wrote:
>
> > On Mar 29, 2021, at 8:47 AM, Len Brown <l...@kernel.org> wrote:
> >
> > On Sat, Mar 27, 2021 at 5:58 AM Greg KH <gre...@linuxfoundation.org> wrote:
> >>> On Fri, Mar 26, 2021 at 11:39:18PM -0400, Len Brown wrote:
> >>> Hi Andy,
> >>> Say a mainline links with a math library that uses AMX without the
> >>> knowledge of the mainline.
> >
> > sorry for the confusion.
> >
> > mainline = main().
> >
> > ie. the part of the program written by you, and not the library you linked
> > with.
> >
> > In particular, the library may use instructions that main() doesn't know
> > exist.
>
> If we pretend for a bit that AMX were a separate device instead of a part of
> the CPU, this would be a no brainer: something would be responsible for
> opening a device node or otherwise requesting access to the device.
>
> Real AMX isn’t so different. Programs acquire access either by syscall or by
> a fault, they use it, and (hopefully) they release it again using
> TILERELEASE. The only thing special about it is that, supposedly, acquiring
> and releasing access (at least after the first time) is quite fast. But
> holding access is *not* free — despite all your assertions to the contrary,
> the kernel *will* correctly context switch it to avoid blowing up power
> consumption, and this will have overhead.
>
> We’ve seen the pattern of programs thinking that, just because something is a
> CPU insn, it’s free and no thought is needed before using it. This happened
> with AVX and AVX512, and it will happen again with AMX. We *still* have a
> significant performance regression in the kernel due to screwing up the AVX
> state machine, and the only way I know about any of the details is that I
> wrote silly test programs to try to reverse engineer the nonsensical behavior
> of the CPUs.
>
> I might believe that Intel has figured out how to make a well behaved XSTATE
> feature after Intel demonstrates at least once that it’s possible. That
> means full documentation of all the weird issues, no new special cases, and
> the feature actually making sense in the context of XSTATE. This has not
> happened. Let’s list all of them:
>
> - SSE. Look for all the MXCSR special cases in the pseudocode and tell me
> with a straight face that this one works sensibly.
>
> - AVX. Also has special cases in the pseudocode. And has transition issues
> that are still problems and still not fully documented.
>
> - AVX2. Horrible undocumented performance issues. Otherwise maybe okay?
>
> - MPX: maybe the best example, but the compat mode part got flubbed and it’s
> MPX.
>
> - PKRU: Should never have been in XSTATE. (Also, having WRPKRU in the ISA was
> a major mistake, now unfixable, that seriously limits the usefulness of the
> whole feature. I suppose Intel could release PKRU2 with a better ISA and
> deprecate the original PKRU, but I’m not holding my breath.)
>
> - AVX512: Yet more uarch-dependent horrible performance issues, and Intel has
> still not responded about documentation. The web is full of people
> speculating differently about when, exactly, using AVX512 breaks performance.
> This is NAKked in kernel until docs arrive. Also, it broke old user programs.
> If we had noticed a few years ago, AVX512 enablement would have been
> reverted.
>
> - AMX: This mess.
>
> The current system of automatic user enablement does not work. We need
> something better.

Hi Andy,

Can you provide a concise definition of the exact problem(s)
this thread is attempting to address?

Thanks ahead of time for excluding "blow up power consumption",
since that paranoia is not grounded in fact.

thanks,
-Len