On Mon, Mar 29, 2021 at 2:16 PM Andy Lutomirski <l...@amacapital.net> wrote:
>
>
> > On Mar 29, 2021, at 8:47 AM, Len Brown <l...@kernel.org> wrote:
> >
> > On Sat, Mar 27, 2021 at 5:58 AM Greg KH <gre...@linuxfoundation.org> wrote:
> >>> On Fri, Mar 26, 2021 at 11:39:18PM -0400, Len Brown wrote:
> >>> Hi Andy,
> >>> Say a mainline links with a math library that uses AMX without the
> >>> knowledge of the mainline.
> >
> > sorry for the confusion.
> >
> > mainline = main().
> >
> > ie. the part of the program written by you, and not the library you linked 
> > with.
> >
> > In particular, the library may use instructions that main() doesn't know 
> > exist.
>
> If we pretend for a bit that AMX were a separate device instead of a part of 
> the CPU, this would be a no brainer: something would be responsible for 
> opening a device node or otherwise requesting access to the device.
>
> Real AMX isn’t so different. Programs acquire access either by syscall or by 
> a fault, they use it, and (hopefully) they release it again using 
> TILERELEASE. The only thing special about it is that, supposedly, acquiring 
> and releasing access (at least after the first time) is quite fast.  But 
> holding access is *not* free — despite all your assertions to the contrary, 
> the kernel *will* correctly context switch it to avoid blowing up power 
> consumption, and this will have overhead.
>
> We’ve seen the pattern of programs thinking that, just because something is a 
> CPU insn, it’s free and no thought is needed before using it. This happened 
> with AVX and AVX512, and it will happen again with AMX. We *still* have a 
> significant performance regression in the kernel due to screwing up the AVX 
> state machine, and the only way I know about any of the details is that I 
> wrote silly test programs to try to reverse engineer the nonsensical behavior 
> of the CPUs.
>
> I might believe that Intel has figured out how to make a well behaved XSTATE 
> feature after Intel demonstrates at least once that it’s possible.  That 
> means full documentation of all the weird issues, no new special cases, and 
> the feature actually making sense in the context of XSTATE.  This has not 
> happened.  Let’s list all of them:
>
> - SSE.  Look for all the MXCSR special cases in the pseudocode and tell me 
> with a straight face that this one works sensibly.
>
> - AVX.  Also has special cases in the pseudocode. And has transition issues 
> that are still problems and still not fully documented.
>
> - AVX2.  Horrible undocumented performance issues.  Otherwise maybe okay?
>
> - MPX: maybe the best example, but the compat mode part got flubbed and it’s 
> MPX.
>
> - PKRU: Should never have been in XSTATE. (Also, having WRPKRU in the ISA was 
> a major mistake, now unfixable, that seriously limits the usefulness of the 
> whole feature.  I suppose Intel could release PKRU2 with a better ISA and 
> deprecate the original PKRU, but I’m not holding my breath.)
>
> - AVX512: Yet more uarch-dependent horrible performance issues, and Intel has 
> still not responded about documentation.  The web is full of people 
> speculating differently about when, exactly, using AVX512 breaks performance. 
> This is NAKked in kernel until docs arrive. Also, it broke old user programs.
> If we had noticed a few years ago, AVX512 enablement would have been
> reverted.
>
> - AMX: This mess.
>
> The current system of automatic user enablement does not work. We need 
> something better.

Hi Andy,

Can you provide a concise definition of the exact problem(s) this thread
is attempting to address?

Thanks ahead of time for excluding "blow up power consumption",
since that paranoia is not grounded in fact.

thanks,
-Len
