On 06.10.2017 at 11:29, Alex Smith wrote:
> On 6 October 2017 at 03:39, Dave Airlie <airl...@gmail.com> wrote:
>> On 6 October 2017 at 12:31, Marek Olšák <mar...@gmail.com> wrote:
>>> On Fri, Oct 6, 2017 at 4:10 AM, Connor Abbott <cwabbo...@gmail.com> wrote:
>>>> On Thu, Oct 5, 2017 at 10:08 PM, Marek Olšák <mar...@gmail.com> wrote:
>>>>> On Fri, Oct 6, 2017 at 3:50 AM, Connor Abbott <cwabbo...@gmail.com> wrote:
>>>>>> Why? While it might technically be legal, always generating an unfused
>>>>>> mul+add when the user explicitly requested fma() seems harsh...
>>>>>
>>>>> It's slow on some chips. It doesn't need any other reason.
>>>>>
>>>>> Marek
>>>>
>>>> Presumably, if the developer asked for fma, then they don't care how
>>>> fast or slow it is...
>>>
>>> Feral asked for fma. They care. This debate is pointless. We just
>>> won't use fma by default. Period.
>>
>> They didn't ask for it with precise precision. I'm assuming if someone
>> wants fma with precise precision we should give it to them. Like at
>> least the fma manpage states.
>>
>> https://www.khronos.org/registry/OpenGL-Refpages/gl4/html/fma.xhtml
>
> Some of our older games (e.g. Tomb Raider) do actually request precise
> (based on what the original D3D shader asks for), so changing the
> behaviour on GL to use the proper fma would likely regress performance
> on those.
>
> D3D's mad (which we've been using fma to implement) is similarly vague
> as GLSL about what the actual precision requirements are with precise:
> https://msdn.microsoft.com/en-us/library/windows/desktop/ff471418(v=vs.85).aspx

Of course, but d3d mad is a "traditional" multiply/add that predates even
fully programmable shader pipelines. Back in those days GPUs used
fixed-function ALUs, where talking about "fused" didn't even make sense.

I think the problem here is just that GLSL never had such a mad: being
based on a textual representation, mul and add use operators, and a mad
function would just look ugly (and with GLSL's generally lax requirements,
no one would ever care whether you actually fuse muls and adds). But now,
with precise, you cannot freely fuse such separate muls and adds, because
the compiler can't guarantee that it will always fuse them (and it would
be shady in any case). Thus using separate muls and adds would penalize
GPUs which can only do a fused mul+add in a single step (nvidia IIRC, also
x86 AVX with FMA). Hence "fma" being added.

I would, however, say that calling this "fma" is a very serious (but by
now unfixable) spec bug. No one ever talks about a "fused multiply add"
when it may just as well be unfused. This is just confusing as hell. Call
it mad, fmuladd (as llvm does), mfma ("maybe fused"...) or whatever, but
not fma. (FWIW d3d is sane there: for singles it may be fused or unfused
and is called mad; for doubles it is guaranteed to always be fused and is
accordingly called dfma.)

And FWIW I got confused by this too earlier, thinking it had to be fused -
certainly OpenCL etc. really want a fused one if they use fma.

This also means I was wrong earlier when there were some problems with
fma / mad on nouveau drivers: since fma can apparently be unfused, there's
no point in the mesa state tracker ever using the tgsi fma opcode, and as
far as I can tell it should always use MAD instead (while of course
setting the precise bit accordingly).

Roland

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
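
For readers less familiar with why fused vs. unfused matters at all, here
is a minimal Python sketch of the numerical difference. It is not Mesa,
GLSL or TGSI code; the function names mad_unfused and fma_exact are made
up for illustration, and it uses double precision on the CPU rather than
the single precision a shader would typically use. The fused variant is
emulated with exact rational arithmetic followed by a single rounding,
which is what a real fused multiply-add is supposed to deliver.

```python
from fractions import Fraction


def mad_unfused(a, b, c):
    """Multiply, round, then add and round again (two roundings)."""
    product = a * b      # product is rounded to the nearest double here
    return product + c   # the sum is rounded a second time


def fma_exact(a, b, c):
    """Emulate a fused multiply-add: exact a*b+c, rounded only once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # exact, no rounding
    return float(exact)  # single rounding to the nearest double


# Inputs chosen so that the exact product is 1 - 2**-60, which is not
# representable as a double and rounds to 1.0.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print("unfused:", mad_unfused(a, b, c))
print("fused:  ", fma_exact(a, b, c))
```

With these inputs the unfused version loses the low bits of the product
and returns 0.0, while the fused emulation keeps them and returns
-2**-60. An "fma" that is allowed to be either therefore does not have a
single well-defined result, which is the naming complaint above.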