On 13.06.2017 at 02:05, Ilia Mirkin wrote:
> On Mon, Jun 12, 2017 at 7:57 PM, Roland Scheidegger <[email protected]> wrote:
>> FWIW surely on nv50 you could keep a single mad instruction for umad
>> (sad maybe too?). (I'm actually wondering if the hw really can't do
>> unfused float multiply+add as a single instruction, but I know next to
>> nothing about nvidia hw...)
>
> The compiler should reassociate a mul + add into a mad where possible.
> In practice, IMAD is super-slow... allegedly slower than IMUL + IADD.
> Not sure why. Maxwell added an XMAD operation which is faster, but we
> haven't figured out how to operate it yet. I'm not aware of an unfused
> muladd version of fma on Fermi and newer (GL 4.0). The Tesla series
> does have a floating-point mul+add (but no fma).
Interesting. Radeons seem to always have an unfused mad. Pre-GCN parts apparently only have a 32-bit fma on the parts that support double precision. The same restriction is stated for GCN parts in the ISA docs, which obviously doesn't make sense, but I have no idea if the fma is full speed...

Roland

_______________________________________________
Nouveau mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/nouveau
