On Tue, Jun 13, 2017 at 8:18 AM, Roland Scheidegger <[email protected]> wrote:
> On 13.06.2017 at 08:57, Karol Herbst wrote:
>> On Tue, Jun 13, 2017 at 2:17 AM, Roland Scheidegger <[email protected]>
>> wrote:
>>> I am actually also thinking this should be different.
>>>
>>> e.g. imho MAD means the operation can be either fused or unfused.
>>> This is the "traditional" definition of MAD - opencl for instance will
>>> follow this too, albeit this isn't mentioned in the gallium docs (it
>>> probably should be).
>>> (OpenCL says: "Whether or how the product of a * b is rounded and how
>>> supernormal or subnormal intermediate products are handled is not
>>> defined. mad is intended to be used where speed is preferred over
>>> accuracy.")
>>> I think doing something different here in gallium can only lead to
>>> madness long term - glsl doesn't have mad in the first place, and as far
>>> as I can tell d3d10 is ok with fused/unfused mad too (the docs stating
>>> "Fused operations (such as mad, dp3) produce results that are no less
>>> accurate than the worst possible serial ordering of evaluation of the
>>> unfused expansion of the operation.")
>>>
>>> This means that mul+add cannot be fused anywhere to a mad if precise is
>>> specified, and therefore you should never have to worry about doing a
>>> fused or unfused mul/add in the driver with a mad - it's enough if you
>>> just don't fuse mul+add in the driver itself (if you can't do unfused mad).
>>>
>>> Roland
>>>
>>
>> well there is a TGSI peephole doing this mul+add=>mad optimisation,
>> because it isn't wrong, because mad != fma and mul+add==mad, but on
>> Fermi+ Nvidia hardware there is no mad, only fma and because mad != fma,
>> we need to split it up again.
>>
>> So either TGSI doesn't merge it if the Instruction is flagged as precise
>> (which it is in those tests mentioned) although it is correct, or we
>> lower something in the driver, because the Instruction isn't supported
>> by the hardware all along.
>
> Yes, I think the TGSI peephole shouldn't merge mul+add to mad with
> precise. You say this isn't wrong, but imho it clearly is, because no one
> ever said MAD can't be a fused add - it is multiply + add, yes, but if
> there's intermediate rounding or not isn't specified. FWIW gallivm code
> also assumes this, and will use llvm.fmuladd for implementation (which
> is exactly the same "mul+add" story as opencl mad, and will use fma on
> cpus supporting it and separate mul+add otherwise, save some bugs in
> older llvm versions apparently).
> So we should just clarify that in the tgsi docs - mad is multiply + add,
> with undefined intermediate rounding, it can be a fused mul+add or an
> unfused one (technically it could also be something in-between I suppose
> since the apis just specify the accuracy isn't worse than an unfused
> multiply + add). Every driver gets to use what it can do fastest for it,
> and because there's no specified intermediate rounding for it, precise
> doesn't change anything there.
>
> That's at least my opinion what TGSI_OPCODE_MAD should be (of course,
> older gpus always used unfused mad, but this wasn't a requirement).

BTW, irrespective of how this conversation turns out, I think it's a good
idea to split MAD into mul + add in the nv50 backend on input,
unconditionally.

Cheers,

  -ilia
_______________________________________________
mesa-dev mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
