On Tue, Jun 13, 2017 at 8:47 AM, Martin Peres <[email protected]> wrote:
>
>
> On 13/06/17 15:43, Ilia Mirkin wrote:
>>
>> On Tue, Jun 13, 2017 at 8:18 AM, Roland Scheidegger <[email protected]>
>> wrote:
>>>
>>> On 13.06.2017 at 08:57, Karol Herbst wrote:
>>>>
>>>> On Tue, Jun 13, 2017 at 2:17 AM, Roland Scheidegger <[email protected]>
>>>> wrote:
>>>>>
>>>>> I am actually also thinking this should be different.
>>>>>
>>>>> e.g. imho MAD means the operation can be either fused or unfused.
>>>>> This is the "traditional" definition of MAD - opencl for instance will
>>>>> follow this too, albeit this isn't mentioned in the gallium docs (it
>>>>> probably should be).
>>>>> (OpenCL says: "Whether or how the product of a * b is rounded and how
>>>>> supernormal or subnormal intermediate products are handled is not
>>>>> defined. mad is intended to be used where speed is preferred over
>>>>> accuracy.")
>>>>> I think doing something different here in gallium can only lead to
>>>>> madness long term - glsl doesn't have mad in the first place, and as
>>>>> far as I can tell d3d10 is ok with fused/unfused mad too (the docs
>>>>> stating "Fused operations (such as mad, dp3) produce results that are
>>>>> no less accurate than the worst possible serial ordering of evaluation
>>>>> of the unfused expansion of the operation.")
>>>>>
>>>>> This means that mul+add cannot be fused anywhere to a mad if precise is
>>>>> specified, and therefore you should never have to worry about doing a
>>>>> fused or unfused mul/add in the driver with a mad - it's enough if you
>>>>> just don't fuse mul+add in the driver itself (if you can't do unfused
>>>>> mad).
>>>>>
>>>>> Roland
>>>>>
>>>>
>>>> well there is a TGSI peephole doing this mul+add=>mad optimisation,
>>>> because it isn't wrong, because mad != fma and mul+add==mad, but on
>>>> Fermi+ Nvidia hardware there is no mad, only fma and because mad != fma,
>>>> we need to split it up again.
>>>>
>>>> So either TGSI doesn't merge it if the Instruction is flagged as precise
>>>> (which it is in those tests mentioned) although it is correct, or we
>>>> lower something in the driver, because the Instruction isn't supported
>>>> by the hardware all along.
>>>
>>>
>>> Yes, I think the TGSI peephole shouldn't merge mul+add to mad with
>>> precise. You say this isn't wrong, but imho it clearly is, because no one
>>> ever said MAD can't be a fused add - it is multiply + add, yes, but if
>>> there's intermediate rounding or not isn't specified. FWIW gallivm code
>>> also assumes this, and will use llvm.fmuladd for implementation (which
>>> is exactly the same "mul+add" story as opencl mad, and will use fma on
>>> cpus supporting it and separate mul+add otherwise, save some bugs in
>>> older llvm versions apparently).
>>> So we should just clarify that in the tgsi docs - mad is multiply + add,
>>> with undefined intermediate rounding, it can be a fused mul+add or an
>>> unfused one (technically it could also be something in-between I suppose
>>> since the apis just specify the accuracy isn't worse than an unfused
>>> multiply + add). Every driver gets to use what it can do fastest for it,
>>> and because there's no specified intermediate rounding for it, precise
>>> doesn't change anything there.
>>>
>>> That's at least my opinion what TGSI_OPCODE_MAD should be (of course,
>>> older gpus always used unfused mad, but this wasn't a requirement).
>>
>>
>> BTW, irrespective of how this conversation turns out, I think it's a
>> good idea to split MAD into mul + add in the nv50 backend on input,
>> unconditionally.
>
>
> I seem to remember that using MAD introduced a performance regression on my
> nv86 for some benchmarks. I will need to get the setup working again for
> mesa testing.

It did, but on Fermi or Kepler, I thought. Using IMAD is apparently not
a great idea. But that's entirely separate from what's in the TGSI.
_______________________________________________
mesa-dev mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
