Re: [Mesa-dev] RFC: tgsi opcodes for 32x32 muls with 64bit results

Jose Fonseca Thu, 02 May 2013 21:59:26 -0700


----- Original Message -----
> Currently, there's no way to get the high bits of a 32x32
> signed/unsigned integer multiplication with tgsi.
> However, all of d3d10, OpenGL, and OpenCL support that, so we need it as
> well.
> There's essentially two ways how it could be done:
> - a 2-destination instruction returning both high and low bits (this is
> how it looks like in d3d10 and glsl)
> - use the existing umul for the low bits and have another instruction
> for the high bits (this is how it looks like in opencl)
> 
> Well there's other possibilities but these looked like they'd match both
> APIs and HW reasonably (well with the exception of things like sse2
> which would prefer 2x2 32bit inputs and return 2x64bit as one reg...).
> 
> Actually it's two new instructions because unlike for the low bits it
> matters for the high bits if the source operands are signed or unsigned.
> 
> Personally I'm favoring two separate instructions for low and high bits
> to not have to deal with multi-destination instructions, but if someone
> makes a strong case for one returning both low and high bits I could be
> convinced otherwise. I think though two instructions matches most hw
> very well (with the exception of software renderers and possibly intel
> graphics but then a good backend could certainly recognize this).


Roland,

I don't know about GPU HW, but I think that what you propose will forever 
prevent decent SSE code generation with LLVM.

Using two separate opcodes for hi/low bits relies on common sub-expression 
elimination to merge the two multiplication operations back into one.  But I 
strongly doubt that even LLVM's optimization passes will be able to do that.

Getting the 64-bit result with LLVM will require either sign-extending (or 
zero-extending, in the unsigned case) the source arguments to 64 bits before 
the multiply (http://llvm.org/docs/LangRef.html#mul-instruction ) or using SSE 
intrinsics. Either way, the expressions for the low and high bits will be 
radically different, so we'll end up with two multiplies, which I think is 
simply inadmissible. TGSI should not stand in the way of backends generating 
good code.
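
To make the concern concrete, here is a rough C sketch of the two shapes. The 
helper names are hypothetical (they are not Mesa or TGSI API), and this only 
illustrates the arithmetic, not what any backend actually emits. The combined 
form does one widening multiply and takes both halves from the same 64-bit 
product; the split form computes the low half with a plain 32-bit multiply, so 
a backend that wants a single multiply has to recognize that the two 
expressions share operands.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not Mesa/TGSI API. */

/* Combined form, unsigned: zero-extend, multiply once, split the product. */
static void umul_lo_hi(uint32_t a, uint32_t b, uint32_t *lo, uint32_t *hi)
{
    uint64_t p = (uint64_t)a * (uint64_t)b;
    *lo = (uint32_t)p;          /* low 32 bits  */
    *hi = (uint32_t)(p >> 32);  /* high 32 bits */
}

/* Combined form, signed: sign-extend, multiply once, split the product. */
static void imul_lo_hi(int32_t a, int32_t b, uint32_t *lo, uint32_t *hi)
{
    int64_t p = (int64_t)a * (int64_t)b;
    *lo = (uint32_t)(uint64_t)p;
    *hi = (uint32_t)((uint64_t)p >> 32);
}

/* Split form, low half: an ordinary 32-bit multiply. The low bits are the
 * same for signed and unsigned operands. */
static uint32_t mul_lo(uint32_t a, uint32_t b)
{
    return a * b;
}

/* Split form, high half (unsigned): needs its own widening multiply unless
 * the compiler manages to merge it with mul_lo(). */
static uint32_t umul_hi(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * (uint64_t)b) >> 32);
}
```

For example, with the bit pattern 0xFFFFFFFF and 2 as operands, the low half is 
0xFFFFFFFE either way, but the high half is 1 when the operands are treated as 
unsigned and 0xFFFFFFFF when they are treated as signed (-1 * 2).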

So I strongly think this is a bad idea. TGSI has support for multiple 
destinations, though we never made much use of it. I see nothing special about 
it.

If you can prove me wrong and show that LLVM can merge the multiplies, fine. 
But I do think we have bigger fish to fry, so I'd prefer we don't spend too 
much time debating this.

Jose
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
