----- Original Message -----
> Currently, there's no way to get the high bits of a 32x32
> signed/unsigned integer multiplication with tgsi.
> However, all of d3d10, OpenGL, and OpenCL support that, so we need it as
> well.
> There's essentially two ways how it could be done:
> - a 2-destination instruction returning both high and low bits (this is
> how it looks like in d3d10 and glsl)
> - use the existing umul for the low bits and have another instruction
> for the high bits (this is how it looks like in opencl)
>
> Well there's other possibilities but these looked like they'd match both
> APIs and HW reasonably (well with the exception of things like sse2
> which would prefer 2x2 32bit inputs and return 2x64bit as one reg...).
>
> Actually it's two new instructions because unlike for the low bits it
> matters for the high bits if the source operands are signed or unsigned.
>
> Personally I'm favoring two separate instructions for low and high bits
> to not have to deal with multi-destination instructions, but if someone
> makes a strong case for one returning both low and high bits I could be
> convinced otherwise. I think though two instructions matches most hw
> very well (with the exception of software renderers and possibly intel
> graphics but then a good backend could certainly recognize this).
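To make the signed/unsigned point in the quoted text concrete, here is a minimal C sketch of the two-opcode variant. The helper names (mul_lo, umul_hi, imul_hi) are made up for illustration and are not existing or proposed TGSI opcode names. The signed high half is returned as a raw 32-bit pattern to sidestep implementation-defined conversions.

```c
#include <stdint.h>

/* Low 32 bits of the product: the same bit pattern whether the
 * operands are interpreted as signed or unsigned. */
static uint32_t mul_lo(uint32_t a, uint32_t b)
{
    return a * b; /* unsigned arithmetic wraps modulo 2^32 */
}

/* High 32 bits of the unsigned 32x32 -> 64 product. */
static uint32_t umul_hi(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* High 32 bits of the signed 32x32 -> 64 product, returned as a
 * bit pattern. The operands are sign-extended before multiplying. */
static uint32_t imul_hi(int32_t a, int32_t b)
{
    uint64_t p = (uint64_t)((int64_t)a * (int64_t)b);
    return (uint32_t)(p >> 32);
}
```

With the bit pattern 0xFFFFFFFF as one operand and 2 as the other, the low half is 0xFFFFFFFE either way, but the unsigned high half is 1 while the signed high half (for -1 * 2) is 0xFFFFFFFF.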
Roland,

I don't know about GPU HW, but I think that what you propose will forever prevent decent SSE code generation with LLVM.

Using two separate opcodes for the high and low bits relies on common sub-expression elimination to merge the two multiplication operations back into one. But I strongly doubt that even LLVM's optimization passes will be able to do that. Getting the 64-bit result with LLVM will require either sign-extending (or zero-extending) the source arguments to 64 bits (http://llvm.org/docs/LangRef.html#mul-instruction) or using SSE intrinsics. Either way, the expressions for the low and high bits will be radically different, so we'll end up with two multiplies, which I think is simply inadmissible: TGSI should not stand in the way of backends generating good code.

So I strongly think this is a bad idea. TGSI has support for multiple destinations, though we never made much use of it. I see nothing special about it.

If you can prove me wrong, that is, show that LLVM can merge the multiplies, fine. But I do think we have bigger fish to fry, so I'd prefer we don't spend too much time debating this.

Jose

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
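For reference, the two-destination alternative argued for above can be sketched in plain C as a single widening multiply whose result is split into both halves. The struct and function names here are hypothetical, chosen only for illustration; this is not how TGSI or any Mesa backend actually spells it.

```c
#include <stdint.h>

/* Hypothetical result pair standing in for two destination registers. */
struct mul_result {
    uint32_t lo;
    uint32_t hi;
};

/* Unsigned: zero-extend both operands, multiply once, split. */
static struct mul_result umul_wide(uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)a * b;
    struct mul_result r = { (uint32_t)p, (uint32_t)(p >> 32) };
    return r;
}

/* Signed: sign-extend both operands, multiply once, split.
 * Both halves are returned as raw bit patterns. */
static struct mul_result imul_wide(int32_t a, int32_t b)
{
    uint64_t p = (uint64_t)((int64_t)a * (int64_t)b);
    struct mul_result r = { (uint32_t)p, (uint32_t)(p >> 32) };
    return r;
}
```

Each function performs exactly one 64-bit multiply, so nothing depends on an optimizer recognizing that a separate low-bits multiply and high-bits multiply share the same operands.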