----- Original Message ----- > On Mon, Dec 12, 2011 at 8:20 AM, Jose Fonseca <jfons...@vmware.com> > wrote: > > ----- Original Message ----- > >> On Mon, Dec 12, 2011 at 2:10 PM, Jose Fonseca > >> <jfons...@vmware.com> > >> wrote: > >> > > >> > > >> > ----- Original Message ----- > >> >> On Mon, Dec 12, 2011 at 1:24 PM, Jose Fonseca > >> >> <jfons...@vmware.com> > >> >> wrote: > >> >> > I saw this email yesterday, but did not have time to reply. > >> >> > > >> >> > ----- Original Message ----- > >> >> >> On Mon, Nov 21, 2011 at 6:52 PM, Eric Anholt > >> >> >> <e...@anholt.net> > >> >> >> wrote: > >> >> >> > On Sun, 20 Nov 2011 15:08:55 +0100, Marek Olšák > >> >> >> > <mar...@gmail.com> > >> >> >> > wrote: > >> >> >> >> unpack_float_z_Z24_X8 fails with -O2 in: > >> >> >> >> - fbo-blit-d24s8 > >> >> >> >> - fbo-depth-sample-compare > >> >> >> >> - fbo-readpixels-depth-formats > >> >> >> >> - glean/depthStencil > >> >> >> >> > >> >> >> >> With -O0, it works fine. > >> >> >> >> > >> >> >> >> I am removing all the assertions. There's not much point > >> >> >> >> in > >> >> >> >> having > >> >> >> >> them, > >> >> >> >> is there? > >> >> >> > > >> >> >> > Is -ffast-math involved at all? > >> >> >> > >> >> >> Yes, -fno-fast-math makes the problem go away. > >> >> >> > >> >> >> > > >> >> >> > At a guess, replacing "* scale" with "/ (float)0xffffff" > >> >> >> > makes > >> >> >> > the > >> >> >> > failure happen either way? > >> >> >> > >> >> >> Yes. Only using GLdouble fixes it. > >> >> >> > >> >> >> It makes sense. The mantissa without the sign has 23 bits, > >> >> >> but > >> >> >> 24 > >> >> >> bits > >> >> >> are required to exactly represent 0xffffff if I am not > >> >> >> mistaken. > >> >> > > >> >> > I'm afraid this is wrong: single precision floating point can > >> >> > describe 24bits uints (and therefore unorms) without any > >> >> > loss, > >> >> > because although the mantissa has 23bits, the mantissa is > >> >> > only > >> >> > used to represent all but the most significant bit, ie., > >> >> > there's > >> >> > an implicit 24th bit always set to one. > >> >> > > >> >> > The fact that -fno-fast-math makes the problem go away is yet > >> >> > another clear proof that the issue is _not_ lack of precision > >> >> > of > >> >> > single-precision floating point -- as -fno-fast-math will > >> >> > still > >> >> > use single-precision floating point numbers. Actually, > >> >> > fno-fast-math, it will ensure that all intermediate steps are > >> >> > done > >> >> > in single precision instead of higher intel x87 80bit > >> >> > floating > >> >> > points, on the 32bit x86 architecture. And I suspect the > >> >> > problem > >> >> > here is that the rounding w/ x80 yields wrong results. > >> >> > > >> >> > > >> >> > I also disagree that using double precision is a good > >> >> > solution. > >> >> > Either fast-math serves our purposes, or it doesn't. Using > >> >> > double > >> >> > precision to work-around issues with single-precision > >> >> > fast-math > >> >> > is > >> >> > the *worst* thing we could do. > >> >> > > >> >> > > >> >> > Does the assertion failure even happen on 64bits? I doubt, as > >> >> > x64 > >> >> > ABI establishes the use of sse for all floating point, and > >> >> > x87 > >> >> > 80bit floating point will never be used. So effectively this > >> >> > is > >> >> > making all archs slower. > >> >> > > >> >> > > >> >> > Honestly, I think the patch should be recalled. And we > >> >> > should > >> >> > consider disabling fast-math. And maybe enabling > >> >> > -mfpmath=sse > >> >> > on > >> >> > 32bits x86 (i.e., require sse). > >> >> > >> >> You're probably right. Feel free to revert & fix the issue in > >> >> some > >> >> other way. > >> > > >> > OK. > >> > > >> > Could you please confirm whether the assertions failures you saw > >> > were on 32bits or 64bits mode? > >> > >> 32-bit. > > > > Thanks. > > > > In that case I propose dropping fast-math, at least on x86 32bits, > > as it makes (in particular the unorm24 <-> f32 conversions happen > > in a lot of places in Mesa's source tree) and some times worse > > code [1], and simply use the subset of finer grained options [2]: > > > > -fno-math-errno, -funsafe-math-optimizations, -ffinite-math-only, > > -fno-rounding-math, -fno-signaling-nans and -fcx-limited-range. > > > > which really meets our needs. > > > > About -mfpmath=sse on 32bits, although I believe that depending on > > sse would be alright, it looks like -mfpmath=sse it's not a clear > > winner on 32bits , because calls to libc still need to use x87 > > registers. > > > > > > Jose > > > > > > [1] http://programerror.com/2009/09/when-gccs-ffast-math-isnt/ > > [2] > > http://gcc.gnu.org/onlinedocs/gcc-4.6.2/gcc/Optimize-Options.html > > > Thanks for digging into this, guys. > > I'm happy to drop -ffast-math (or use -fno-fast-math) but it would be > interesting to do at least a few before/after comparisons just to > make > sure there's no surprises in performance.
Yep, that was my plan. > In any case, don't we still need to use double-precision for Z32 > packing/unpacking? I ran into a failure in _mesa_pack_float_z_row() > for Z32 on Saturday (debug build on x64). Agreed. I don't know an accurate & efficient way converting unorm32 <=> f32 without doubles. The trick I use in llvmpipe to convert f32 to unorm32, in src/gallium/auxiliary/gallivm/lp_bld_conv.c , only gets 31bits right (i.e, drops one bit from values near 0.0): /* * The destination exceeds what can be represented in the floating point. * So multiply by the largest power two we get away with, and when * subtract the most significant bit to rescale to normalized values. * * The largest power of two factor we can get away is * (1 << (src_type.width - 1)), because we need to use signed . In theory it * should be (1 << (src_type.width - 2)), but IEEE 754 rules states * INT_MIN should be returned in FPToSI, which is the correct result for * values near 1.0! * * This means we get (src_type.width - 1) correct bits for values near 0.0, * and (mantissa + 1) correct bits for values near 1.0. Equally or more * important, we also get exact results for 0.0 and 1.0. */ > I'd like to leave Marek's patch committed as-is for now until we > settle on another solution. Yep. Jose _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev