Roland Scheidegger wrote:
> Roland Scheidegger wrote:
>> Rune Petersen wrote:
>>> This patch:
>>> - Fixes COS.
>>> - Does range reductions for SIN & COS.
>>> - Adds SCS.
>>> - removes the optimized version of SIN & COS.
>>> - tweaked weight (should help on precision).
>>> - fixed a copy paste typo in emit_arith().
>>>
>>> Roland would you mind testing if the tweaked weight helped?
>> Well I didn't test it first time (just quoting the numbers from the
>> link you provided), but I guess that's fine too. I was actually
>> wondering myself if it's better to optimize for absolute or relative
>> error, so choosing a weight in-between should work too (the
>> difference is not that big after all).
>>
>> A couple comments though: Since ((x + PI/2)/(2*PI)) + 0.5 is
>> (x/(2*PI)) + (1/4 + 0.5) you could optimize away the first mad for
>> the COS case.
>>
> Ah I see you're a bit short on consts, if you want to only use 2 (btw
> I'd say there should be 32 not only 16 but I have no idea why the driver
> restricts it to 16).
>
>> Also, the comments for SCS seem a bit off. That's a pity, because
>> without comments I can't really see what the code does at first sight
>> :-). Looks like quite a few extra instructions though, are you sure
>> not more could be shared for calculating both sin and cos?
> I've looked a bit closer (this is an interesting optimization
> problem...) and I think it should be doable with fewer instructions,
> though ultimately I needed 2 temps instead of 1 (I don't think it's much
> of a problem, 32 is plenty, PS2.0 only exposes 12).
>
> Ok the equation was:
> Q (4/pi x - 4/pi^2 x^2) + P (4/pi x - 4/pi^2 x^2)^2
>
> Simplified to:
> y = B * x + C * x * abs(x)
> y = P * (y * abs(y) - y) + y
>
> const0: B,C,pi,P
> const1: 0.5pi, 0.75, 1/(2pi), 2.0pi
>
> That's what I came up with as pseudo-code:
> //should be 5 slots (I guess it might generate 6 due to force same-slot,
> //but that needs fixing elsewhere)
>
> //cos is even: cos(x) = cos(-x). So using simple trigo-fu
> //we get sin(neg(abs(x)) + pi/2) = cos(x), no comparison needed and all
> //values for sine stay inside [-pi,pi] ([-pi/2, pi/2], actually)
> //hope it's ok to use neg+abs simultaneously?
> temp.z = add(neg(abs(src)), const1.x)
> temp.w = mul(src, C)
>
> //temp.xy = B*x, C*x (cos), temp.w = C * x, temp2.w = B * x (sin)
> temp.xy = mul(temp.z, BC)
> temp2.w = mul(src, B)
>
> //do cos in alpha slot not sin due to restricted swizzling
> //sin y = B * x + C * x * abs(x)
> temp2.z = mad(temp.w, abs(src), temp2.w)
> //cos
> temp2.w = mad(temp.y, abs(temp.z), temp.x)
>
> temp.xy = mad(temp2.wzy, abs(temp2.wzy), neg(temp2.wzy))
> // now temp.x holds y * abs(y) - y for cos, temp.y same for sin
>
> dest.xy = mad(temp.xy, P, temp2.wzy)
>
> range reduction for cos:
> x = (x/(2*PI))+0.75
> x = frac(x)
> x = (x*2*PI)-PI
>
> sin:
> x = (x/(2*PI))+HALF
> x = frac(x)
> x = (x*2*PI)-PI
>
> Isn't that an elegant solution :-) There may be any number of bugs, of
> course...

Very elegant I must say. Thank you, I'll see about implementing this.

Rune Petersen

_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
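
For readers who want to check the arithmetic quoted in this thread on the CPU, here is a minimal Python sketch of the range reduction and the two-step parabola approximation. It is not the driver code. B = 4/pi and C = -4/pi^2 are read off the quoted equation; P = 0.225 is an assumed weight, since the thread never states the value actually used; the function names are hypothetical.

```python
import math

# B and C are taken from the quoted equation (4/pi x - 4/pi^2 x^2).
B = 4.0 / math.pi
C = -4.0 / (math.pi ** 2)
# Assumption: the thread does not give the weight, 0.225 is a common choice.
P = 0.225


def reduce_range(x, offset):
    """Wrap x into [-pi, pi); offset is 0.5 for sin and 0.75 for cos."""
    t = x / (2.0 * math.pi) + offset
    t -= math.floor(t)                      # frac()
    return t * 2.0 * math.pi - math.pi


def parabola_sin(x):
    """Approximate sin(x) for x already inside [-pi, pi]."""
    y = B * x + C * x * abs(x)
    return P * (y * abs(y) - y) + y


def approx_sin(x):
    return parabola_sin(reduce_range(x, 0.5))


def approx_cos(x):
    # The 0.75 offset folds the +pi/2 phase shift into the range reduction.
    return parabola_sin(reduce_range(x, 0.75))


def approx_scs(x):
    """Return (cos, sin) for x in [-pi, pi], mirroring the SCS pseudo-code."""
    z = 0.5 * math.pi - abs(x)              # cos(x) = sin(pi/2 - |x|)
    return parabola_sin(z), parabola_sin(x)
```

With the assumed weight, comparing these functions against math.sin and math.cos is a quick way to see how much a different weight changes the error before touching the shader code.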