Roland Scheidegger wrote:
> Roland Scheidegger wrote:
>> Rune Petersen wrote:
>>> This patch:
>>> - Fixes COS.
>>> - Does range reductions for SIN & COS.
>>> - Adds SCS.
>>> - removes the optimized version of SIN & COS.
>>> - tweaked weight (should help on precision).
>>> - fixed a copy paste typo in emit_arith().
>>>
>>> Roland would you mind testing if the tweaked weight helped?
>> Well, I didn't test it the first time (just quoting the numbers from the
>> link you provided), but I guess that's fine too. I was actually
>> wondering myself if it's better to optimize for absolute or relative
>> error, so choosing a weight in-between should work too (the
>> difference is not that big after all).
>>
>> A couple of comments though: since ((x + PI/2)/(2*PI)) + 0.5 is
>> (x/(2*PI)) + (1/4 + 0.5), you could optimize away the first mad for the
>> COS case.
>>
> Ah I see you're a bit short on consts, if you want to only use 2 (btw
> I'd say there should be 32 not only 16 but I have no idea why the driver
> restricts it to 16).
> 
>> Also, the comments for SCS seem a bit off. That's a pity, because 
>> without comments I can't really see what the code does at first sight
>>  :-). Looks like quite a few extra instructions though; are you sure
>> more couldn't be shared for calculating both sin and cos?
> I've looked a bit closer (this is an interesting optimization
> problem...) and I think it should be doable with fewer instructions,
> though ultimately I needed 2 temps instead of 1 (I don't think it's much
> of a problem, 32 is plenty, PS2.0 only exposes 12).
> 
> Ok the equation was:
> Q (4/pi x - 4/pi^2 x^2) + P (4/pi x - 4/pi^2 x^2)^2
> 
> Simplified to:
> y = B * x + C * x * abs(x)
> y = P * (y * abs(y) - y) + y
> 
> const0: B,C,pi,P
> const1: 0.5pi, 0.75, 1/(2pi), 2.0pi
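
The two simplified equations can be checked on the CPU with a minimal Python sketch. B = 4/pi and C = -4/pi^2 follow from the first equation; P = 0.225 is only an assumed weight (the tweaked value from the patch is not given in this mail), and the function names are made up for illustration. The approximation is only meant for x in [-pi, pi].

```python
import math

B = 4.0 / math.pi         # coefficient of the 4/pi x term
C = -4.0 / math.pi ** 2   # coefficient of the -4/pi^2 x^2 term
P = 0.225                 # assumed weight, so Q = 1 - P


def approx_sin(x):
    """Parabolic sine approximation, intended for x in [-pi, pi]."""
    y = B * x + C * x * abs(x)
    return P * (y * abs(y) - y) + y


def max_abs_error(steps=2000):
    """Largest absolute deviation from math.sin over [-pi, pi]."""
    worst = 0.0
    for i in range(steps + 1):
        x = -math.pi + 2.0 * math.pi * i / steps
        worst = max(worst, abs(approx_sin(x) - math.sin(x)))
    return worst
```

Calling max_abs_error() with a different P is a quick way to compare weights for absolute error.
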
> 
> That's what I came up with, as pseudo-code:
> //should be 5 slots (I guess it might generate 6 due to force same-slot,
> //but that needs fixing elsewhere)
> 
> //cos is even: cos(x) = cos(-x). So using simple trigo-fu
> //we get sin(neg(abs(x)) + pi/2) = cos(x), no comparison needed and all
> //values for sine stay inside [-pi,pi] ([-pi/2, pi/2], actually)
> //hope it's ok to use neg+abs simultaneously?
> temp.z = add(neg(abs(src)), const1.x)
> temp.w = mul(src, C)
> 
> //temp.xy = B*x, C*x (cos), temp.w = C * x, temp2.w = B * x (sin)
> temp.xy = mul(temp.z, BC)
> temp2.w = mul(src, B)
> 
> //do cos in alpha slot not sin due to restricted swizzling
> //sin y = B * x + C * x * abs(x)
> temp2.z = mad(temp.w, abs(src), temp2.w)
> //cos
> temp2.w = mad(temp.y, abs(temp.z), temp.x)
> 
> temp.xy = mad(temp2.wzy, abs(temp2.wzy), neg(temp2.wzy))
> // now temp.x holds y * abs(y) - y for cos, temp.y same for sin
> 
> dest.xy = mad(temp.xy, P, temp2.wzy)
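
The pseudo-code above can be traced with a scalar Python sketch that follows the same steps, one variable per register component. This is only an emulation for checking the arithmetic, not driver code: P = 0.225 is an assumed weight, the name scs is made up, and src is assumed to be already reduced to [-pi, pi].

```python
import math

B = 4.0 / math.pi
C = -4.0 / math.pi ** 2
P = 0.225  # assumed weight, not taken from the patch


def scs(src):
    """Return (cos, sin) approximations for src in [-pi, pi]."""
    # temp.z = add(neg(abs(src)), const1.x): cos(x) = sin(pi/2 - |x|)
    z = -abs(src) + 0.5 * math.pi
    # temp2.z = mad(temp.w, abs(src), temp2.w): first pass for sin
    sin_y = (src * C) * abs(src) + src * B
    # temp2.w = mad(temp.y, abs(temp.z), temp.x): first pass for cos
    cos_y = (z * C) * abs(z) + z * B
    # temp.xy = y * abs(y) - y, then dest.xy = mad(temp.xy, P, y)
    cos_r = (cos_y * abs(cos_y) - cos_y) * P + cos_y
    sin_r = (sin_y * abs(sin_y) - sin_y) * P + sin_y
    return cos_r, sin_r  # dest.x = cos, dest.y = sin
```
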
> 
> range reduction for cos:
> x = (x/(2*PI))+0.75
> x = frac(x)
> x = (x*2*PI)-PI
> 
> sin:
> x = (x/(2*PI))+HALF
> x = frac(x)
> x = (x*2*PI)-PI
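
A minimal Python sketch of the two range reductions, assuming frac(t) means t - floor(t) as in the shader languages. The function names are made up for illustration. The cos variant folds the pi/2 phase shift into the 0.75 constant, so feeding its result to a sine evaluation yields cos(x).

```python
import math

TWO_PI = 2.0 * math.pi


def frac(t):
    """Fractional part, t - floor(t), always in [0, 1)."""
    return t - math.floor(t)


def reduce_sin(x):
    """Map x into [-pi, pi] so that sin(result) == sin(x)."""
    return frac(x / TWO_PI + 0.5) * TWO_PI - math.pi


def reduce_cos(x):
    """Map x into [-pi, pi] so that sin(result) == cos(x)."""
    return frac(x / TWO_PI + 0.75) * TWO_PI - math.pi
```
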
> 
> Isn't that an elegant solution :-) There may be any number of bugs, of
> course...

Very elegant, I must say. Thank you, I'll see about implementing this.


Rune Petersen

_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
