Re: Help with Hand-Optimized Assembly

James Van Buskirk Wed, 28 Mar 2012 11:30:41 -0700

"Terje Mathisen" <"terje.mathisen at tmsw.no"@giganews.com> wrote in message 
news:5gh6u8-3062....@ntp6.tmsw.no...


> sfuerst wrote:
>> There is a straight-forward algorithm using the fact that only one of
>> the bounds can be crossed...

>> Something like this:
>> (Inputs in %xmm0, and %xmm1, output in %xmm0)

>> subsd %xmm1,%xmm0
>> movsd plusM_PI(%rip), %xmm1
>> movsd minusM_PI(%rip), %xmm2

>> cmpgtsd %xmm0, %xmm1
>> cmpltsd %xmm0, %xmm2

>> andpd  minus2M_PI(%rip), %xmm1
>> andpd  plus2M_PI(%rip), %xmm2

>> addsd %xmm1, %xmm0
>> addsd %xmm2, %xmm0

>> I probably have some of the comparisons reversed by mistake... but you
>> get the idea.  You can do both comparisons in parallel.  Using sign
>> tricks doesn't seem to be profitable, as that increases the length of
>> the critical path.

> Very nice, and definitely much better than my approach!
> :-)

I really liked your approach more, because it involves fewer loads
and fewer long-latency operations such as ADDSD and CMPccSD. The
code above has four such long-latency instructions on its critical
path (SUBSD, one CMPccSD and both ADDSDs), and I think we can do
better with:

   subsd xmm0, xmm1 ; {clock 1}
   movsd xmm2, [signbits] ; -0.0 {asynchronous}
   movaps xmm3, xmm2
   andps xmm2, xmm0 ; sign(0.0,delta) {clock 4}
   andnps xmm3, xmm0 ; abs(delta) {clock 4}
   xorps xmm2, [minustwopi] ; -sign(2*pi,delta) {clock 5}
   cmplesd xmm3, [pi] ; -1 or 0 {clock 5}
   addsd xmm2, xmm0 ; delta-sign(2*pi,delta) {clock 6}
   andps xmm0, xmm3 ; delta or 0 {clock 8}
   andnps xmm3, xmm2 ; 0 or delta-sign(2*pi,delta) {clock 9}
   orps xmm0, xmm3 ; delta or delta-sign(2*pi,delta) {clock 10}
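
For anyone who wants to check the logic without an assembler, here is a
rough scalar C sketch of the same sequence, one statement per instruction.
It is only an illustration: the names wrap_delta, bits_of and double_of are
made up for this example, and it assumes the difference is out of range by
at most one turn, which is what "only one of the bounds can be crossed"
implies. A compiler is free to turn the comparison into something other
than CMPLESD.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its raw 64-bit pattern and back (no conversion). */
static uint64_t bits_of(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double double_of(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* Hypothetical helper: a - b, wrapped back into [-pi, pi].
   Assumes |a - b| is out of range by at most one multiple of 2*pi. */
static double wrap_delta(double a, double b)
{
    const double pi = 3.14159265358979323846;
    const uint64_t signbit = UINT64_C(0x8000000000000000);   /* bits of -0.0 */

    double delta = a - b;                            /* subsd                      */
    uint64_t d = bits_of(delta);
    uint64_t sign = d & signbit;                     /* andps:  sign(0.0,delta)    */
    uint64_t mag = d & ~signbit;                     /* andnps: abs(delta)         */
    uint64_t corr = sign ^ bits_of(-2.0 * pi);       /* xorps:  -sign(2*pi,delta)  */
    /* cmplesd: all ones if abs(delta) <= pi, else all zeros */
    uint64_t keep = (uint64_t)0 - (uint64_t)(double_of(mag) <= pi);
    double wrapped = double_of(corr) + delta;        /* addsd                      */

    /* andps / andnps / orps: select delta or the corrected value */
    return double_of((d & keep) | (bits_of(wrapped) & ~keep));
}
```

With this sketch, wrap_delta(3.0, -3.0) takes the corrected branch
(6 - 2*pi, a small negative angle), while wrap_delta(1.0, 0.5) passes the
difference through unchanged.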

-- 
write(*,*) transfer((/17.392111325966148d0,6.5794487871554595D-85, &
6.0134700243160014d-154/),(/'x'/)); end


_______________________________________________
help-gplusplus mailing list
help-gplusplus@gnu.org
https://lists.gnu.org/mailman/listinfo/help-gplusplus
