Thanks. I didn't think of trying -O2. Both -O2 and -O1 give me a sqrtsd instruction. With -O2 plus -march=native, I get a vsqrtsd. And with all 3 options, I get a vrsqrtss. What's hilarious is that -O[12] makes Q_rsqrt() faster than 1/sqrtf(), in spite of the assembler instruction(s).
gepr@cormac:~/lang/c$ ./O1.out
1/sqrt() took 0.095175 s
Q_rsqrt() took 0.065637 s
gepr@cormac:~/lang/c$ ./O2.out
1/sqrt() took 0.052231 s
Q_rsqrt() took 0.029407 s

On 1/8/21 3:28 PM, Marcus Daniels wrote:
> I mean I think it is that you may be targeting too low of a common
> denominator in terms of the processor. That should work for doubles too.
>
> -----Original Message-----
> From: Friam <[email protected]> On Behalf Of Marcus Daniels
> Sent: Friday, January 8, 2021 3:23 PM
> To: The Friday Morning Applied Complexity Coffee Group <[email protected]>
> Subject: Re: [FRIAM] Q_rsqrt() vs 1/sqrt()
>
> mdaniels@daniels:~$ cat t.c
> #include <math.h>
> #include <stdlib.h>
> #include <stdio.h>
>
> int main(int argc,const char **argv) {
>   float val = atof (argv[1]);
>   float ret = (1.0f/sqrtf(val));
>   printf("%f\n",(double) ret);
> }
> mdaniels@daniels:~$ gcc -march=native -O2 -ffast-math -S t.c
> mdaniels@daniels:~$ grep sqrt t.s
>         vrsqrtss        %xmm0, %xmm0, %xmm1

--
↙↙↙ uǝlƃ
