Hi Nathan,
> I think it's more of a testament to how good Intel (and modern)
> processors are at branching.
wow!
> An adventurous soul is welcome to try putting that table lookup into
> an avx2 protokernel with the new gather instructions.
I am adventurous, but that sounds like a hard-earned beer, since atan2
requires all the angle-wrapping and special cases like atan2(y, 0) = ±π/2
for y ≠ 0.
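To make the angle-wrapping concrete, here is a minimal C++ sketch (not GNU Radio code) that builds atan2 from an arctangent core that only has to work on [0, 1]. std::atan stands in for that core; a fast version would put the LUT there instead. The name atan2_via_octants is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Sketch: atan2(y, x) built from an arctangent core that is only ever
// evaluated on [0, 1]. std::atan stands in for a LUT or polynomial core.
float atan2_via_octants(float y, float x)
{
    const float pi = 3.14159265358979f;
    const float ax = std::fabs(x);
    const float ay = std::fabs(y);

    // Special case: the origin has no defined angle; return 0.
    if (ax == 0.0f && ay == 0.0f)
        return 0.0f;

    // Reduce to a ratio in [0, 1] so the core never sees large arguments.
    // This also covers x == 0: the ratio is 0 and the angle becomes pi/2.
    float angle;
    if (ay <= ax)
        angle = std::atan(ay / ax);          // first octant: [0, pi/4]
    else
        angle = pi / 2 - std::atan(ax / ay); // second octant: mirror about pi/4

    // Undo the absolute values: left half-plane, then lower half-plane.
    if (x < 0.0f)
        angle = pi - angle;
    if (y < 0.0f)
        angle = -angle;
    return angle;
}
```

Each branch is one of the cases a vectorized version would have to turn into a select.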

Do you think it would make sense to replace the linear interpolation in
fast_atan2 (which uses the slope between two LUT entries) with the
actual derivative of atan at that position? Seeing as
$\alpha \frac{\mathrm d}{\mathrm dx} \arctan x = \frac{\alpha}{1+x^2}$
is pretty easy to calculate.
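A quick way to check the idea is a C++ sketch comparing both variants on a 255-entry atan table over [0, 1]. It assumes "at that position" means the lower table node x_i; AtanTable, linear, taylor and max_error are hypothetical names. For a one-sided Taylor step the worst-case error is about max|atan''|·h²/2, versus max|atan''|·h²/8 for linear interpolation, so expect the derivative variant to be roughly four times worse unless the derivative is taken closer to x (e.g. at the nearest node or the cell midpoint).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Two ways of filling in between LUT entries for atan on [0, 1]:
//  - linear interpolation (slope between neighbouring entries), and
//  - a first-order Taylor step using the analytic derivative 1 / (1 + x^2),
//    evaluated at the lower table node (an assumption).
struct AtanTable {
    std::vector<double> v;  // v[i] = atan(i * h)
    double h;               // node spacing

    explicit AtanTable(std::size_t n) : v(n), h(1.0 / double(n - 1)) {
        assert(n >= 2);
        for (std::size_t i = 0; i < n; ++i)
            v[i] = std::atan(double(i) * h);
    }

    // Cell containing x (x assumed in [0, 1]), clamped so i + 1 is valid.
    std::size_t cell(double x) const {
        std::size_t i = static_cast<std::size_t>(x / h);
        return i < v.size() - 1 ? i : v.size() - 2;
    }

    double linear(double x) const {
        std::size_t i = cell(x);
        double t = (x - double(i) * h) / h;  // fractional position in the cell
        return v[i] + t * (v[i + 1] - v[i]);
    }

    double taylor(double x) const {
        std::size_t i = cell(x);
        double xi = double(i) * h;
        return v[i] + (x - xi) / (1.0 + xi * xi);  // atan'(xi) = 1 / (1 + xi^2)
    }
};

// Largest absolute error against std::atan over a dense grid on [0, 1].
template <typename F>
double max_error(F approx, int samples) {
    double worst = 0.0;
    for (int k = 0; k <= samples; ++k) {
        double x = double(k) / samples;
        worst = std::fmax(worst, std::fabs(approx(x) - std::atan(x)));
    }
    return worst;
}
```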

> If you write a generic table look up I'll reward with at least two
> beers :-)
These actually seem easier if you leave out the interpolation. I'm not
familiar with cache sizes, but wouldn't having a second table containing
the derivatives at the midpoints between the LUT entries be the
"generally fast" solution?
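A sketch of that second-table variant, assuming a table over [0, 1] with the derivative sampled at each cell midpoint; SlopeLut, make_slope_lut and atan_slope_lut are hypothetical names. Since the secant slope between two entries equals the midpoint derivative up to a term of order h²·atan'''/24, accuracy should be essentially that of linear interpolation; what changes is that evaluation becomes a single multiply-add. For scale, two tables of 255 floats take about 2 KiB, well below a typical L1 data cache.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Value table at nodes plus derivative table at cell midpoints, for atan on [0, 1].
struct SlopeLut {
    std::vector<float> value;  // atan(x_i) at nodes x_i = i * h
    std::vector<float> slope;  // atan'(m_i) = 1 / (1 + m_i^2), m_i = (i + 0.5) * h
    float h;
};

SlopeLut make_slope_lut(std::size_t nodes) {
    assert(nodes >= 2);
    SlopeLut lut;
    lut.h = 1.0f / float(nodes - 1);
    for (std::size_t i = 0; i < nodes; ++i)
        lut.value.push_back(std::atan(float(i) * lut.h));
    for (std::size_t i = 0; i + 1 < nodes; ++i) {
        float m = (float(i) + 0.5f) * lut.h;
        lut.slope.push_back(1.0f / (1.0f + m * m));
    }
    return lut;
}

// atan(x) for x in [0, 1]: one lookup in each table plus one multiply-add.
float atan_slope_lut(const SlopeLut& lut, float x) {
    std::size_t i = static_cast<std::size_t>(x / lut.h);
    if (i + 1 >= lut.value.size())
        i = lut.value.size() - 2;  // x == 1 lands in the last cell
    float dx = x - float(i) * lut.h;
    return lut.value[i] + dx * lut.slope[i];
}
```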


Best regards,
Marcus

On 11/10/2015 11:55 PM, West, Nathan wrote:
>
>
> An adventurous soul is welcome to try putting that table lookup into
> an avx2 protokernel with the new gather instructions. I'll reward with
> beer next time I see the author. If you write a generic table look up
> I'll reward with at least two beers :-)
>
> Nathan
>
> On Tuesday, November 10, 2015, Marcus Müller <[email protected]> wrote:
>
>     Hi Johannes, Hi xd,
>
>     complex_to_arg uses GNU Radio's fast_atan2f function, which is an
>     approximation [1].
>     Between the 255 values of the lookup table, it uses linear
>     interpolation, hence your relative error of about 0.4 (0.00294
>     instead of 0.00489).
>
>     As Johannes said, that's not really surprising for a look up
>     table-based approach.
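For readers unfamiliar with how such a function is put together, here is a hedged C++ sketch of a LUT-based atan2 with 255 entries over [0, 1] and linear interpolation. It is written from the description above, not copied from GNU Radio, so details such as table placement and origin handling are assumptions; lut_atan2 and atan_table are hypothetical names. With h = 1/254, linear interpolation alone should contribute at most about h²·max|atan''|/8 ≈ 1.3·10⁻⁶ rad.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// 255-entry table of atan over [0, 1]; entry i sits at z = i / 254.
constexpr int kTableSize = 255;

const std::array<float, kTableSize>& atan_table() {
    static const std::array<float, kTableSize> table = [] {
        std::array<float, kTableSize> t{};
        for (int i = 0; i < kTableSize; ++i)
            t[i] = std::atan(float(i) / float(kTableSize - 1));
        return t;
    }();
    return table;
}

float lut_atan2(float y, float x) {
    const float pi = 3.14159265358979f;
    const float ax = std::fabs(x), ay = std::fabs(y);
    if (ax == 0.0f && ay == 0.0f)
        return 0.0f;  // angle of the origin is undefined; return 0

    // Octant reduction: the table is only indexed with z in [0, 1].
    const bool swapped = ay > ax;
    const float z = swapped ? ax / ay : ay / ax;

    // Fractional table position, clamped so that i + 1 stays valid.
    const float pos = z * float(kTableSize - 1);
    int i = static_cast<int>(pos);
    if (i > kTableSize - 2)
        i = kTableSize - 2;
    const float frac = pos - float(i);
    const auto& t = atan_table();
    float angle = t[i] + frac * (t[i + 1] - t[i]);  // linear interpolation

    // Undo the octant reduction.
    if (swapped)
        angle = pi / 2 - angle;
    if (x < 0.0f)
        angle = pi - angle;
    if (y < 0.0f)
        angle = -angle;
    return angle;
}
```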
>     I do think using this approximation is justified, but I also think
>     that the codebase it uses has been obsolete for a bit now:
>     gr::fast_atan2 could be replaced by volk's
>     volk_32fc_s32f_atan2_32f, which has been around since 2012, but
>     hasn't seen any use in GNU Radio, as far as I can tell.
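To illustrate what the generic protokernel amounts to (libc atan2 followed by a multiplication, as mentioned in this message), here is a C++ sketch loosely modelled on that description of volk_32fc_s32f_atan2_32f. The function name atan2_generic_kernel is hypothetical, and std::complex<float> replaces volk's lv_32fc_t so the sketch compiles without volk.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Hypothetical generic kernel: libc atan2 per sample, then a multiplication
// by the reciprocal of a normalization factor (1.0 leaves angles in radians).
void atan2_generic_kernel(float* out, const std::complex<float>* in,
                          float normalize_factor, unsigned int num_points) {
    const float inv = 1.0f / normalize_factor;
    for (unsigned int k = 0; k < num_points; ++k)
        out[k] = std::atan2(in[k].imag(), in[k].real()) * inv;
}
```

A loop this simple leaves all the work in the libc call, which fits the observation that the generic and SSE4 kernels ran equally fast here.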
>
>     Now, I went ahead and ran a benchmark [2], which showed that
>     gr::fast_atan2 is actually quite fast, but only about twice as
>     fast as the standard, been-around-forever libc implementation and
>     the volk implementation. (Admittedly, the volk kernel also does a
>     multiplication with 1.0. And by the way: the generic volk kernel,
>     which does libc atan2 plus that multiplication, is exactly as fast
>     as the SSE4 one on my machine.) Everything is pretty much in the
>     same range as C++ <complex>'s std::arg:
>
>     For 2²⁵ complex numbers, of which at least half have small angles:
>
>     .fast:            0.397261s wall, 0.370000s user + 0.020000s system = 0.390000s CPU (98.2%)
>     .volk:            0.780515s wall, 0.760000s user + 0.020000s system = 0.780000s CPU (99.9%)
>     .libc:            0.777738s wall, 0.760000s user + 0.020000s system = 0.780000s CPU (100.3%)
>     .c++ complex arg: 0.815700s wall, 0.780000s user + 0.030000s system = 0.810000s CPU (99.3%)
>
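A rough timing harness in the same spirit, as a C++ sketch. It is not the benchmark referenced as [2]; the input mix (every other sample with |angle| < 0.01 rad) is an assumption modelled on "at least half have small angles", and make_samples and time_angles are hypothetical names. Wall-clock numbers from such a loop depend heavily on compiler flags and CPU, so compare them only on the same machine.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <complex>
#include <cstddef>
#include <random>
#include <vector>

// Unit-magnitude samples: even indices get small angles in [-0.01, 0.01),
// odd indices get angles spread over roughly (-pi, pi).
std::vector<std::complex<float>> make_samples(std::size_t n, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<float> small(-0.01f, 0.01f);
    std::uniform_real_distribution<float> any(-3.14159265f, 3.14159265f);
    std::vector<std::complex<float>> out(n);
    for (std::size_t k = 0; k < n; ++k)
        out[k] = std::polar(1.0f, k % 2 == 0 ? small(rng) : any(rng));
    return out;
}

// Apply an angle function to every sample and return elapsed wall-clock seconds.
template <typename AngleFn>
double time_angles(const std::vector<std::complex<float>>& in,
                   std::vector<float>& out, AngleFn angle) {
    out.resize(in.size());
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t k = 0; k < in.size(); ++k)
        out[k] = angle(in[k]);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Writing results into an output vector keeps the compiler from discarding the loop; the same vector can then be compared against a reference for accuracy.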
>     But: this is on an Intel i7. Things might look different on your
>     average Android phone or, even worse, your Raspberry Pi (so if you
>     wanna test, see [2]).
>
>     Conclusion: If you need precise small angles, the factor-2 speedup
>     of the current complex_to_arg might not be what you're after.
>     That is probably not the case if you use complex_to_arg in a
>     quadrature_demod inside an FM audio receiver running on an
>     embedded device: small angle errors make no difference at all
>     there.
>     The question is, as it was with gr::random, whether we still
>     prefer performance over precision, or whether we insist on exactness.
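For context on the FM case, a C++ sketch of the conjugate-product discriminator commonly used for quadrature demodulation: out[k] = gain · arg(in[k] · conj(in[k-1])). That this matches quadrature_demod's internals exactly is an assumption; quad_demod is a hypothetical name, and std::arg stands in for whichever atan2 (LUT or libc) is plugged in. Because only the phase difference between consecutive samples is measured, a small absolute angle error shows up as a small additive error in the instantaneous frequency estimate.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Conjugate-product FM discriminator: the angle of in[k] * conj(in[k-1])
// is the phase advance between consecutive samples, scaled by gain.
std::vector<float> quad_demod(const std::vector<std::complex<float>>& in, float gain) {
    std::vector<float> out;
    for (std::size_t k = 1; k < in.size(); ++k) {
        const std::complex<float> product = in[k] * std::conj(in[k - 1]);
        out.push_back(gain * std::arg(product));  // any atan2 approximation fits here
    }
    return out;
}
```

A constant-frequency tone advancing 0.05 rad per sample demodulates to a constant 0.05·gain, regardless of how often the absolute phase wraps.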
>
>     Also, I was pretty amazed at how fast fast_atan2 really is: its
>     dependence on branching suggests it's pretty hard for a compiler
>     to vectorize and optimize.
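One way around the branching is to express the octant logic with min/max and selects, which compilers can often map to SIMD blend instructions; whether a given compiler actually vectorizes a loop over it depends on flags and target. This C++ sketch (hypothetical name atan2_branchless) swaps the LUT for a different technique: the cheap approximation atan(z) ≈ (π/4)·z + 0.273·z·(1 − z) on [0, 1], whose maximum error is roughly 4·10⁻³ rad, so it only illustrates the structure, not the accuracy of a table-based version.

```cpp
#include <cassert>
#include <cmath>

// Branch-free atan2 sketch: every step is arithmetic or a select.
// Core: pi/4*z + 0.273*z*(1 - z), a cheap approximation of atan on [0, 1]
// (max error roughly 4e-3 rad), used here only for illustration.
float atan2_branchless(float y, float x) {
    const float pi = 3.14159265358979f;
    const float ax = std::fabs(x);
    const float ay = std::fabs(y);
    const float lo = std::fmin(ax, ay);
    const float hi = std::fmax(ax, ay);
    // Avoid 0/0 at the origin by dividing by a tiny positive number instead.
    const float z = lo / std::fmax(hi, 1e-30f);
    float a = (pi / 4) * z + 0.273f * z * (1.0f - z);  // approx. atan(z), z in [0, 1]
    a = (ay > ax) ? pi / 2 - a : a;  // second octant
    a = (x < 0.0f) ? pi - a : a;     // left half-plane
    return std::copysign(a, y);      // lower half-plane: take the sign of y
}
```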
>
>     Best regards,
>     Marcus
>
>     [1] https://gnuradio.org/doc/doxygen/group__misc.html#ga6c1470346a3524989b7a8a3639aa79a7
>     [2]
>
>     On 10.11.2015 10:45, Johannes Demel wrote:
>>     Hi,
>>
>>     Could you extend a test case for this block with Python? This might
>>     reveal issues with the implementation more easily. Also, others might
>>     benefit from it.
>>     For your specific problem, I guess the GR block result is as close as
>>     it gets for a LUT-based calculation. And it's not off by a lot but by
>>     some 10^-x.
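Along those lines, a minimal reference check sketched in C++ (the same idea carries over to a Python QA test): compare an angle function against double-precision std::atan2 with an explicit tolerance, using xd's sample value. matches_reference and the tolerances are illustrative choices, not existing GNU Radio test utilities.

```cpp
#include <cassert>
#include <cmath>

// Returns true if angle(im, re) is within tol radians of std::atan2(im, re)
// computed in double precision. AngleFn takes (y, x) like atan2.
template <typename AngleFn>
bool matches_reference(AngleFn angle, double re, double im, double tol) {
    const double reference = std::atan2(im, re);
    return std::fabs(static_cast<double>(angle(im, re)) - reference) <= tol;
}
```

A LUT-based implementation would be passed in as the angle argument, with tol chosen from its documented accuracy.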
>>
>>     Cheers
>>     Johannes
>>
>>     On 10.11.2015 10:29, w xd wrote:
>>     > Hi all,
>>
>>     > Thank you very much in advance.
>>
>>     > I find that the result of the "Complex to Arg" block is the same
>>     > as the result in MATLAB most of the time, while sometimes it
>>     > differs from the MATLAB result.
>>
>>     > For example, for a=1.646236600879293e+03 + 8.043715071772031e+00i,
>>     > I use the command atan2 or angle to calculate the result. It
>>     > returns 0.004886084452240.
>>
>>     > When I calculate the result using GNU Radio, it returns
>>     > 0.002944485750049.
>>
>>     > Can someone explain it?
>>
>>     > GNU Radio version: 3.7.5. Best regards, xd
>>
>>
>>
>>     > _______________________________________________
>>     > Discuss-gnuradio mailing list
>>     > [email protected]
>>     > https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
>>
>
>
