Hi Nathan,
> I think it's more of a testament to how good Intel (and modern)
> processors are at branching.
wow!
> An adventurous soul is welcome to try putting that table lookup into
> an AVX2 protokernel with the new gather instructions.
I am adventurous, but that sounds like a hard-earned beer, since atan2
requires all the angle-wrapping, and special cases like atan2( ⋅ , 0) = ±¹/₂π.
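To illustrate the wrapping and special cases meant here, a minimal scalar sketch in Python. The name atan2_from_atan and the atan_unit parameter are made up for illustration; the sketch assumes the inner atan is only trusted on [0, 1] (as a table would be) and ignores signed zeros, NaN and infinities, which a real kernel would have to handle:

```python
import math

def atan2_from_atan(y, x, atan_unit=math.atan):
    """atan2 built on top of an atan that is only evaluated on [0, 1].

    atan_unit is a stand-in for a table lookup (hypothetical parameter).
    """
    if x == 0.0 and y == 0.0:
        return 0.0  # convention chosen here; matches libc for (+0, +0)
    ax, ay = abs(x), abs(y)
    # Octant reduction: keep the argument of atan_unit within [0, 1].
    if ay <= ax:
        angle = atan_unit(ay / ax)
    else:
        # Also covers x == 0, where the result is pi/2.
        angle = math.pi / 2 - atan_unit(ax / ay)
    # Wrap the first-quadrant angle into the correct quadrant.
    if x < 0.0:
        angle = math.pi - angle
    if y < 0.0:
        angle = -angle
    return angle
```

Every one of those `if`s is a branch that a vectorized version would have to turn into masks or blends, which is where the extra work sits.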
Do you think it would make sense to replace the linear interpolation in
fast_atan2, which uses the slope between two LUT entries, with the
actual derivative of atan at that position? (Seeing as $\alpha
\frac{\mathrm d}{\mathrm dx} \arctan x= \frac\alpha{1+x^2}$ is pretty
easy to calculate.)
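A rough way to check what that would do to the accuracy, as a sketch only: the table size of 255 intervals on [0, 1] and all names below are assumptions for illustration, not the actual fast_atan2 layout. One variant interpolates with the slope between two neighbouring entries, the other extrapolates from the nearest entry using 1/(1+x²):

```python
import math

N = 255                      # number of table intervals on [0, 1] (assumed)
H = 1.0 / N
TABLE = [math.atan(i * H) for i in range(N + 1)]

def atan_secant(x):
    """Interpolate with the slope between two neighbouring table entries."""
    i = min(int(x * N), N - 1)
    frac = x * N - i
    return TABLE[i] + frac * (TABLE[i + 1] - TABLE[i])

def atan_tangent(x):
    """Extrapolate from the nearest entry with the analytic slope 1/(1+x_i^2)."""
    i = int(x * N + 0.5)     # index of the nearest table entry
    xi = i * H
    return TABLE[i] + (x - xi) / (1.0 + xi * xi)

def max_error(approx, samples=10001):
    """Largest absolute deviation from math.atan on an even grid over [0, 1]."""
    return max(abs(approx(k / (samples - 1)) - math.atan(k / (samples - 1)))
               for k in range(samples))

err_secant = max_error(atan_secant)
err_tangent = max_error(atan_tangent)
print(f"secant slope:   {err_secant:.3e}")
print(f"analytic slope: {err_tangent:.3e}")
```

Both variants share the same worst-case bound of h²/8 · max|f''| (with h the table spacing), so the derivative variant trades the second table read for a division rather than buying accuracy.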
> If you write a generic table lookup I'll reward with at least two
> beers :-)
These actually seem easier, if you leave out the interpolation – I'm not
familiar with cache sizes, but wouldn't having a second table containing
the derivatives at the centers between the LUT entries be the "generally
fast" solution?
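A minimal sketch of that two-table idea, under the same assumed 255-interval layout on [0, 1] (VALUES, SLOPES and atan_two_tables are hypothetical names): one table holds atan at the left edge of each bin, the other holds the derivative at the bin center, and both are read with the same index.

```python
import math

N = 255                      # number of bins on [0, 1] (assumed)
H = 1.0 / N
# atan at the left edge of bin i
VALUES = [math.atan(i * H) for i in range(N)]
# d/dx atan(x) = 1 / (1 + x^2), evaluated at the center of bin i
SLOPES = [1.0 / (1.0 + ((i + 0.5) * H) ** 2) for i in range(N)]

def atan_two_tables(x):
    """Two lookups with one shared index, then a single multiply-add."""
    i = min(int(x * N), N - 1)
    return VALUES[i] + (x - i * H) * SLOPES[i]

worst = max(abs(atan_two_tables(k / 10000) - math.atan(k / 10000))
            for k in range(10001))
print(f"max abs error on [0, 1]: {worst:.3e}")
```

Since both reads use the same index, a gather-based kernel could presumably fetch both tables from one index vector; whether the doubled table footprint still fits comfortably in cache is exactly the open question.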
Best regards,
Marcus
On 11/10/2015 11:55 PM, West, Nathan wrote:
>
>
> An adventurous soul is welcome to try putting that table lookup into
> an AVX2 protokernel with the new gather instructions. I'll reward with
> beer next time I see the author. If you write a generic table lookup
> I'll reward with at least two beers :-)
>
> Nathan
>
> On Tuesday, November 10, 2015, Marcus Müller wrote:
>
> Hi Johannes, Hi xd,
>
> complex_to_arg uses GNU Radio's fast_atan2f function, which is an
> approximation [1].
> Between the 255 values of the lookup table, it uses linear
> interpolation, hence your 0.4 error factor.
>
> As Johannes said, that's not really surprising for a
> lookup-table-based approach.
> I do think using this approximation is justified, but I also think
> that the codebase it uses has been obsolete for a bit now:
> gr::fast_atan2 could be replaced by volk's
> volk_32fc_s32f_atan2_32f, which has been around since 2012, but
> hasn't seen any use in GNU Radio, as far as I can tell.
>
> Now, I went ahead and ran a benchmark [2], which showed that
> gr::fast_atan2 is actually quite fast -- but it's only twice as
> fast as the standard been-around-forever libc implementation and
> the volk implementation (which, admittedly, also does a
> multiplication with 1.0). By the way, the generic volk kernel
> (which does libc atan2 + multiplication) is exactly as fast as the
> SSE4 one on my machine, and everything is pretty much in the same
> range as C++ <complex>'s std::arg:
>
> For 2²⁵ complex numbers, of which at least half have small angles:
>
> 1: .fast: 0.397261s wall, 0.370000s user + 0.020000s system = 0.390000s CPU (98.2%)
> 1:
> 1: .volk: 0.780515s wall, 0.760000s user + 0.020000s system = 0.780000s CPU (99.9%)
> 1:
> 1: .libc: 0.777738s wall, 0.760000s user + 0.020000s system = 0.780000s CPU (100.3%)
> 1:
> 1: .c++ complex arg: 0.815700s wall, 0.780000s user + 0.030000s system = 0.810000s CPU (99.3%)
>
> But: this is on an Intel i7. Things might look different on your
> average Android phone or, even worse, your Raspberry Pi (so if you
> want to test, see [2]).
>
> Conclusion: If you care about small angles, the current
> complex_to_arg's factor 2 speedup might not be what you're after.
> That is probably not the case if you use complex_to_arg in a
> quadrature_demod inside an FM audio receiver running on an
> embedded device -- small angle errors don't make the least
> difference there.
> The question is, like it was with gr::random, whether we still
> prefer performance over precision, or whether we exercise exactness.
>
> Also, I was pretty amazed how fast fast_atan2 really is – its
> dependence on branching suggests it's pretty hard for a compiler to
> vectorize and optimize.
>
> Best regards,
> Marcus
>
> [1]
>
> https://gnuradio.org/doc/doxygen/group__misc.html#ga6c1470346a3524989b7a8a3639aa79a7
> [2]
> On 10.11.2015 10:45, Johannes Demel wrote:
>> Hi,
>>
>> Could you extend a test case for this block with Python? This might
>> reveal issues with the implementation more easily. Also, others might
>> benefit from it.
>> For your specific problem, I guess the GR block result is as close as
>> it gets for a LUT-based calculation. And it's not off by a lot but by
>> some 10^-x.
>>
>> Cheers
>> Johannes
>>
>> On 10.11.2015 10:29, w xd wrote:
>> > Hi all,
>>
>> > Thank you very much in advance.
>>
>> > I find that the result of the "Complex to Arg" block is the same as
>> > the result in MATLAB most of the time, while sometimes the result
>> > differs from the one in MATLAB.
>>
>> > For example, for a=1.646236600879293e+03 + 8.043715071772031e+00i I
>> > use the command atan2 or angle to calculate the result. It returns
>> > 0.004886084452240.
>>
>> > When I calculate the result using GNU Radio, it returns
>> > 0.002944485750049.
>>
>> > Can someone explain it?
>>
>> > The GNU Radio version is 3.7.5. Best regards, xd
>>
>>
>>
>> > _______________________________________________ Discuss-gnuradio
>> > mailing list [email protected]
>> > https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
>>
>
>