Re: [Discuss-gnuradio] Try to improve E100's performance at high sample rate

ziyang Tue, 17 Jan 2012 11:47:19 -0800

On 01/17/2012 07:54 PM, Nick Foster wrote:

On Tue, Jan 17, 2012 at 10:36 AM, Josh Blum wrote:
    On 01/16/2012 09:51 AM, ziyang wrote:
    > On 01/13/2012 09:30 PM, Josh Blum wrote:
    >>> To reduce the computation load of the processor, I tried two methods:
    >>> 1) modify the gr.quadrature_demod_cf block, replace some multiplication
    >>> operations with volk-based operations (gr.multiply and gr.multiply_const
    >>> modules in gr_blocks);
    >> I like it. Make sure to contribute patches like that back. :-)
    > Actually, what I did was writing a new quadrature_demod block without
    > the multiplication and delay operations, and connect extra gr.multiply
    > and gr.delay blocks instead in the flow graph. Because my understanding
    > is that the volk functions take a vector (multiple values) as input, and
    > I didn't figure out a way to do the single-item-operation in the volk
    > style.
    >

    I don't recommend using the extra blocks; that would probably cause more
    overhead. Looking at gr_quadrature_demod_cf::work, it looks like you can
    vectorize the conjugate multiply, then the atan, then the gain scaling.
    So, that would be one for loop that operates on 4 samples at a time and
    calls 3 volk functions.

Right now, the Volk atan2 function is only implemented for SSE and only works if libsimdmath is installed. If not, it will fall back to a generic implementation which is considerably slower than Gnuradio's LUT atan2. There's no NEON implementation, so right now the fastest option on E100 is to use Gnuradio's built-in atan2.

I spent some quality time a couple of months ago during SDR Forum writing a vectorized atan2 algorithm in Volk via Orc. I was unable to get the entire algorithm to fit within the register constraints the Orc runtime compiler applies. The end goal is to get the entire algorithm vectorized so it only needs to write out to memory once, which is going to be far faster than running three vector operations across a large buffer which won't fit into cache. I'll get back to it one of these days, but it looks like parts of Orc's compiler will have to be improved.

Terry, Orc code is easy to read and looks like vector pseudocode, so my Orc implementation might be of use if you're interested in writing a custom NEON implementation for Volk. It's based on the libsimdmath implementation, which is in turn based on Cephes, and uses all sorts of Crazy Math Tricks.
--n
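
To make the trade-off above concrete, here is a minimal pure-Python sketch of the two shapes being discussed. It is not Volk, Orc, or GNU Radio code. It assumes the demodulator computes gain * atan2 of x[n] * conj(x[n-1]); the function names and the test tone are hypothetical and exist only for illustration. The staged version makes three passes over the buffer (conjugate multiply, atan2, gain), while the fused version makes a single pass and writes each output once.

```python
import cmath
import math


def quad_demod_staged(samples, gain):
    """Three passes over the buffer, like three separate vector calls."""
    # Pass 1: multiply each sample by the conjugate of the previous one.
    products = [samples[i] * samples[i - 1].conjugate()
                for i in range(1, len(samples))]
    # Pass 2: take the phase angle of each product.
    phases = [math.atan2(p.imag, p.real) for p in products]
    # Pass 3: apply the gain.
    return [gain * ph for ph in phases]


def quad_demod_fused(samples, gain):
    """One pass: every output value is computed and stored exactly once."""
    out = []
    for i in range(1, len(samples)):
        p = samples[i] * samples[i - 1].conjugate()
        out.append(gain * math.atan2(p.imag, p.real))
    return out


# Hypothetical test input: a tone whose phase advances by `step` radians
# per sample, so the demodulated output should be a constant gain * step.
step = 0.25
tone = [cmath.exp(1j * step * n) for n in range(9)]
demod = quad_demod_staged(tone, 1.0 / step)
```

Both functions produce len(samples) - 1 outputs. In plain Python the difference between them is small, but the same structure is what decides whether a large buffer has to be streamed through the cache three times or only once.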

Thank you for your help, Nick. Right now, I really want a faster atan implementation, but I use Python and occasionally C++ most of the time, so I'm not sure I can handle a custom NEON implementation, because Orc, NEON, libsimdmath, and Cephes are all completely new to me.


Thanks.


Best Regards,

Terry



    >> Also, you may consider timing a particular operation as a performance
    >> metric, rather than counting the number of demodulated packets.
    >>
    > I was wondering if there are examples from which I can learn how to do
    > this?

    Sorry, I guess there isn't much in the way of examples.

    You can time individual work functions by adding some code before and
    after. We have some high resolution timers in
    gruel/include/gruel/high_res_timers.h

    I have also seen people time the block in a simple flow graph with a
    null source, head, your_block, null_sink. You can time tb.run() and
    compare run duration vs the non-vectorized code.

    -Josh
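
The "time the whole run and compare" idea can be sketched in plain Python as below. This is only an illustration under stated assumptions: time_call, baseline, and candidate are hypothetical names, and the two workloads are stand-ins for calling tb.run() on a flow graph built around the original block and around the modified one. It uses the standard-library time.perf_counter, not the gruel timers mentioned above.

```python
import time


def time_call(func, *args, repeats=5):
    """Return the best wall-clock duration of func(*args) over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        elapsed = time.perf_counter() - start
        # Keep the fastest run to reduce noise from other processes.
        best = min(best, elapsed)
    return best


def baseline(n):
    """Stand-in for the unmodified processing path."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def candidate(n):
    """Stand-in for the modified processing path; same result as baseline."""
    return sum(i * i for i in range(n))


t_base = time_call(baseline, 100_000)
t_cand = time_call(candidate, 100_000)
print("baseline:  %.6f s" % t_base)
print("candidate: %.6f s" % t_cand)
```

Feeding both variants the same fixed number of items (which is what a head block does in the flow graph described above) keeps the comparison fair, and taking the best of several runs makes the result less sensitive to whatever else the machine is doing.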

    _______________________________________________
    Discuss-gnuradio mailing list
    [email protected] <mailto:[email protected]>
    https://lists.gnu.org/mailman/listinfo/discuss-gnuradio



