On 01/17/2012 07:36 PM, Josh Blum wrote:
On 01/16/2012 09:51 AM, ziyang wrote:
On 01/13/2012 09:30 PM, Josh Blum wrote:
To reduce the computation load of the processor, I tried two methods:
1) modify the gr.quadrature_demod_cf block, replacing some multiplication
operations with volk-based ones (the gr.multiply and gr.multiply_const
modules in gr_blocks);
I like it. Make sure to contribute patches like that back. :-)
Actually, what I did was write a new quadrature_demod block without
the multiplication and delay operations, and connect extra gr.multiply
and gr.delay blocks in the flow graph instead. My understanding is that
the volk functions take a vector (multiple values) as input, and I
couldn't figure out how to do a single-item operation in the volk
style.
I don't recommend using the extra blocks; that would probably cause more
overhead. Looking at gr_quadrature_demod_cf::work, it looks like you can
vectorize the operation of the conjugate multiply, then the atan, then
the gain scaler. So, that would be one for loop that operates on 4
samples at a time, and calls 3 volk functions.
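
The structure described above can be sketched in plain Python. This is
only an illustration of the three-pass layout, not VOLK code: each list
comprehension stands in for one vector call, and the function names and
the test tone are hypothetical. Unlike the real block, which keeps one
sample of history, this sketch simply produces one output fewer than
its input.

```python
import cmath
import math


def quad_demod_reference(samples, gain):
    """Per-sample form: gain * arg(x[n] * conj(x[n-1]))."""
    out = []
    for n in range(1, len(samples)):
        prod = samples[n] * samples[n - 1].conjugate()
        out.append(gain * math.atan2(prod.imag, prod.real))
    return out


def quad_demod_chunked(samples, gain, chunk=4):
    """Same result, computed in three whole-chunk passes per loop turn."""
    out = []
    for start in range(1, len(samples), chunk):
        stop = min(start + chunk, len(samples))
        cur = samples[start:stop]
        prev = samples[start - 1:stop - 1]
        # Pass 1: conjugate multiply over the chunk.
        prod = [a * b.conjugate() for a, b in zip(cur, prev)]
        # Pass 2: phase angle over the chunk.
        phase = [math.atan2(p.imag, p.real) for p in prod]
        # Pass 3: gain scaling over the chunk.
        out.extend(gain * ph for ph in phase)
    return out


# Hypothetical test input: a tone whose phase advances 0.1 rad per sample.
tone = [cmath.exp(1j * 0.1 * n) for n in range(11)]
demod = quad_demod_chunked(tone, gain=2.0)
reference = quad_demod_reference(tone, gain=2.0)
```

Checking the chunked output against the per-sample version is a cheap
way to confirm that a vectorized rewrite did not change the result.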
Josh, thank you for your advice! Before I tried using gr.multiply outside
the block, I actually implemented a demodulation block in a way that's
similar to your suggestion, but the loop operated on 100 samples at a
time. I don't know whether the 100-sample vectorization caused the poor
performance. I will try processing 4 samples at a time.
Also, you may consider timing a particular operation as a performance
metric, rather than counting the number of demodulated packets.
I was wondering if there are examples from which I can learn how to do
this?
Sorry, I guess there isn't much in the way of examples.
You can time individual work functions by adding some code before and
after. We have some high resolution timers in
gruel/include/gruel/high_res_timers.h
So I call the timer functions of high_res_timers.h before and after the
operation in the work function, is that right?
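
The before-and-after pattern can be sketched as follows. This is a
Python stand-in only: it uses time.perf_counter_ns rather than the
functions in high_res_timers.h (whose API is not reproduced here), and
WorkTimer and work are hypothetical names. Accumulating over many calls
matters because a single work call is usually too short to time
reliably.

```python
import time


class WorkTimer:
    """Accumulates elapsed time across many runs of a timed section."""

    def __init__(self):
        self.total_ns = 0
        self.calls = 0

    def __enter__(self):
        # Read the clock immediately before the timed operation.
        self._start = time.perf_counter_ns()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Read the clock immediately after and add the difference.
        self.total_ns += time.perf_counter_ns() - self._start
        self.calls += 1
        return False

    def mean_us(self):
        """Average time per call in microseconds (0.0 if never called)."""
        return self.total_ns / self.calls / 1000.0 if self.calls else 0.0


timer = WorkTimer()


def work(items):
    """Hypothetical stand-in for a block's work function."""
    with timer:
        result = [2.0 * x for x in items]
    return result


for _ in range(100):
    work([0.5] * 1024)
```

After the loop, timer.mean_us() gives the average cost of one call,
which can be compared between the original and the vectorized code.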
I have also seen people time the block in a simple flow graph with a
null source, head, your_block, null_sink. You can time tb.run() and
compare run duration vs the non-vectorized code.
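
A minimal sketch of that test flow graph, assuming a GNU Radio 3.7 or
later Python API in which these blocks live in the blocks and analog
modules (in the tree discussed in this thread they were under gr). The
helper names, the gain value, and the sample count are hypothetical.
The head block passes a fixed number of items and then signals
completion, which is what lets tb.run() return.

```python
import time


def build_demod_graph(nsamples, gain=1.0):
    """Build null source -> head -> quadrature demod -> null sink."""
    # Imported lazily so that time_run() below is usable without GNU Radio.
    from gnuradio import gr, blocks, analog

    tb = gr.top_block()
    src = blocks.null_source(gr.sizeof_gr_complex)
    # head ends the flow graph after nsamples items have passed through.
    head = blocks.head(gr.sizeof_gr_complex, nsamples)
    # Replace this with the block under test to compare implementations.
    demod = analog.quadrature_demod_cf(gain)
    sink = blocks.null_sink(gr.sizeof_float)
    tb.connect(src, head, demod, sink)
    return tb


def time_run(tb):
    """Return the wall-clock seconds spent inside tb.run()."""
    start = time.perf_counter()
    tb.run()
    return time.perf_counter() - start
```

With GNU Radio installed, time_run(build_demod_graph(10000000)) gives
one measurement; repeating it for the non-vectorized block gives the
comparison. Because no hardware source is involved, the measurement
covers only block processing and scheduler overhead.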
-Josh
I have two questions about this:
1) Is the "head" block for generating data for the processing block?
2) UHD initialization only happens once tb.run() is called, so how can
I separate the processing time from the total time between tb.run() and
tb.stop()?
Thanks.
Best Regards,
Terry
_______________________________________________
Discuss-gnuradio mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio