On 01/16/2012 09:51 AM, ziyang wrote:
> On 01/13/2012 09:30 PM, Josh Blum wrote:
>>> To reduce the computation load of the processor, I tried two methods:
>>> 1) modify the gr.quadrature_demod_cf block, replace some multiplication
>>> operations with volk-based operations (gr.multiply and gr.multiply_const
>>> modules in gr_blocks);
>> I like it. Make sure to contribute patches like that back. :-)
> Actually, what I did was writing a new quadrature_demod block without
> the multiplication and delay operations, and connect extra gr.multiply
> and gr.delay blocks instead in the flow graph. Because my understanding
> is that the volk functions take a vector (multiple values) as input, and
> I didn't figure out a way to do the single-item-operation in the volk
> style.
>
I don't recommend using the extra blocks; that would probably add more
overhead. Looking at gr_quadrature_demod_cf::work, it looks like you can
vectorize the conjugate multiply, then the atan, then the gain scaling.
So that would be one for loop that operates on 4 samples at a time and
calls 3 volk functions.

>> Also, you may consider timing a particular operation as a performance
>> metric, rather than counting the number of demodulated packets.
>>
> I was wondering if there are examples from which I can learn how to do
> this?

Sorry, I guess there isn't much in the way of examples. You can time
individual work functions by adding some code before and after. We have
some high-resolution timers in gruel/include/gruel/high_res_timers.h

I have also seen people time the block in a simple flow graph with a
null source, head, your_block, null_sink. You can time tb.run() and
compare the run duration against the non-vectorized code.

-Josh
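
P.S. Here is a rough sketch of both ideas in plain Python, in case it helps to see the shape of it. It is not GNU Radio or VOLK code: the function names (quad_demod_scalar, quad_demod_three_pass, time_call) are made up for illustration, and each "pass" only stands in for where one vector kernel call would go in the C++ work function. Pure Python will not show any SIMD speedup, so treat the timing helper as a pattern for wrapping a work function or tb.run(), not as a benchmark.

```python
import cmath
import math
import time


def quad_demod_scalar(samples, gain):
    """Per-sample reference: gain * arg(x[n] * conj(x[n-1]))."""
    out = []
    for n in range(1, len(samples)):
        product = samples[n] * samples[n - 1].conjugate()
        out.append(gain * math.atan2(product.imag, product.real))
    return out


def quad_demod_three_pass(samples, gain):
    """Same math, split into three whole-buffer passes.

    Each pass is a stand-in for one vector kernel call in a
    vectorized work function (an assumption of this sketch).
    """
    # Pass 1: conjugate multiply of each sample with its predecessor.
    products = [cur * prev.conjugate()
                for prev, cur in zip(samples, samples[1:])]
    # Pass 2: phase angle of every product.
    phases = [math.atan2(p.imag, p.real) for p in products]
    # Pass 3: gain scaling.
    return [gain * phase for phase in phases]


def time_call(func, *args, repeats=5):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Test signal: a complex tone with a constant phase step per sample.
    tone = [cmath.exp(1j * 0.1 * n) for n in range(20000)]
    print("scalar     :", time_call(quad_demod_scalar, tone, 1.0))
    print("three-pass :", time_call(quad_demod_three_pass, tone, 1.0))
```

Taking the best of several runs is one way to reduce noise from scheduling; the same start/stop pattern applies around a single work call or around tb.run() for the null source, head, your_block, null sink graph.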
