On Thu, Jan 19, 2012 at 10:04 AM, ziyang <[email protected]> wrote:

>
>> I don't recommend using the extra blocks; that would probably cause more
>> overhead. Looking at gr_quadrature_demod_cf::work, it looks like you can
>> vectorize the conjugate multiply, then the atan, then the gain scaling.
>> So that would be one for loop that operates on 4 samples at a time and
>> calls 3 Volk functions.
>>
>>
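The three-stage structure described above (conjugate multiply, then atan, then gain) can be sketched in plain C++ as follows. This is a minimal scalar sketch, not GNU Radio code: the function names are hypothetical, no Volk calls are used, and std::atan2 stands in for Gnuradio's fast atan2. Each stage is its own loop over the whole buffer, which is the shape that would let a vectorized kernel replace one stage at a time.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Scalar sketch of a quadrature (FM) demodulator split into three passes.
// Function names are hypothetical; std::atan2 stands in for a fast atan2.
using cf32 = std::complex<float>;

// Pass 1: conjugate multiply, prod[i] = in[i + 1] * conj(in[i]).
void conj_multiply(const cf32* in, cf32* prod, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        prod[i] = in[i + 1] * std::conj(in[i]);
}

// Pass 2: phase difference, phase[i] = atan2(imag, real).
void phase_of(const cf32* prod, float* phase, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        phase[i] = std::atan2(prod[i].imag(), prod[i].real());
}

// Pass 3: gain scaling, out[i] = gain * phase[i].
void scale(const float* phase, float* out, float gain, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = gain * phase[i];
}

// `in` must hold n + 1 samples: in[0] is assumed to be the history sample
// carried over from the previous buffer. `out` receives n samples.
void quadrature_demod(const cf32* in, float* out, float gain, std::size_t n) {
    std::vector<cf32> prod(n);
    std::vector<float> phase(n);
    conj_multiply(in, prod.data(), n);
    phase_of(prod.data(), phase.data(), n);
    scale(phase.data(), out, gain, n);
}
```

A signal whose phase advances by a constant 0.5 rad per sample, demodulated with a gain of 2, should come out as a constant 1.0.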
> Hi, Josh. I implemented a quadrature_demod_cf block (please find it in the
> attachment). Since the Volk atan2 function is currently SSE-only, as Nick
> said, and there is no conjugate-multiply function for FC32 inputs, I use
> Gnuradio's built-in conjugate and fast_atan_2f functions, plus two Volk
> multiply functions. The for loop is timed with high_res_timer. In addition,
> the work function of gr_quadrature_demod_cf is timed for comparison (also
> attached). Each of the two blocks is connected to a file_source that
> provides modulated data.
>
> I tested the two blocks individually, first on a PC with an Intel
> processor, then on the E100. On the PC, the Volk-based block always takes
> less time to demodulate a buffer of the same size (e.g. for 4096 input
> items, the original quadrature_demod_cf block takes 0.185 ms, while the
> Volk-based block takes only 0.163 ms).
>
> However, the results are different on the E100: sometimes the original
> block runs faster, and sometimes the Volk-based block does. I ran the tests
> several times; the recorded time varies by some tens (occasionally a few
> hundred) of nanoseconds, but neither block is consistently faster than the
> other.
>
> Now I'm confused by the results, since I expected the volk-ified
> demodulator to be faster. Could you give me some help on this issue? Thanks.
>
>
Optimizing an algorithm is a hard and sometimes counterintuitive process.
You might benchmark the following:

- Gnuradio's atan2 WITHOUT any Volk multiplications (just comment out the
volk mults in your block)
- The Volk multiplications WITHOUT Gnuradio's atan2 (just comment out the
atan2 in your block)
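One rough way to time the two parts in isolation, outside GNU Radio, is sketched below. It assumes std::chrono::steady_clock as a stand-in for high_res_timer and std::atan2 as a stand-in for Gnuradio's atan2; the stage functions are hypothetical. Each stage is run several times and the median is reported, so a single scheduler hiccup does not dominate the result.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cf32 = std::complex<float>;

// Run `stage` `reps` times (reps >= 1) and return the median wall-clock
// duration in nanoseconds, using steady_clock in place of high_res_timer.
template <typename F>
double median_ns(F&& stage, int reps) {
    std::vector<double> samples;
    for (int r = 0; r < reps; ++r) {
        const auto t0 = std::chrono::steady_clock::now();
        stage();
        const auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::nano>(t1 - t0).count());
    }
    const auto mid = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}

// Stage A: multiplications only (conjugate multiply, no atan2).
// `prod` must have at least in.size() - 1 elements.
void mults_only(const std::vector<cf32>& in, std::vector<cf32>& prod) {
    for (std::size_t i = 0; i + 1 < in.size(); ++i)
        prod[i] = in[i + 1] * std::conj(in[i]);
}

// Stage B: atan2 only (no multiplications).
void atan2_only(const std::vector<cf32>& prod, std::vector<float>& phase) {
    for (std::size_t i = 0; i < prod.size(); ++i)
        phase[i] = std::atan2(prod[i].imag(), prod[i].real());
}
```

A call such as median_ns([&] { atan2_only(prod, phase); }, 101) can then be compared directly with the same call wrapped around mults_only.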

This will let you determine where the bottleneck is. In addition, try
running over a MUCH larger dataset. At sub-millisecond timescales the clock
resolution is not very good, and the scheduler has a correspondingly larger
effect on the measurement.
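Both effects can be checked with a sketch like the one below: it measures the smallest step the clock is observed to take, and times one long run over many buffers so that the tick and per-call overhead are amortised. Assumptions: steady_clock stands in for high_res_timer, the chunk size and phase step are arbitrary, and the inner loop is a hypothetical scalar conjugate-multiply plus std::atan2 demodulator, not the block from the attachment.

```cpp
#include <chrono>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cf32 = std::complex<float>;
using Clock = std::chrono::steady_clock;

// Smallest step this clock is observed to take: spin until now() changes.
double observed_tick_ns() {
    const auto t0 = Clock::now();
    auto t1 = Clock::now();
    while (t1 == t0) t1 = Clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count();
}

// Demodulate at least `total` samples in `chunk`-sized buffers (chunk > 0)
// and return the average cost per sample in nanoseconds.
double ns_per_sample(std::size_t total, std::size_t chunk) {
    std::vector<cf32> in(chunk + 1);
    for (std::size_t k = 0; k <= chunk; ++k)
        in[k] = std::polar(1.0f, 0.01f * static_cast<float>(k));
    volatile float sink = 0.0f;  // every result feeds into this, so it is used

    std::size_t done = 0;
    const auto t0 = Clock::now();
    while (done < total) {
        float acc = 0.0f;
        for (std::size_t i = 0; i < chunk; ++i) {
            const cf32 p = in[i + 1] * std::conj(in[i]);
            acc += std::atan2(p.imag(), p.real());
        }
        sink = sink + acc;
        done += chunk;
    }
    const auto t1 = Clock::now();
    const double elapsed = std::chrono::duration<double, std::nano>(t1 - t0).count();
    return elapsed / static_cast<double>(done);
}
```

Comparing ns_per_sample over millions of samples for each variant, rather than a single 4096-item buffer, gives figures that are much less sensitive to timer granularity and scheduling.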

I think you'll find the atan2 part takes vastly longer than the
multiplications do, and that will be where you have to look for performance
improvements.

--n


>
> Best Regards,
>
> Terry
>
> _______________________________________________
> Discuss-gnuradio mailing list
> [email protected]
> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
>
>