Hi Tom,

We have been profiling our code on a Xeon W3530 (4 cores, 8 threads) with
12 GB of memory and a USRP N210, and found some interesting issues.

1. The receiver works well at a 1 MHz sample rate; the system monitor shows
each core 10%~20% occupied. Once we set the sample rate above 1 MHz (say
2 MHz), the program blocks (no decoding output) and only one core is 100%
occupied while the others are idle. Using KCachegrind, we see that 86% of
the CPU time is spent in the function "raw_peak_detector_fb::work(...)".
This function belongs to the first module (synchronization) of RawOFDM, so
I think this is the module that chokes the system. My first step is to dig
into this module and try to make it faster.

2. In the ordinary case (1 MHz), both the transmitter and the receiver call
the function "gr_multiply_cc::work()" frequently, and its cost is quite high
(nearly 18% of the program's total cost). I think there are ways to speed
up this function, right? Perhaps the VOLK library will help; I will try it
out.

Sincerely,
--
Yang, Qing
Information Engineering, CUHK



2012/8/28 Tom Rondeau <[email protected]>

> On Mon, Aug 27, 2012 at 7:07 AM, Qing Yang <[email protected]> wrote:
> > Hi there,
> >
> > I am currently doing an OFDM transceiver project based on RawOFDM. We
> > want to implement 20MHz bandwidth transmit/receive, but the RawOFDM code
> > seems to support only narrow band (<1MHz). Once I set the sample rate
> > larger than 1MHz, my program blocks with overrun messages (more details
> > here
> > http://lists.gnu.org/archive/html/discuss-gnuradio/2012-08/msg00069.html).
> > I think the reason is that at a 20MHz sample rate, the USRP produces too
> > much data for the PC to process and drains the PC's computation power.
> >
> > To boost the speed, I have two questions
> >
> > 1) My CPU has 8 threads (4 cores). Can I manually dedicate one thread to
> > each gr block and make it a pipelined system? Tom mentioned that
> > gnuradio uses a "thread-per-block" scheduler
> > (http://lists.gnu.org/archive/html/discuss-gnuradio/2010-09/msg00274.html)
> > but in my case only two threads are 100% occupied when I run the program.
> >
> > 2) Inside some blocks, we extensively use vector multiplications (e.g.,
> > precoding, CFO compensation). I've heard about the use of SSE to boost
> > the speed of vector multiplication. How can I utilize this technology in
> > my program?
> >
> >
> > Best regards,
> > --
> > Yang, Qing
> > Information Engineering, CUHK
>
>
> Qing,
>
> Yes, the default scheduler is the thread-per-block, so each block
> operates in its own thread, and the OS will distribute those across
> the CPUs. What you are seeing is probably that two blocks in
> particular are taking a long time to process and starving the others.
> So CPU affinity won't help you. From your other posts, it looks like
> you are trying to profile the code. That's the better way to go;
> figure out which blocks are taking the most time and try to optimize
> them.
>
> Tom
>
_______________________________________________
Discuss-gnuradio mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio