Hello Gonzalo,

On 07.02.2016 20:58, Gonzalo Arcos wrote:
> Hello,
>
> I am trying to optimize the throughput of a flowgraph that was given
> to me, already designed and working. I have profiled every block, and
> improved on the performance of some blocks, which resulted in a better
> performance of the flowgraph as a whole.
>
> However, at the moment, i am trying to tackle on how the graph is
> executed by the gnuradio scheduler, to see if i can parallelize
> anything (i.e. pipeline) that is currently being executed sequentially
> with no good reason.
>
> To do this, i am trying to understand how does the gnuradio scheduler
> work, how blocks are executed, etc. I have not found as much
> information as i would like, leaving me with lots of questions. So
> from this point on i will state some of the information i gathered,
> and ask some questions. If anything i say is incorrect please tell me.
>
>  GNURadio defines one thread per block. This means that GNURadio
> automatically takes full advantage of multi core processors, without
> the programmer of the blocks having to do anything, given a high
> number of blocks. 
>
> - However, how does gnuradio scheduler decide which block to execute
> given a set of "ready to execute" blocks, without dependencies between
> them? 
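There is no central arbiter that picks among ready blocks: each block's
thread runs its own block_executor loop, and the OS decides which threads
get a core. Roughly, that loop looks like this: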

* Wait for additional input to come in, for space in the output buffer
  to be consumed (and hence become ready for overwriting), or for new
  messages to come in.
* If messages came in, handle them.
* Ask the block (via forecast) whether it can run (general_)work().
* Run it.
* Notify upstream blocks of how many items you've consumed, freeing
  space in their output buffers.
* Notify downstream blocks of how many items you've produced, so they
  can start to work on those.
* Begin again from the top.
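
To make the loop above concrete, here is a toy Python sketch of a
thread-per-block chain. It is not GNU Radio code: `block_loop`,
`run_chain` and `SENTINEL` are hypothetical names, bounded `queue.Queue`
objects stand in for the real item buffers, and messages and forecast are
left out entirely.

```python
import queue
import threading

SENTINEL = object()  # marks end of stream in this toy model


def block_loop(work, inbox, outbox):
    """Toy per-block loop: wait for input, run work(), pass the result on."""
    while True:
        item = inbox.get()            # wait for input to arrive
        if item is SENTINEL:
            outbox.put(SENTINEL)      # tell the downstream block we are done
            return
        outbox.put(work(item))        # blocks while the downstream buffer is full


def run_chain(works, samples, buffer_size=4):
    """Run a linear chain of blocks, one thread per block; return the output."""
    queues = [queue.Queue(maxsize=buffer_size) for _ in range(len(works) + 1)]

    def feed():
        # Plays the role of a source block feeding the first buffer.
        for s in samples:
            queues[0].put(s)
        queues[0].put(SENTINEL)

    threads = [threading.Thread(target=feed, name="source")]
    threads += [
        threading.Thread(target=block_loop,
                         args=(w, queues[i], queues[i + 1]),
                         name=f"block-{i}")
        for i, w in enumerate(works)
    ]
    for t in threads:
        t.start()

    out = []
    while True:                       # the caller acts as the sink
        item = queues[-1].get()
        if item is SENTINEL:
            break
        out.append(item)
    for t in threads:
        t.join()
    return out
```

For example, `run_chain([lambda x: x + 1, lambda x: x * 2], range(10))`
pushes ten items through two "blocks" that run in their own threads. The
real scheduler hands whole chunks of items to work() rather than single
items, so treat this only as an illustration of the control flow.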

>
> - If i have a flowgraph A -> B on a dual core processor, A and B being
> blocks. Will A and B execute concurrently after the first iteration of A?
Yes.
> By this i mean, on the first iteration of A, B has no data to work on,
> so one core will be idle, however after that, both cores should be
> working, since B can process data sent by A, and A can process new
> data independent of what B is doing.
Exactly! Indeed, GNU Radio asks A to produce up to [size of A's output
buffer in items]/2, so that as soon as it's finished, B can start
working while A goes back to work right away, maximizing parallelism.
> If A is faster than B at processing data, does A data gets queued on a
> buffer, and then is sent to B? Does B only triggers when the data
> requirements to perform a work (i.e. input items) are reached?.
If A is faster than B, A's input buffer will be empty most of the time,
while A's output buffer (which is B's input buffer) will be full; as long
as A has no space to write items to, it won't get asked to work().
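
The backpressure described here can be imitated with a bounded queue.
This is a sketch under loud assumptions: `simulate`, `block_a` and
`block_b` are hypothetical names, the queue stands in for the shared
buffer between A and B, and a `time.sleep` makes B the slow block.

```python
import queue
import threading
import time


def simulate(n_items=50, capacity=8, consumer_delay=0.001):
    """Fast producer A, slow consumer B, one bounded buffer between them."""
    buf = queue.Queue(maxsize=capacity)   # A's output buffer == B's input buffer
    depths = []                           # fill level B sees before each read
    received = []
    done = object()                       # end-of-stream marker

    def block_a():
        for i in range(n_items):
            buf.put(i)                    # blocks whenever the buffer is full
        buf.put(done)

    def block_b():
        while True:
            depths.append(buf.qsize())
            item = buf.get()
            if item is done:
                return
            time.sleep(consumer_delay)    # B is the slow block
            received.append(item)

    a = threading.Thread(target=block_a, name="A")
    b = threading.Thread(target=block_b, name="B")
    a.start()
    b.start()
    a.join()
    b.join()
    return received, depths
```

Printing `depths` after a run typically shows the buffer sitting at or
near `capacity`: A is simply held back until B frees space, and no items
are lost.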
>
> - Is there any way to see which gnuradio thread is executing in each
> core of the cpu, and which block corresponds to that thread? (This
> would be *REALLY USEFUL* for debugging purposes)
Yes! Current versions of GNU Radio (since 3.7.2 or so, I think) give
each block's thread a proper name, so running `htop` or a similar Unix
program will show which thread is running, how much CPU it uses, etc.
For more in-depth analysis, I'd recommend having a look at `perf record`
/ `perf report` [1], or even more advanced, GNU Radio's built-in
performance counters and performance monitor; activate them as explained
in [2], and add a "performance monitor" to your GRC flowgraph, if you
use GRC, or run `gr-perf-monitorx`.
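
If you would rather script this than watch `htop`, the same information
can be read from /proc on Linux. The sketch below is an assumption-laden
helper, not part of GNU Radio: `list_threads` is a hypothetical name, it
only works where /proc exists, and it reports the CPU a thread last ran
on (field 39 of the per-thread stat file), not a live view.

```python
import os


def list_threads(pid=None):
    """Return (tid, name, last_cpu) for each thread of a process (Linux only)."""
    pid = os.getpid() if pid is None else pid
    task_dir = f"/proc/{pid}/task"
    rows = []
    try:
        tids = sorted(int(t) for t in os.listdir(task_dir))
    except OSError:
        return rows                       # no /proc here, or the process is gone
    for tid in tids:
        try:
            with open(f"{task_dir}/{tid}/comm") as f:
                name = f.read().strip()
            with open(f"{task_dir}/{tid}/stat") as f:
                stat = f.read()
        except OSError:
            continue                      # thread exited while we were looking
        # The name field is parenthesised and may contain spaces, so split
        # after the last ')'. What remains starts at field 3 (state); the
        # CPU the thread last ran on is field 39, i.e. index 36 here.
        fields = stat.rsplit(")", 1)[1].split()
        last_cpu = int(fields[36]) if len(fields) > 36 else -1
        rows.append((tid, name, last_cpu))
    return rows
```

Calling `list_threads(pid_of_your_flowgraph)` a few times in a row gives
a rough picture of which named thread sits on which core.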
>
> - In the gnuradio wiki it is explained how to set thread affinity and
> priority. However, it is not clear what they are useful for. Thread
> priority is pretty straightforward, so the only concept i dont fully
> get is the block thread affinity. In which scenario could it be useful
> to set that a thread has to run on a specific core?
The point is that you might, for example, be using a block that uses a
certain hardware accelerator, which is "close" to one CPU, but not to
another. For most PC-style workstations, this won't happen, and it's
best to let GNU Radio and your OS figure out on their own which CPU to
schedule threads on.

I've personally yet to discover a case where this is useful.
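
For reference, this is what pinning looks like at the OS level. It is a
generic sketch using Python's `os.sched_setaffinity`, not GNU Radio's own
per-block affinity setting from the wiki; `pin_calling_thread` is a
hypothetical helper, and the call is only available on some platforms
(Linux among them).

```python
import os


def pin_calling_thread(cores):
    """Restrict the caller to the given set of cores; return the new set.

    Returns None where os.sched_setaffinity is unavailable
    (for example on macOS or Windows).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    # On Linux, an id of 0 refers to the calling thread.
    os.sched_setaffinity(0, cores)
    return os.sched_getaffinity(0)
```

A block thread pinned this way can only ever run on the listed cores,
which is exactly why it rarely helps unless the hardware layout gives you
a reason to prefer one core over another.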

>
> Is there any point in trying to optimize a blocks performance by using
> OPEN MP or pthreads in cases of embarrasingly parallel operations? Or
> is it totally useless since all the cores are already at full load
> because there is one thread per block? I know that i could also try to
> use the GPU to speed up these kind of operations, but my first attempt
> was processor threads.
Sure!
Often, especially on HyperThreading machines, it makes a lot of sense to
make a single operation finish as quickly as possible, because all the
data it accesses then needs to go through the CPU caches only once.
For example, even in relatively complex flow graphs with lots of blocks
where all CPU cores are kept busy all the time, FFTs that are run with
multiple threads tend to increase overall performance.

This is basically another variation of the "old" truth that on modern
hardware, it's typically better to process data uniformly in large
chunks; that might increase latency, but typically the added latency is
made up for by higher system throughput.

Best regards,
Marcus
>
> Thanks in advance for your answers.
>
> Kind Regards,
> Gonzalo Arcos
>
[1] https://lists.gnu.org/archive/html/discuss-gnuradio/2015-05/msg00320.html
[2] https://gnuradio.org/redmine/projects/gnuradio/wiki/PerformanceCounters
>
>
> _______________________________________________
> Discuss-gnuradio mailing list
> [email protected]
> https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
