Hi Gilad, part of this is for the future reader of this thread, so, please, bear with me:
On 07.11.2017 10:42, Gilad Beeri (ApolloShield) wrote:
> I have a flowgraph, that when run, no CPU core is ever close to 100%
> utilization.

Indeed, dropped samples indicate a bottleneck narrower than your USRP's sampling rate, but that bottleneck doesn't have to be CPU overutilization! Simplest example: take a flow graph that otherwise wouldn't produce any problems and add a Throttle block set to half the necessary sampling rate. Samples get dropped, although no core is anywhere near busy.

Most often, I find that IO operations actually become the bottleneck – be it that sending samples to the USRP (or receiving them) is actually pretty time-intense, or that you need to interact with storage. Depending on the tooling you choose, this fact might or might not be hidden: time spent, for example, "on behalf" of a thread in kernel land, searching for a contiguous piece of memory to give to that process, or handling USB buffers or... might or might not be attributed to the process.

Another very classical problem is memory bandwidth and latency. As shown by SE at this year's GRCon, chances aren't that bad that you can optimize quite a bit if you co-locate connected blocks on the same CPU: you get a caching advantage (or, rather, don't incur a disadvantage).

That all being said, how do you proceed? First of all, this is one of the cases where having ControlPort is very helpful. If you have it (with Thrift and PerfCounters enabled), you can start the CtrlPort Performance Monitor and see which output buffers "stay full" all the time. The block after such a buffer is probably your bottleneck.

If you don't have ControlPort, try running `perf top -ag` (running it as root might help here, since you also want to inspect kernel times; I'm not quite sure about that, though). You should be getting a listing of "when we sampled where the CPU(s) were, in x % of the time, they were stuck in these functions".

I really tried, but haven't had the time to work with kernelshark. That might really be a tool of choice here.
In fact, it looks so cool that I could imagine that we one day supersede the perf counter concept with that; who knows. If you do happen to look into that, I'd be very happy to get some feedback about the process, and what the problems were.

I think this is definitely something we want to enable users to do – understand not only the behaviour of their blocks in isolation, but how a system works. After all, one of the major "let's dream about a GNU Radio in the future" things we're considering is making it easy to distribute a flow graph across computers, and for that, systemic insight pretty much is a must.

Best regards,
Marcus
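
For the future reader, the ControlPort rule of thumb above (the block after an output buffer that stays full is probably the bottleneck) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not talk to ControlPort at all, it assumes a simple linear chain of blocks, and the block names, the fullness numbers and the 0.9 threshold are all made up. In practice you would read the buffer fullness off the Performance Monitor.

```python
# Minimal sketch, not GNU Radio API: the names, numbers and threshold below
# are hypothetical and only illustrate the "buffer stays full" rule of thumb.

def likely_bottleneck(chain, fullness, threshold=0.9):
    """Return the block right after the last output buffer that stays full.

    chain:     block names in upstream-to-downstream order (linear chain).
    fullness:  dict mapping block name to the average fullness (0.0 to 1.0)
               of that block's output buffer.
    threshold: fullness at or above which a buffer counts as "staying full".
    Returns None if no output buffer reaches the threshold.
    """
    suspect = None
    # Walk each (producer, consumer) pair along the chain.
    for upstream, downstream in zip(chain, chain[1:]):
        if fullness.get(upstream, 0.0) >= threshold:
            # The consumer is not draining this buffer fast enough.
            suspect = downstream
    return suspect


# Hypothetical readings: buffers back up until just before the slow block.
chain = ["usrp_source", "lowpass_filter", "demodulator", "file_sink"]
fullness = {
    "usrp_source": 0.98,
    "lowpass_filter": 0.95,
    "demodulator": 0.10,
    "file_sink": 0.0,
}

print(likely_bottleneck(chain, fullness))
```

Here the buffers after usrp_source and lowpass_filter stay full while the one after demodulator does not, so the sketch points at demodulator: everything upstream of it is merely waiting. A real flow graph with branches would need the same check per connection rather than per position in a list.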
