Aha! That's exactly the "*very* specific" case I was referring to earlier.
This is going to be pretty specific to how you interact with libusb, but
in essence:
The 3.7 / 3.8 paradigm for a source is still to block in work() until you
have data. That's architecturally not pretty, but it is what it is.
I think somewhere in the later 3.7 releases, and definitely in 3.8, we
introduced a delay before work() gets re-called if it returns 0, because
otherwise a CPU core would just spin on that block even though it has
nothing to do (which means that the less data your block produces, the
more persistently it grabs CPU).
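A toy model of that back-off may make the behaviour concrete. It is not GNU Radio's scheduler code; the class `Backoff` and its methods are hypothetical. It waits up to a given interval (250 ms in the scenario discussed here) after work() returned 0, but returns early as soon as another thread calls `wake()`, which is the effect the "system" port message in the second option below is meant to have.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Toy model (hypothetical, not GNU Radio's scheduler) of the back-off after
// work() returns 0: wait up to `backoff`, but return early on wake().
class Backoff {
public:
    // Returns true if woken early, false if the whole back-off elapsed.
    bool wait(std::chrono::milliseconds backoff) {
        std::unique_lock<std::mutex> lock(mtx_);
        const bool woken = cv_.wait_for(lock, backoff, [this] { return woken_; });
        woken_ = false;  // consume the wake-up so the next wait starts fresh
        return woken;
    }

    // Called from another thread, e.g. a USB transfer-completion callback.
    void wake() {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            woken_ = true;  // recorded, so a wake before wait() is not lost
        }
        cv_.notify_one();
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    bool woken_ = false;
};
```

Because the wake-up is stored in the flag the predicate checks, a wake that arrives just before the wait starts still ends it immediately.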
You can remedy that in one of three ways:
* For the current scheduling, block in work() with a timeout instead of
using the non-blocking libusb receive methods, i.e. use the synchronous
libusb API. (Not blocking, and then relying on work() being called in a
spinlock-like manner so that no overflows happen, is not a sensible use
of libusb's async IO API.)
* Use libusb's async IO [0] with a callback: set up the transfer with
libusb_fill_bulk_transfer(), and set its callback to a function that
sends `pmt::cons(pmt::intern("done"), pmt::mp(false))`¹ to your block's
"system" message port. That should "cancel" the 250 ms wait between
work() calls (if one is currently going on). This requires an additional
copy from a buffer you've filled, but that might be less costly than one
would fear (the data is hot in cache anyway).
* What I'll call the sleeper/waker pattern: keep your block as it is.
Follow [1] to get a file descriptor for your libusb handle. Have a thread
that uses `epoll`, `poll` or `select`² to passively monitor the USB
endpoint (without using CPU for that). Then, when new data has arrived,
you'd wake up your block using the same "system" port method as above.
In the block's work function, you'd then use the non-blocking libusb
functions to get the data (which is now there; otherwise you wouldn't
have been woken up).
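The first option can be sketched with the standard library alone. `FakeEndpoint` is a hypothetical stand-in for the USB endpoint; in a real block the blocking read would be libusb's synchronous `libusb_bulk_transfer()`, whose last argument is a timeout in milliseconds. The `work()` below is a free function with a simplified signature, not the `gr::sync_block` one.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a USB bulk endpoint fed by the device.
class FakeEndpoint {
public:
    // Device side: append samples and wake a reader blocked in read().
    void push(const std::vector<float>& samples) {
        {
            std::lock_guard<std::mutex> lock(mtx_);
            buf_.insert(buf_.end(), samples.begin(), samples.end());
        }
        cv_.notify_one();
    }

    // Blocks until samples are available or `timeout` expires, then copies up
    // to `max_items` of them into `out`. Returns the count (0 on timeout).
    std::size_t read(float* out, std::size_t max_items,
                     std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mtx_);
        cv_.wait_for(lock, timeout, [this] { return !buf_.empty(); });
        const std::size_t n = std::min(max_items, buf_.size());
        std::copy_n(buf_.begin(), n, out);
        buf_.erase(buf_.begin(), buf_.begin() + static_cast<std::ptrdiff_t>(n));
        return n;
    }

private:
    std::mutex mtx_;
    std::condition_variable cv_;
    std::deque<float> buf_;
};

// Simplified work(): block (bounded by `timeout`) until data arrives instead
// of returning 0 straight away and relying on being re-called.
int work(FakeEndpoint& ep, float* out, int noutput_items,
         std::chrono::milliseconds timeout) {
    return static_cast<int>(
        ep.read(out, static_cast<std::size_t>(noutput_items), timeout));
}
```

A finite timeout keeps work() from blocking indefinitely (for example while the flowgraph is being stopped); a 0 return then only happens when the device genuinely delivered nothing within that window.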
Best regards,
Marcus
¹ Performance hint: hold on to a single object (e.g.
`static pmt::pmt_t done_msg = pmt::cons...`) and re-send it; constructing
PMT symbols is expensive, keeping a single refcounted PMT isn't.
² I'm not too deep into the details of these APIs/syscalls, but epoll is
probably what you want in most cases, especially if you're watching
more than one file descriptor.
[0] http://libusb.sourceforge.net/api-1.0/group__libusb__asyncio.html
[1] http://libusb.sourceforge.net/api-1.0/group__libusb__poll.html
On 10/2/20 10:27 PM, [email protected] wrote:
Hi Wojciech,
On 02.10.20 15:51, Wojciech Kazubski wrote:
I suppose that in 3.8 the source block has to signal the required sample
rate to the GR runtime. Is this correct?
No, that's not correct. The runtime doesn't care about required rates at
all. It just makes blocks produce items as fast as they can.
How do I fix the code?
Good question! Unless you're doing something *very* specific in your
code, I'd honestly blame this on a bug that was introduced during
porting, but I might be wrong.
Two very common and rather simple tools to investigate this are:
1. htop (a `top`-like program with somewhat better visualization), with
thread names enabled: that way you can see which block occupies the CPU
the most, because each block runs in its own thread
2. `perf` (especially `perf top -a`), which samples the functions your
CPU cores spend most of their time in. That way, you can identify the
functions that might be the bottleneck.
Best regards,
Marcus
I think the problem is not related to CPU load caused by my block. The
total CPU load is very low, less than 5%. I did some debugging by
adding test messages printed each time data is written to or read
from the internal buffer. Data is written to the buffer by libusb and
read by calling the "work" function (in blocks of 4096 samples at a time).
In 3.7 I see a great number of messages indicating that data is read
from the buffer. Part (about half) of the messages show 0 bytes read,
due to lack of new data from USB. Because of this, the data rate is
reduced from ~30 Msps to 16.368 Msps.
In contrast, in 3.8 the work function is called only about 4 times a
second, giving 16 ksps. This is some 2000 times less than in 3.7.
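The 3.8 figures are consistent with each other, and with the 250 ms wait mentioned in the reply above (1 / 0.25 s = 4 calls per second). A quick check using only the numbers from this message:

```cpp
#include <cassert>

// Figures quoted in the message above.
constexpr double kCallsPerSecond = 1.0 / 0.25;  // ~4 work() calls per second
constexpr double kSamplesPerCall = 4096.0;      // samples read per call
constexpr double kRate37 = 30e6;                // ~30 Msps figure for 3.7
constexpr double kRate37Effective = 16.368e6;   // 16.368 Msps figure for 3.7

// Effective 3.8 rate: 4 calls/s * 4096 samples/call = 16384 samples/s.
constexpr double kRate38 = kCallsPerSecond * kSamplesPerCall;

// Ratios: about 1831x below ~30 Msps, about 999x below 16.368 Msps.
constexpr double kSlowdownVs37 = kRate37 / kRate38;
constexpr double kSlowdownVsEffective = kRate37Effective / kRate38;
```

Measured against the 16.368 Msps figure the slowdown is closer to 1000x; the "some 2000 times" matches the ~30 Msps figure.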
--
Best regards
Wojciech