Re: Problem with porting an OOT module to 3.8 (sampling speed)

Marcus Müller Fri, 02 Oct 2020 23:25:04 -0700

Aha! That's exactly the "*very* specific" case I was referring to earlier.

This is going to be pretty specific to how you interact with libusb, but
in essence: the 3.7 / 3.8 paradigm is still to block in work() until you
have data, as a source. That's architecturally not pretty, but it is
what it is.

I think somewhere in the later 3.7 releases, and definitely for 3.8, we
introduced a delay before work() gets re-called in case it returns 0,
as that would otherwise lead to a CPU core just spinning on that block
(although it actually has nothing to do, which means that the less data
your block produces, the more persistently it grabs CPU).
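
To put a number on that delay, here is a rough back-of-the-envelope
sketch. It assumes figures taken from elsewhere in this thread, not from
the scheduler source: 4096 samples handed over per work() call, and a
wait of about 250 ms after a call that returned 0. The function name is
made up for illustration.

```cpp
#include <cassert>

// Hypothetical helper: approximate sample rate of a source whose work()
// delivers `items_per_call` items and is then only re-called after
// `wait_ms` milliseconds (because the call in between returned 0).
double throttled_rate_sps(int items_per_call, double wait_ms)
{
    assert(items_per_call >= 0 && wait_ms > 0.0);
    return items_per_call * 1000.0 / wait_ms;
}
```

`throttled_rate_sps(4096, 250.0)` gives 16384 samples per second, which
is in line with the roughly 16 ksps reported in the quoted message below.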


You can remedy that:

* For the current scheduling, block in work() with a timeout instead of
  using the non-blocking libusb receive methods, i.e. use the
  synchronous libusb API. (Not blocking, but then relying on your work()
  being called in a spinlock manner so that no overflows happen, is not
  a sensible use of the async I/O API in libusb.)
* Use libusb's async I/O [0] with a callback: libusb_fill_bulk_transfer(),
  and set the callback function to a function that sends a
  `pmt::cons(pmt::intern("done"), pmt::mp(false))`¹ to your block's
  "system" message port. That should "cancel" the 250 ms wait between
  work() calls (if one is currently going on). This will require an
  additional copy from a buffer you've filled, but that might be less
  costly than one would fear (the data is hot anyway).
* What I'll call the sleeper/waker pattern: keep your block as it is.
  Follow [1] to get a file descriptor for your libusb handle. Have a
  thread that uses `epoll`, `poll` or `select`² to passively monitor
  the USB endpoint (without using CPU for that). Then, when new data
  has arrived, you'd wake up your block using the same "system" port
  method as above. In the block's work function, you'd then use the
  non-blocking libusb functions to get the data (which is now there;
  otherwise the callback wouldn't have been called).
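
The first option, "block in work() with a timeout", can be sketched
without any USB code at all. This is only an illustration under
assumptions: `sample_queue` and its methods are hypothetical names, the
queue stands in for the block's internal buffer, and in a real block the
blocking call could just as well be a synchronous libusb transfer such
as libusb_bulk_transfer() with a non-zero timeout.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the block's internal buffer: the USB side
// pushes samples in, work() pulls them out.
class sample_queue
{
public:
    // Called from the receiving (USB) side; wakes up a waiting work().
    void push(const std::vector<short>& samples)
    {
        {
            std::lock_guard<std::mutex> lock(d_mutex);
            d_samples.insert(d_samples.end(), samples.begin(), samples.end());
        }
        d_cond.notify_one();
    }

    // Called from work(): sleeps until data is available or `timeout`
    // expires, then copies at most `max_items` samples to `out`.
    // Returns the number of samples copied; 0 means the timeout expired
    // without data (or max_items was 0).
    std::size_t pop_wait(short* out, std::size_t max_items,
                         std::chrono::milliseconds timeout)
    {
        assert(out != nullptr);
        std::unique_lock<std::mutex> lock(d_mutex);
        d_cond.wait_for(lock, timeout, [this] { return !d_samples.empty(); });
        const std::size_t n = std::min(max_items, d_samples.size());
        const auto last = d_samples.begin() + static_cast<std::ptrdiff_t>(n);
        std::copy(d_samples.begin(), last, out);
        d_samples.erase(d_samples.begin(), last);
        return n;
    }

private:
    std::mutex d_mutex;
    std::condition_variable d_cond;
    std::deque<short> d_samples;
};
```

In work(), one would then call something like
`queue.pop_wait(out, noutput_items, std::chrono::milliseconds(100))` and
return the result; the 100 ms is an arbitrary value that mainly keeps
the block responsive when the flowgraph is stopped.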


Best regards,
Marcus

¹ Performance hint: have a single object that you hold on to (e.g.
`static pmt::pmt_t done_msg = pmt::cons...`) and re-send it; constructing
PMT symbols is expensive, keeping a single refcounted PMT around isn't.
² I'm not too deep into the details of these APIs/syscalls, but epoll is
probably the thing you want in most cases, especially if you're watching
more than one file descriptor.
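
For the "sleeper" half of the sleeper/waker pattern, passively waiting
on a file descriptor can be sketched with plain `poll`. This is a
minimal illustration, not libusb-specific code: `wait_readable` is a
made-up helper, and the descriptor is assumed to come from the libusb
polling API described in [1] (for example via libusb_get_pollfds(), as
far as I read that page). Any readable descriptor works for trying it
out.

```cpp
#include <cassert>
#include <poll.h>
#include <unistd.h>

// Sleeps, without burning CPU, until `fd` becomes readable or
// `timeout_ms` milliseconds have passed. Returns true if there is
// something to read; returns false on timeout or on a poll() error
// (including an interrupted call).
bool wait_readable(int fd, int timeout_ms)
{
    assert(fd >= 0);
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    const int ret = poll(&pfd, 1, timeout_ms);
    return ret > 0 && (pfd.revents & POLLIN) != 0;
}
```

A watcher thread would call this in a loop and, whenever it returns
true, wake the block up through the "system" port as described above.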


[0] http://libusb.sourceforge.net/api-1.0/group__libusb__asyncio.html
[1] http://libusb.sourceforge.net/api-1.0/group__libusb__poll.html

On 10/2/20 10:27 PM, [email protected] wrote:

>> Hi Wojciech,
>>
>> On 02.10.20 15:51, Wojciech Kazubski wrote:
>>> I suppose that in 3.8 the source block has to signal required samplerate to
>>> GR runtime. Is this correct?
>> No, that's not correct. The runtime doesn't care about required rates at
>> all. It just makes blocks produce items as fast as they can.
>>> How to fix the code?
>> Good question! Unless you're doing something *very* specific in your
>> code, I'd honestly blame this on a bug that was introduced during
>> porting – but I might be wrong.
>>
>> Two very common tools to investigate this are rather simple:
>>
>> 1. htop (a `top`-alike program with a bit better visualization), with
>> thread names enabled – that way you can see which block occupies the CPU
>> the most, because all blocks run in their own thread
>> 2. `perf` (especially, `perf top -a`), which allows you to sample in
>> which function your CPU cores are most often. That way, you can identify
>> functions that might be the blockers there.
>>
>> Best regards,
>> Marcus
> I think that the problem is not related to CPU load by my block. The
> total CPU load is very low, less than 5%. I did some debugging by
> adding some test messages printed each time data is written to or read
> from the internal buffer. Data is written to the buffer by libusb and
> read by calling a "work" function (in blocks of 4096 samples at a time).
> In 3.7 I see a great number of messages indicating that data is read from
> the buffer. Part (about half) of the messages show 0 bytes read, due
> to lack of new data from USB. By this, the data rate is reduced from
> ~30 Msps to 16.368 Msps.
> In contrast, in 3.8 the work function is called only about 4 times a
> second, giving 16 ksps. This is some 2000 times less than in 3.7.
> --
> Best regards
> Wojciech
