Mason wrote:
Bill Auerbach wrote:

That 7.4% for memcpy is a direct hit on throughput.  You're seeing a
breakdown of total CPU time.  How much of that 7+% for memcpy comes out of
the total time used by lwIP?  I think you'll find that to be a much larger
hit and a large contributor to lower bandwidth.
Bill,

IMHO, the elephant in the room is task-switching, as correctly
pointed out by Kieran.
Well, given a correctly DMA-enabled driver, you could avoid one task switch by checking RX packets from tcpip_thread instead of using a separate thread for RX (as your "Task breakdown" suggests you do, under the name "RxTask"). You would then set a flag / post a static message from your ISR, process the packet in tcpip_thread (without having to copy it), and post the data to your application thread.

Also, by using the (still somewhat experimental) LWIP_TCPIP_CORE_LOCKING feature, you can avoid the task switch from the application task to tcpip_thread (by taking a mutex to lock the core instead of passing a message).
Assuming that every memcpy were lwIP-related, and that I could
get rid of them (which I don't see how to do, given Simon's comments),
the transfer would take 478 instead of 516 seconds.
I didn't mean to discourage you with my comments; I only meant it doesn't work out of the box with current lwIP. However, I know it's not easy for an lwIP beginner to make the changes required on the RX side (the TX side should not be a problem: it can be handled by adapting the mem_malloc() functions).

If I made the changes to support PBUF_REF for RX in git, would you be able to switch to that for testing?

I plan to implement zero-copy on an ARM-based board I have here, but I haven't found the time for that, lately :-(

Simon

_______________________________________________
lwip-users mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/lwip-users
