Mason wrote:
> Bill Auerbach wrote:
>> That 7.4% for memcpy is a direct hit on throughput. You're seeing a
>> breakdown of total CPU time. How much of that 7+% for memcpy comes out of
>> the total time used by lwIP? I think you'll find that to be a much larger
>> hit and a large contributor to lower bandwidth.
>
> Bill,
> IMHO, the elephant in the room is task-switching, as correctly
> pointed out by Kieran.
Well, given a correctly DMA-enabled driver, you could avoid one task
switch by checking RX packets from tcpip_thread instead of using a
separate RX thread (which the name "RxTask" in your task breakdown
suggests you currently do). You would then set a flag / post a static
message from your ISR, process the packet in tcpip_thread (without
having to copy it) and post the data to your application thread.
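
To make that concrete, here is a rough sketch of the ISR side. Every
name prefixed my_/MY_ is a placeholder for your MAC driver/BSP, and it
assumes your sys_arch mailbox post may be called from an ISR and that
PBUF_POOL_BUFSIZE is large enough to hold a full frame in one pbuf:

#include "lwip/pbuf.h"
#include "lwip/netif.h"
#include "lwip/err.h"

#define MY_NUM_RX_DESC    8       /* size of the MAC's RX descriptor ring   */
#define MY_MAX_FRAME_LEN  1518    /* <= PBUF_POOL_BUFSIZE so no pbuf chains */

/* Placeholders for your MAC's descriptor handling: */
int   my_mac_next_done_descriptor(void);
u16_t my_mac_frame_length(int desc);
void  my_mac_set_rx_buffer(int desc, void *buf);

extern struct netif my_netif;     /* added with tcpip_input as input func */

/* One pre-allocated pbuf per DMA descriptor; the MAC writes the frame
 * straight into p->payload, so the driver never copies the data. */
static struct pbuf *rx_pbuf[MY_NUM_RX_DESC];

void my_eth_rx_isr(void)
{
  int i = my_mac_next_done_descriptor();
  struct pbuf *p = rx_pbuf[i];

  p->len = p->tot_len = my_mac_frame_length(i);

  /* netif->input is tcpip_input here: it only posts the pbuf to the
   * tcpip_thread mailbox and the packet is parsed in tcpip_thread,
   * so there is no dedicated RX task and no copy in the driver. */
  if (my_netif.input(p, &my_netif) != ERR_OK) {
    pbuf_free(p);                 /* mailbox full: drop the frame */
  } else {
    /* Refill the descriptor with a fresh pbuf for the next frame. */
    rx_pbuf[i] = pbuf_alloc(PBUF_RAW, MY_MAX_FRAME_LEN, PBUF_POOL);
    if (rx_pbuf[i] != NULL) {
      my_mac_set_rx_buffer(i, rx_pbuf[i]->payload);
    }
  }
}

The point is that tcpip_input() does nothing but post the pbuf to the
tcpip_thread mailbox, so the frame gets parsed in tcpip_thread and the
dedicated RX task (and its task switch) simply goes away.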
Also, by using the (still somewhat experimental) LWIP_TCPIP_CORE_LOCKING
feature, you can avoid the task switch from the application task to
tcpip_thread (by using a mutex to lock the core instead of passing a
message).
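
Enabling it is one line in lwipopts.h, and with the core lock in place
you can even call raw-API functions directly from an application thread
by taking the stack mutex around them. A minimal sketch (my_send is
just an example helper, not part of lwIP):

/* In lwipopts.h: let the sequential APIs lock a mutex instead of
 * posting messages to tcpip_thread (still somewhat experimental). */
#define LWIP_TCPIP_CORE_LOCKING   1

/* In application code: */
#include "lwip/tcpip.h"
#include "lwip/tcp.h"

err_t my_send(struct tcp_pcb *pcb, const void *data, u16_t len)
{
  err_t err;

  LOCK_TCPIP_CORE();              /* serialize against tcpip_thread */
  err = tcp_write(pcb, data, len, TCP_WRITE_FLAG_COPY);
  if (err == ERR_OK) {
    err = tcp_output(pcb);
  }
  UNLOCK_TCPIP_CORE();
  return err;
}

With that option set, the netconn/socket calls also take the mutex
internally instead of doing a message round trip to tcpip_thread, which
is where the saved task switch comes from.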
> Assuming that every memcpy were lwIP-related, and that I could
> get rid of them (which I don't see how, given Simon's comments),
> the transfer would take 478 instead of 516 seconds
> (516 s * (1 - 0.074) = ~478 s).
I didn't mean to discourage you with my comments; I only meant that it
doesn't work out of the box with the current lwIP. However, I know it's
not that easy for an lwIP beginner to make the changes required for the
RX side (the TX side should not be a problem: it can be handled by
adapting the mem_malloc() functions).
If I made the changes to support PBUF_REF for RX in git, would you be
able to switch to that for testing?
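
Just to illustrate the general direction (this is only a sketch of the
concept, not the actual change): the key is a pbuf whose payload points
at the DMA buffer and which tells the driver when the stack has really
finished with the data, e.g. via the custom-pbuf mechanism
(LWIP_SUPPORT_CUSTOM_PBUF / pbuf_alloced_custom()). The my_*/MY_* names
below are placeholders for your DMA ring:

/* Requires LWIP_SUPPORT_CUSTOM_PBUF (pbufs with a free callback). */
#include "lwip/pbuf.h"

#define MY_NUM_RX_DESC  8         /* placeholders for your DMA ring */
#define MY_RX_BUF_SIZE  1536

void my_mac_give_descriptor_back(int desc);   /* placeholder */

typedef struct {
  struct pbuf_custom pc;   /* must stay first so we can cast from struct pbuf* */
  int desc_index;          /* which DMA descriptor owns the payload buffer */
} my_rx_pbuf_t;

static my_rx_pbuf_t rx_pbufs[MY_NUM_RX_DESC];

/* Called when pbuf_free() drops the last reference, i.e. when lwIP and
 * the application are really done with the data. */
static void my_rx_pbuf_freed(struct pbuf *p)
{
  my_rx_pbuf_t *rp = (my_rx_pbuf_t *)p;
  my_mac_give_descriptor_back(rp->desc_index);  /* recycle the DMA buffer */
}

/* Wrap a received DMA buffer into a pbuf without copying the payload;
 * the result can then be handed to netif->input() / tcpip_input(). */
struct pbuf *my_wrap_rx_buffer(int desc_index, void *buf, u16_t frame_len)
{
  my_rx_pbuf_t *rp = &rx_pbufs[desc_index];

  rp->desc_index = desc_index;
  rp->pc.custom_free_function = my_rx_pbuf_freed;
  return pbuf_alloced_custom(PBUF_RAW, frame_len, PBUF_REF,
                             &rp->pc, buf, MY_RX_BUF_SIZE);
}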
I plan to implement zero-copy on an ARM-based board I have here, but I
haven't found the time for that lately :-(
Simon