Timmy,

 

I haven't benchmarked this, so I can't provide hard data, but I've done a
lot of optimization and tweaking of lwIP to improve bandwidth, and my study
of pbufs and memory pools did not show a need for improvement there,
considering everything else required to handle a TCP connection.
pbuf_alloc() of a PBUF_POOL pbuf doesn't use much runtime when the
allocation fits in one pbuf, and memp_malloc() and memp_free() each run only
a few lines of simple C code.  pbuf_free() also does very little on a single
(unchained) pbuf.  You are in a position to measure the actual improvement.
I would be curious (and surprised) if overall performance increased
significantly, or even noticeably.  In my experience, several other areas
yield significant performance gains.  I submitted a patch for one of them,
which is already included in lwIP; the others are optimizing your Ethernet
port, improving inet_chksum, and using zero-copy TX and RX.  For me,
optimizing memcpy (using assembly code, unrolled loops, and indexed
addressing) helped a good bit as well.
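
As a rough illustration of where two of these optimizations plug in, the
fragment below shows an lwipopts.h override of lwIP's MEMCPY macro and a
checksum algorithm selection.  The name my_fast_memcpy is hypothetical and
stands for whatever optimized routine your port provides, and the algorithm
number is only an example; which variant is fastest depends on the CPU and
should be measured.

```c
/* lwipopts.h (sketch, not a complete configuration) */
#include <stddef.h>

/* Hypothetical optimized copy routine, e.g. hand-written in assembly. */
void *my_fast_memcpy(void *dst, const void *src, size_t len);

/* opt.h defaults MEMCPY to the C library's memcpy; overriding it here
   routes lwIP's internal copies through the optimized routine. */
#define MEMCPY(dst, src, len) my_fast_memcpy(dst, src, len)

/* Selects one of the built-in lwip_standard_chksum variants (1, 2 or 3).
   The value 3 is only an example; benchmark on your target. */
#define LWIP_CHKSUM_ALGORITHM 3
```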

 

Bill

 

From: [email protected]
[mailto:[email protected]] On Behalf
Of [email protected]
Sent: Wednesday, November 02, 2011 1:10 PM
To: Mailing list for lwIP users
Subject: Re: [lwip-users] Automatic Rx DMA ring replenish

 

On 30 Oct 2011 18:13 "Simon Goldschmidt" <[email protected]> wrote:



"[email protected]" <[email protected]> wrote:



What if I make the Rx DMA buffer descriptor ring large enough to hold all
POOL pbufs. At start-up all POOL pbufs are allocated and put in the Rx DMA
ring.
pbuf_free() is modified so that whenever a POOL pbuf is freed it is
immediately put in the Rx DMA ring.

This should improve performance, as well as simplify the ethernet driver a
bit.


If it works for your hardware, good enough. The modification would probably
be calling your custom free function instead of memp_free from pbuf_free.

However, I don't think that will work with many DMA-enabled MACs: the ones
I've worked with have the RX descriptors in internal memory, so the ring
can't be made larger. And because RX packets are sometimes buffered (e.g.
TCP out-of-sequence data), you will want to have many more PBUF_POOL pbufs
than fit into your DMA ring (depending on its size and the expected
throughput, of course).

That said, I guess providing a way to change memory allocation/deallocation
to use custom functions would be a good thing, since it would support many
different types of zero-copy MACs without having to change the lwIP code for
each piece of hardware. So I guess it's well worth a try for your target!

Simon


I have tested this method on my hardware and it works nicely.
This is my suggestion for how it can be implemented in lwIP:

In pbuf.c, function pbuf_free(), change this:

   /* is this a pbuf from the pool? */
   if (type == PBUF_POOL) {
     memp_free(MEMP_PBUF_POOL, p);

To this:

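   /* is this a pbuf from the pool? */
   if (type == PBUF_POOL) {
     /* offer the pbuf to the Ethernet driver first; free it normally
        only if the driver declines (e.g. the Rx DMA ring is full) */
     if (!DMA_RING_REPLENISH(p)) {
       memp_free(MEMP_PBUF_POOL, p);
     }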

In opt.h, add this:

#ifndef DMA_RING_REPLENISH
#define DMA_RING_REPLENISH( p ) 0
#endif

In lwipopts.h, the feature can be enabled by a define like this:

#define DMA_RING_REPLENISH( p ) MAC_ReplenishRx( p )


The way it works is that whenever a PBUF_POOL pbuf is deallocated, it is
first offered to the Ethernet driver via the DMA_RING_REPLENISH() macro. If
the Ethernet driver wants the pbuf, it returns true. If, however, the
Ethernet driver does not want the pbuf at this time (the DMA ring is already
full), then the pbuf is freed normally using memp_free().

By offering the pbuf to the Ethernet driver directly, the entire
memp_free(), context switch, pbuf_alloc() sequence is bypassed, saving CPU
cycles.
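
To make the driver side of this contract concrete, here is a minimal sketch
of what MAC_ReplenishRx() could look like.  It is an illustration under
stated assumptions, not my actual driver: the ring is modelled as a plain
array of pbuf pointers, RX_RING_SIZE and MAC_TakeRx() are hypothetical
names, and struct pbuf is a stand-in so that the sketch compiles on its own.
A real driver would also write the hardware descriptor, reset the pbuf's
length and reference fields, and protect the ring against concurrent access
(for example with SYS_ARCH_PROTECT), since pbuf_free() can be called from
several contexts.

```c
#include <assert.h>
#include <stddef.h>

#define RX_RING_SIZE 4  /* hypothetical number of Rx descriptors */

/* Stand-in for lwIP's struct pbuf so the sketch compiles on its own;
   a real driver would include "lwip/pbuf.h" instead. */
struct pbuf {
    struct pbuf *next;
    void *payload;
};

static struct pbuf *rx_ring[RX_RING_SIZE]; /* pbufs owned by the ring */
static size_t rx_head;   /* next slot to refill */
static size_t rx_tail;   /* oldest filled slot */
static size_t rx_count;  /* number of filled slots */

/* Called (via DMA_RING_REPLENISH) with a freed PBUF_POOL pbuf.
   Returns 1 if the driver kept it, 0 if the ring is full and the
   caller should fall back to memp_free(). */
int MAC_ReplenishRx(struct pbuf *p)
{
    assert(p != NULL);
    if (rx_count == RX_RING_SIZE) {
        return 0;
    }
    /* A real driver would re-initialize the pbuf and hand the
       descriptor back to the hardware here. */
    rx_ring[rx_head] = p;
    rx_head = (rx_head + 1) % RX_RING_SIZE;
    rx_count++;
    return 1;
}

/* Hypothetical Rx-complete path: remove the oldest pbuf from the ring
   so it can be passed up the stack. Returns NULL if the ring is empty. */
struct pbuf *MAC_TakeRx(void)
{
    struct pbuf *p;
    if (rx_count == 0) {
        return NULL;
    }
    p = rx_ring[rx_tail];
    rx_ring[rx_tail] = NULL;
    rx_tail = (rx_tail + 1) % RX_RING_SIZE;
    rx_count--;
    return p;
}
```

With this shape, every received frame taken out by MAC_TakeRx() opens a
slot that the next pbuf_free() of a PBUF_POOL pbuf refills, and any pbuf
freed while the ring is full goes back to the pool as before.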

Regards,
Timmy Brolin

_______________________________________________
lwip-users mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/lwip-users
