Hi all.
I have created an AVR32 application based on FreeRTOS and lwIP 1.3.2.
My application is quite large, but I want to concentrate on my problems here.
There are two tasks that use HTTP connections: a web server, and a web
client that talks to an external portal.
The application simply collects some data and periodically POSTs it to
an Apache-based web portal.
The web server is of course active only while a browser is connected;
otherwise it sits blocked in a listen state.
Here is my problem.
Sometimes, somehow, all the TCP connections lock up and are lost: the
web server is no longer accessible and the application can no longer
communicate with the portal.
This seems to happen when I access the web server while, at the same
time, the device is trying to reach the portal.
I started to analyze lwIP and here is what I found.
In mem.c I added the following code:
static u8_t *ram;
/** the last entry, always unused! */
static struct mem *ram_end;
/** pointer to the lowest free block, this is used for faster search */
static struct mem *lfree;

u8_t **ppMemRam;            // DT:2011/03/09
/** the last entry, always unused! */
struct mem **ppMemRamEnd;   // DT:2011/03/09
/** pointer to the lowest free block, this is used for faster search */
struct mem **ppMemLFree;    // DT:2011/03/09
...
void
mem_init(void)
{
  ...
  ppMemRam = &ram;          // DT:2011/03/09
  ppMemRamEnd = &ram_end;   // DT:2011/03/09
  ppMemLFree = &lfree;      // DT:2011/03/09
}
This lets me watch (through a serial debugger) the state of the lwIP
heap area.
When the problem occurs, the "lfree" pointer is stuck at an address
different from "ram".
I looked at the mem ram area and found that the chain of the various
allocations was intact.
It seems that something is not being freed, for some reason unknown
to me.
Sometimes this is not critical because access still works, but the
wasted area grows little by little, saturating the heap and locking up
communication.
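In case anyone wants to repeat this check on their own heap, here is a
minimal walker sketch. It has to live inside mem.c, because struct mem,
ram, ram_end and SIZEOF_STRUCT_MEM are private there; debug_printf() is
just a placeholder for whatever serial output routine you have, it is
not an lwIP function. Call it while the stack is quiet, since it reads
the chain without locking:

static void
mem_walk(void)
{
  struct mem *mem = (struct mem *)ram;
  while (mem < ram_end) {
    /* index of this block's header inside the ram[] array */
    mem_size_t idx = (mem_size_t)((u8_t *)mem - ram);
    debug_printf("block @%p len=%u used=%u\n", (void *)mem,
                 (unsigned)(mem->next - idx - SIZEOF_STRUCT_MEM),
                 (unsigned)mem->used);
    if (mem->next <= idx) {
      debug_printf("chain corrupted at @%p\n", (void *)mem);
      break; /* stop instead of looping forever on a broken chain */
    }
    mem = (struct mem *)&ram[mem->next];
  }
}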
I suppose this is not a cause but an effect, so I continued my analysis.
I then concentrated on the memp area.
I studied it, though I am sure I do not understand all of it; anyway,
here is what I discovered.
I show only the TCP_SEG pool, which seems the relevant one.
HEX   Offset  Delta  Block  Pool     RefCh  RefMem  Free
1E08  2564    20     0      TCP_SEG  0
1E1C  2584    20     1      TCP_SEG  1E08
1E30  2604    20     2      TCP_SEG  1E1C
1E44  2624    20     3      TCP_SEG  1E30
1E58  2644    20     4      TCP_SEG  1E44
1E6C  2664    20     5      TCP_SEG  1E58
1E80  2684    20     6      TCP_SEG  1E6C
1E94  2704    20     7      TCP_SEG  ?      1EE4
1EA8  2724    20     8      TCP_SEG  ?      0
1EBC  2744    20     9      TCP_SEG  1E80   -       xxx
1ED0  2764    20     10     TCP_SEG  ?      0
1EE4  2784    20     11     TCP_SEG  ?      0
Let me describe the columns:
- HEX: absolute address in memory of the memp block
- Offset: offset in bytes from the top of the whole memp structure
- Delta: size of the single block
- Block: index of the block
- Pool: the memp pool the block belongs to
- RefCh: address of the "next" block found by following the free-list
  chain (a "?" means I could not reach the block through the chain)
- RefMem: address of the "next" block found by reading the memory
  directly
- Free: marks the first free block
What it shows is that block 9 is the first free block. The next one is
block 6, then 5, 4, 3, 2, 1, 0.
Reading the memory directly, I have seen that block 7 is chained to
block 11. These two blocks are chained together but no longer reachable.
Likewise, blocks 10 and 8 seem to be unreachable, and chained to
nothing.
What I see is that the two phenomena are related: when I lose mem area,
I lose TCP_SEG blocks as well.
If we take a look at the tcp_seg structure

struct tcp_seg {
  struct tcp_seg *next;  /* used when putting segments on a queue */
  struct pbuf *p;        /* buffer containing data + TCP header */
  ...

we can see that each segment holds a reference to a pbuf. And the lost
tcp_seg blocks refer exactly to that lost mem area!
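For context, if I read the 1.3.2 sources correctly, lwIP releases a
segment via tcp_seg_free() in tcp.c, which roughly does:

  if (seg != NULL) {
    if (seg->p != NULL) {
      pbuf_free(seg->p);            /* release the buffer first */
    }
    memp_free(MEMP_TCP_SEG, seg);   /* then return the block to the pool */
  }

So a leaked tcp_seg keeps its pbuf (and hence heap memory) alive, which
would explain why the two leaks always appear together.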
Has anyone ever seen such a problem?
Any suggestion on how to solve it?
I also read the lwIP memp stats
  lwip_stats.memp[i].max
  lwip_stats.memp[i].avail
  lwip_stats.memp[i].used
and what I found for TCP_SEG was 12, 12, 12, so all the memp blocks are
reported as used!
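For reference, this is roughly how I dump them; a minimal sketch
assuming LWIP_STATS and MEMP_STATS are enabled in lwipopts.h, with
debug_printf() again standing in for your own output routine:

#include "lwip/stats.h"
#include "lwip/memp.h"

void
dump_memp_stats(void)
{
  int i;
  for (i = 0; i < MEMP_MAX; i++) {
    /* err counts allocations that failed because the pool was empty */
    debug_printf("pool %d: avail=%u used=%u max=%u err=%u\n", i,
                 (unsigned)lwip_stats.memp[i].avail,
                 (unsigned)lwip_stats.memp[i].used,
                 (unsigned)lwip_stats.memp[i].max,
                 (unsigned)lwip_stats.memp[i].err);
  }
}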
I have one idea, but I don't know whether it could create worse
problems. It is not a solution, because I don't know the real cause;
it is just a sort of sanity pass over the TCP_SEG blocks.
Looking at the example posted above, I could chain the two lost blocks
(10 and 8) to the top of the list and the chained blocks (7 and 11) to
the bottom of the list. In this way I can recover at least the lost
blocks.
The chained blocks (7 and 11) could in theory still be in use and freed
later; at least, I don't know whether they are really in use or lost.
So, the result should be:
7 (chained) -> 11 (lost) -> 9 (free) -> 6 -> 5 -> 4 -> 3 -> 2 -> 1 ->
0 -> 8 (lost) -> 10 (lost)
This, of course, must be done by hand.
For blocks 8 and 10 I suppose I also have to call mem_free on the
block->p area.
Is it a good idea?
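Before patching the chain by hand, a safer first step might be a pure
detector. A sketch of what I have in mind, to be added inside memp.c
(memp_tab, memp_num and struct memp are private there); it only counts,
it changes nothing:

static int
memp_count_free(memp_t type)
{
  int count = 0;
  struct memp *m = memp_tab[type];
  /* walk the free list; comparing the result with
     memp_num[type] - lwip_stats.memp[type].used (MEMP_STATS needed)
     reveals blocks that have leaked out of the chain */
  while (m != NULL && count <= memp_num[type]) {
    count++;
    m = m->next;
  }
  return count;
}

Given the tcp_seg_free() flow shown earlier, pbuf_free() on block->p is
probably the right call for blocks 8 and 10, rather than mem_free().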
Again, does anybody know this problem, or what the hell I have done to
create it?
Another problem. I don't know whether it is related; maybe it is the
same problem with different effects.
The tcpip_thread stalls!
static void
tcpip_thread(void *arg)
{
  ...
  while (1) {                            /* MAIN Loop */
    gusTcpThread++;                      // DT 03/03/2011 Debug
    gucStatusTCPIP = 0;                  // DT 2011/03/04 TEST
    sys_mbox_fetch(mbox, (void *)&msg);
    gucStatusTCPIP = 1;                  // DT 2011/03/04 TEST
    switch (msg->type) {
#if LWIP_NETCONN
    case TCPIP_MSG_API:
      LWIP_DEBUGF(TCPIP_DEBUG, ("tcpip_thread: API message %p\n", (void *)msg));
      gucStatusTCPIP = 2;                // DT 2011/03/04 TEST
      msg->msg.apimsg->function(&(msg->msg.apimsg->msg));
      gucStatusTCPIP = 3;                // DT 2011/03/04 TEST
      break;
#endif /* LWIP_NETCONN */
  ...
}
What I see is that the gusTcpThread counter stops. In that state the
debug variable gucStatusTCPIP is 2, so the thread is stalled inside the
call to the API function. I don't know which function it is, nor which
mbox is involved.
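To find out which API function it is, one could record the function
pointer just before the call, in the same style as the other debug
variables (gpfLastApiFunc is a name I just made up):

/* in tcpip.c, file scope: which API function is running? (NULL = none) */
void (* volatile gpfLastApiFunc)(struct api_msg_msg *) = NULL;

    case TCPIP_MSG_API:
      gpfLastApiFunc = msg->msg.apimsg->function; /* remember before the call */
      msg->msg.apimsg->function(&(msg->msg.apimsg->msg));
      gpfLastApiFunc = NULL;                      /* the call returned */
      break;

When the stall happens, reading gpfLastApiFunc from the debugger and
looking the address up in the map file should name the function.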
// Posts the "msg" to the mailbox. This function has to block until the
// "msg" is really posted.
void sys_mbox_post(sys_mbox_t mbox, void *msg)
{
  // NOTE: we assume mbox != SYS_MBOX_NULL; IOW, we assume the calling
  // function takes care of checking the mbox validity before calling
  // this function.
  while (pdTRUE != xQueueSend(mbox, &msg, SYS_ARCH_BLOCKING_TICKTIMEOUT)) {
    vTaskDelay(10);        // DT 08/03/2011 Debug
    gusCntMBoxFull++;      // DT 03/03/2011 Debug
  }
  gusCntMBoxFull = 0;      // DT 03/03/2011 Debug
}
Normally the variable gusCntMBoxFull should stay at 0. But if the tcpip
thread is locked up (it is the only one that pops this queue), the
queue keeps filling until it is full, and that while loop becomes an
infinite loop.
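As a stop-gap I am thinking of a small watchdog task that detects the
stall from outside; a sketch, assuming the debug variables above are
made visible and declared volatile (with -O1 the compiler could
otherwise cache their values), and with debug_printf() as before:

#include "FreeRTOS.h"
#include "task.h"

extern volatile unsigned short gusTcpThread;   /* loop counter from tcpip_thread */
extern volatile unsigned char  gucStatusTCPIP; /* where the loop last was */

static void
tcpip_watchdog(void *arg)
{
  unsigned short last = gusTcpThread;
  (void)arg;
  for (;;) {
    vTaskDelay(5000 / portTICK_RATE_MS);  /* check every 5 seconds */
    if (gusTcpThread == last) {
      /* the counter did not advance: tcpip_thread is stalled */
      debug_printf("tcpip_thread stalled, state=%u\n",
                   (unsigned)gucStatusTCPIP);
    }
    last = gusTcpThread;
  }
}

/* started e.g. with:
   xTaskCreate(tcpip_watchdog, (signed portCHAR *)"tcpwd",
               configMINIMAL_STACK_SIZE, NULL, tskIDLE_PRIORITY + 1, NULL); */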
Any idea? Do you think these two problems are the same problem with two
different effects? Consider that this second problem also happens in
the same situation: web server and portal both active.
One last piece of information: I compile with optimization -O1. I am
going to try -O0, but then I have to remove pieces of code, so it is
not a simple job.
Best regards
Davide