Hi all.
I have created an AVR32 application based on FreeRTOS and lwIP 1.3.2.
My application is quite large, but I want to concentrate on my problems here.
There are two tasks that use HTTP connections: a web server, and a web
client that talks to an external portal.
The application simply collects some data and periodically POSTs it to
an Apache-based web portal.
The web server is of course active only while a browser is connected;
otherwise it sits blocked in a listen state.
Here is my problem.
Sometimes, somehow, all the TCP connections lock up and are lost: the
web server is no longer accessible and the application can no longer
communicate with the portal.
This seems to happen when I access the web server while, at the same
time, the device is trying to reach the portal.
I started to analyze lwIP and here is what I found.
In mem.c I added the following code:
static u8_t *ram;
/** the last entry, always unused! */
static struct mem *ram_end;
/** pointer to the lowest free block, this is used for faster search */
static struct mem *lfree;

u8_t **ppMemRam;            // DT:2011/03/09
/** the last entry, always unused! */
struct mem **ppMemRamEnd;   // DT:2011/03/09
/** pointer to the lowest free block, this is used for faster search */
struct mem **ppMemLFree;    // DT:2011/03/09
...
void
mem_init(void)
{
  ...
  ppMemRam = &ram;          // DT:2011/03/09
  ppMemRamEnd = &ram_end;   // DT:2011/03/09
  ppMemLFree = &lfree;      // DT:2011/03/09
}
This lets me watch (through a serial debugger) the state of the lwIP
heap area.
When the problem occurs, the "lfree" pointer is stuck at an address
different from "ram".
I looked at the mem ram area and found that the chain of the various
allocations was intact.
It seems that something is not being freed, for some reason unknown
to me.
Sometimes this is not critical because access still works, but the
wasted area grows little by little, saturating the heap and locking up
communication.
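In case anyone wants to repeat this check on their own heap, here is a
minimal walker sketch. It has to live inside mem.c, because struct mem,
ram, ram_end and SIZEOF_STRUCT_MEM are private there; debug_printf() is
just a placeholder for whatever serial output routine you have, it is
not an lwIP function. Call it while the stack is quiet, since it reads
the chain without locking:

static void
mem_walk(void)
{
  struct mem *mem = (struct mem *)ram;
  while (mem < ram_end) {
    /* index of this block's header inside the ram[] array */
    mem_size_t idx = (mem_size_t)((u8_t *)mem - ram);
    debug_printf("block @%p len=%u used=%u\n", (void *)mem,
                 (unsigned)(mem->next - idx - SIZEOF_STRUCT_MEM),
                 (unsigned)mem->used);
    if (mem->next <= idx) {
      debug_printf("chain corrupted at @%p\n", (void *)mem);
      break; /* stop instead of looping forever on a broken chain */
    }
    mem = (struct mem *)&ram[mem->next];
  }
}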
I suppose this is not a cause but an effect, so I continued my analysis.
I then concentrated on the memp area.
I studied it, though I am sure I do not understand all of it; anyway,
here is what I discovered.
I show only the TCP_SEG pool, which seems the relevant one.
HEX   Offset  Delta  Block  Pool     RefCh  RefMem  Free
1E08  2564    20     0      TCP_SEG  0
1E1C  2584    20     1      TCP_SEG  1E08
1E30  2604    20     2      TCP_SEG  1E1C
1E44  2624    20     3      TCP_SEG  1E30
1E58  2644    20     4      TCP_SEG  1E44
1E6C  2664    20     5      TCP_SEG  1E58
1E80  2684    20     6      TCP_SEG  1E6C
1E94  2704    20     7      TCP_SEG  ?      1EE4
1EA8  2724    20     8      TCP_SEG  ?      0
1EBC  2744    20     9      TCP_SEG  1E80   -       xxx
1ED0  2764    20     10     TCP_SEG  ?      0
1EE4  2784    20     11     TCP_SEG  ?      0
Let me describe the columns:
- HEX: absolute address in memory of the memp block
- Offset: offset in bytes from the top of the whole memp structure
- Delta: size of the single block
- Block: index of the block
- Pool: the memp pool the block belongs to
- RefCh: address of the "next" block found by following the free-list
  chain (a "?" means I could not reach the block through the chain)
- RefMem: address of the "next" block found by reading the memory
  directly
- Free: marks the first free block
What it shows is that block 9 is the first free block. The next one is
block 6, then 5, 4, 3, 2, 1, 0.
Reading the memory directly, I have seen that block 7 is chained to
block 11. These two blocks are chained together but no longer reachable.
Likewise, blocks 10 and 8 seem to be unreachable, and chained to
nothing.
What I see is that the two phenomena are related: when I lose mem area,
I lose TCP_SEG blocks as well.
If we take a look at the tcp_seg structure

struct tcp_seg {
  struct tcp_seg *next;  /* used when putting segments on a queue */
  struct pbuf *p;        /* buffer containing data + TCP header */
  ...

we can see that each segment holds a reference to a pbuf. And the lost
tcp_seg blocks refer exactly to that lost mem area!
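For context, if I read the 1.3.2 sources correctly, lwIP releases a
segment via tcp_seg_free() in tcp.c, which roughly does:

  if (seg != NULL) {
    if (seg->p != NULL) {
      pbuf_free(seg->p);            /* release the buffer first */
    }
    memp_free(MEMP_TCP_SEG, seg);   /* then return the block to the pool */
  }

So a leaked tcp_seg keeps its pbuf (and hence heap memory) alive, which
would explain why the two leaks always appear together.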
Has anyone ever seen such a problem?
Any suggestion on how to solve it?
I also read the lwIP memp stats
  lwip_stats.memp[i].max
  lwip_stats.memp[i].avail
  lwip_stats.memp[i].used
and what I found for TCP_SEG was 12, 12, 12, so all the memp blocks are
reported as used!
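For reference, this is roughly how I dump them; a minimal sketch
assuming LWIP_STATS and MEMP_STATS are enabled in lwipopts.h, with
debug_printf() again standing in for your own output routine:

#include "lwip/stats.h"
#include "lwip/memp.h"

void
dump_memp_stats(void)
{
  int i;
  for (i = 0; i < MEMP_MAX; i++) {
    /* err counts allocations that failed because the pool was empty */
    debug_printf("pool %d: avail=%u used=%u max=%u err=%u\n", i,
                 (unsigned)lwip_stats.memp[i].avail,
                 (unsigned)lwip_stats.memp[i].used,
                 (unsigned)lwip_stats.memp[i].max,
                 (unsigned)lwip_stats.memp[i].err);
  }
}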
I have one idea, but I don't know whether it could create worse
problems. It is not a solution, because I don't know the real cause;
it is just a sort of sanity pass over the TCP_SEG blocks.
Looking at the example posted above, I could chain the two lost blocks
(10 and 8) to the top of the list and the chained blocks (7 and 11) to
the bottom of the list. In this way I can recover at least the lost
blocks.
The chained blocks (7 and 11) could in theory still be in use and freed
later; at least, I don't know whether they are really in use or lost.
So, the result should be:
7 (chained) -> 11 (lost) -> 9 (free) -> 6 -> 5 -> 4 -> 3 -> 2 -> 1 ->
0 -> 8 (lost) -> 10 (lost)
This, of course, must be done by hand.
For blocks 8 and 10 I suppose I also have to call mem_free on the
block->p area.
Is it a good idea?
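Before patching the chain by hand, a safer first step might be a pure
detector. A sketch of what I have in mind, to be added inside memp.c
(memp_tab, memp_num and struct memp are private there); it only counts,
it changes nothing:

static int
memp_count_free(memp_t type)
{
  int count = 0;
  struct memp *m = memp_tab[type];
  /* walk the free list; comparing the result with
     memp_num[type] - lwip_stats.memp[type].used (MEMP_STATS needed)
     reveals blocks that have leaked out of the chain */
  while (m != NULL && count <= memp_num[type]) {
    count++;
    m = m->next;
  }
  return count;
}

Given the tcp_seg_free() flow shown earlier, pbuf_free() on block->p is
probably the right call for blocks 8 and 10, rather than mem_free().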
Again, does anybody know this problem, or what the hell I have done to
create it?
Another problem. I don't know whether it is related; maybe it is the
same problem with different effects.
The tcpip_thread stalls!
static void
tcpip_thread(void *arg)
{
  ...
  while (1) {                            /* MAIN Loop */
    gusTcpThread++;                      // DT 03/03/2011 Debug
    gucStatusTCPIP = 0;                  // DT 2011/03/04 TEST
    sys_mbox_fetch(mbox, (void *)&msg);
    gucStatusTCPIP = 1;                  // DT 2011/03/04 TEST
    switch (msg->type) {
#if LWIP_NETCONN
    case TCPIP_MSG_API:
      LWIP_DEBUGF(TCPIP_DEBUG, ("tcpip_thread: API message %p\n", (void *)msg));
      gucStatusTCPIP = 2;                // DT 2011/03/04 TEST
      msg->msg.apimsg->function(&(msg->msg.apimsg->msg));
      gucStatusTCPIP = 3;                // DT 2011/03/04 TEST
      break;
#endif /* LWIP_NETCONN */
  ...
}
What I see is that the gusTcpThread counter stops. In that state the
debug variable gucStatusTCPIP is 2, so the thread is stalled inside the
call to the API function. I don't know which function it is, nor which
mbox is involved.
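To find out which API function it is, one could record the function
pointer just before the call, in the same style as the other debug
variables (gpfLastApiFunc is a name I just made up):

/* in tcpip.c, file scope: which API function is running? (NULL = none) */
void (* volatile gpfLastApiFunc)(struct api_msg_msg *) = NULL;

    case TCPIP_MSG_API:
      gpfLastApiFunc = msg->msg.apimsg->function; /* remember before the call */
      msg->msg.apimsg->function(&(msg->msg.apimsg->msg));
      gpfLastApiFunc = NULL;                      /* the call returned */
      break;

When the stall happens, reading gpfLastApiFunc from the debugger and
looking the address up in the map file should name the function.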
// Posts the "msg" to the mailbox. This function has to block until the
// "msg" is really posted.
void sys_mbox_post(sys_mbox_t mbox, void *msg)
{
  // NOTE: we assume mbox != SYS_MBOX_NULL; IOW, we assume the calling
  // function takes care of checking the mbox validity before calling
  // this function.
  while (pdTRUE != xQueueSend(mbox, &msg, SYS_ARCH_BLOCKING_TICKTIMEOUT)) {
    vTaskDelay(10);        // DT 08/03/2011 Debug
    gusCntMBoxFull++;      // DT 03/03/2011 Debug
  }
  gusCntMBoxFull = 0;      // DT 03/03/2011 Debug
}
Normally the variable gusCntMBoxFull should stay at 0. But if the tcpip
thread is locked up (it is the only one that pops this queue), the
queue keeps filling until it is full, and that while loop becomes an
infinite loop.
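As a stop-gap I am thinking of a small watchdog task that detects the
stall from outside; a sketch, assuming the debug variables above are
made visible and declared volatile (with -O1 the compiler could
otherwise cache their values), and with debug_printf() as before:

#include "FreeRTOS.h"
#include "task.h"

extern volatile unsigned short gusTcpThread;   /* loop counter from tcpip_thread */
extern volatile unsigned char  gucStatusTCPIP; /* where the loop last was */

static void
tcpip_watchdog(void *arg)
{
  unsigned short last = gusTcpThread;
  (void)arg;
  for (;;) {
    vTaskDelay(5000 / portTICK_RATE_MS);  /* check every 5 seconds */
    if (gusTcpThread == last) {
      /* the counter did not advance: tcpip_thread is stalled */
      debug_printf("tcpip_thread stalled, state=%u\n",
                   (unsigned)gucStatusTCPIP);
    }
    last = gusTcpThread;
  }
}

/* started e.g. with:
   xTaskCreate(tcpip_watchdog, (signed portCHAR *)"tcpwd",
               configMINIMAL_STACK_SIZE, NULL, tskIDLE_PRIORITY + 1, NULL); */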
Any idea? Do you think these two problems are the same problem with two
different effects? Consider that this second problem also happens in
the same situation: web server and portal both active.
One last piece of information: I compile with optimization -O1. I am
going to try -O0, but then I have to remove pieces of code, so it is
not a simple job.
Best regards
Davide