I think I may have found the problem: I was using the second parameter of
dns_gethostbyname incorrectly, and likely writing things to that address.
Anyway, on to another problem. I was experimenting with reducing
PBUF_POOL_SIZE to save RAM. After reducing it from 15 to 13, I sometimes get
processor exceptions. Things were quite stable as they were, but I guess I
just can't leave well enough alone.
The source of the issue seems to be in memp.c, line 325. It looks like this:
  memp = memp_tab[type];

  if (memp != NULL) {
    memp_tab[type] = memp->next; //LINE 325
#if MEMP_OVERFLOW_CHECK
    memp->next = NULL;
    memp->file = file;
    memp->line = line;
#endif /* MEMP_OVERFLOW_CHECK */
    MEMP_STATS_INC_USED(used, type);
    LWIP_ASSERT("memp_malloc: memp properly aligned",
                ((mem_ptr_t)memp % MEM_ALIGNMENT) == 0);
    memp = (struct memp*)((u8_t*)memp + MEMP_SIZE);
  }
This was compiled with optimization so I can't entirely trust the reported line
number. Anyway, memp_tab[0] = 0x0, and memp_tab[1] = 0xb000a8c0, while members
2-7 contain actual RAM addresses, which are in the format of 0x200xxxx and
exist in .bss.memp_memory according to my .map file. 0xb000a8c0 is not a valid
address on this chip whatsoever. In the above code, I see that "type" is equal
to MEMP_TCP_PCB, although I cannot find that declaration anywhere. Must be
some compiler magic.
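
A side observation on the bad value, offered as a hint rather than a diagnosis: if the target is little-endian (an assumption, the thread does not say), the word 0xb000a8c0 sits in memory as the bytes c0 a8 00 b0. Read as an IPv4 address in network byte order, that is 192.168.0.176, the DHCP-assigned address quoted further down in this thread. That would be consistent with an IP address having been written over the memp_tab entry. The helper below is hypothetical and not part of lwIP; it only shows the byte arithmetic.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not an lwIP function): print the four bytes of a
 * 32-bit word, in the order a little-endian CPU stores them in memory,
 * as a dotted-quad IPv4 address. */
static void word_to_dotted_quad(uint32_t word, char *out, size_t out_len)
{
    snprintf(out, out_len, "%u.%u.%u.%u",
             (unsigned)(word & 0xffu),          /* lowest address byte */
             (unsigned)((word >> 8) & 0xffu),
             (unsigned)((word >> 16) & 0xffu),
             (unsigned)((word >> 24) & 0xffu)); /* highest address byte */
}
```

Calling word_to_dotted_quad(0xb000a8c0u, buf, sizeof buf) with a 16-byte buffer fills it with "192.168.0.176".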
The chain of events that seemed to get me here was calling tcp_new(), which
called memp_malloc() in tcp_alloc. Any ideas why I'm getting this invalid
address in memp_tab? This is v1.3.2 in RAW mode. My options are below for
reference.
#define TCP_MSS 1460
#define PBUF_POOL_BUFSIZE 512
#define PBUF_POOL_SIZE 13
#define TCP_WND (TCP_MSS*4)
#define TCP_SND_BUF (TCP_MSS*10)
#define MEM_SIZE 1024
#define MEMP_NUM_PBUF 20
#define MEMP_NUM_TCP_SEG 20
#define TCP_SND_QUEUELEN 20
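
A rough budget for these settings may help judge how tight the pool is. This is a sketch under assumptions the thread does not confirm: that the Ethernet driver stores each received frame in chained pool pbufs, and that a full-size frame carries 54 bytes of Ethernet, IP and TCP headers with no options. The function names are made up for illustration.

```c
#include <stddef.h>

/* Values copied from the options listed above. */
#define TCP_MSS            1460
#define PBUF_POOL_BUFSIZE  512
#define TCP_WND            (TCP_MSS * 4)

/* Assumption: 14-byte Ethernet + 20-byte IP + 20-byte TCP header. */
#define FRAME_OVERHEAD     54

/* Number of pool pbufs needed to hold one frame (rounded up). */
static size_t pbufs_per_frame(size_t frame_len, size_t bufsize)
{
    return (frame_len + bufsize - 1) / bufsize;
}

/* Pool pbufs occupied if a full receive window of full-size segments
 * is queued at once. */
static size_t pbufs_for_window(void)
{
    size_t segments = TCP_WND / TCP_MSS;
    return segments * pbufs_per_frame(TCP_MSS + FRAME_OVERHEAD,
                                      PBUF_POOL_BUFSIZE);
}

/* Payload bytes reserved for a pool of the given size (pbuf headers
 * and alignment padding are not counted). */
static size_t pool_payload_bytes(size_t pool_size)
{
    return pool_size * PBUF_POOL_BUFSIZE;
}
```

Under those assumptions a full window needs 12 pool pbufs, which leaves one spare out of 13, and going from 15 to 13 saves 1024 bytes of payload space plus the per-pbuf header overhead.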
________________________________
From: Simon Goldschmidt <[email protected]>
To: JM <[email protected]>; Mailing list for lwIP users <[email protected]>
Sent: Wednesday, December 21, 2011 8:02 AM
Subject: Re: [lwip-users] Tracking down source of corruption
JM <[email protected]> wrote:
> After establishing a TCP connection with a remote host (108.61.35.91, a
> radio station), disconnecting, then trying to reconnect, SYN packets are
> being sent, but the remote host doesn't respond. It appears this is happening
> because lwIP isn't responding to ARP requests from the router,
> 192.168.0.1. When I reset the unit it works again.
To see why it isn't responding to ARP requests, my first idea would be to
enable lwIP's stats and have a look at the various 'err' or 'drop' members to
see why packets are dropped (or the 'rx' members to see how many packets it
thinks it has been receiving).
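
For reference, a possible lwipopts.h fragment for this. It is a sketch, not a tested configuration for this board. The options are standard lwIP options from opt.h; which ones are affordable on a RAM-constrained target is a judgment call. MEMP_OVERFLOW_CHECK and MEMP_SANITY_CHECK are included because they may help catch whatever is overwriting pool memory, at the cost of extra RAM and CPU time.

```c
/* lwipopts.h fragment: enable statistics and pool checking. */
#define LWIP_STATS           1  /* collect statistics */
#define LWIP_STATS_DISPLAY   1  /* compile in stats_display() */
#define LINK_STATS           1  /* link-level counters */
#define ETHARP_STATS         1  /* ARP counters (drop, err, ...) */
#define MEMP_STATS           1  /* per-pool usage and error counters */

/* Optional corruption checks on the memp pools. */
#define MEMP_OVERFLOW_CHECK  2  /* check pool elements on every alloc/free */
#define MEMP_SANITY_CHECK    1  /* sanity-check free lists after memp_free() */
```

With LWIP_STATS_DISPLAY enabled, calling stats_display() prints the counters through LWIP_PLATFORM_DIAG, so that macro has to be mapped to a working output routine on the target.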
> But the weirdest thing is the device, IP address 192.168.0.176 which was
> assigned with DHCP, suddenly decides its IP is 56.7.0.32 when it sends a
> RST. Its MAC is staying intact, and lwIPLocalIPAddrGet() is still reporting
> 192.168.0.176.
That's not too weird: I'm guessing your lwIPLocalIPAddrGet() function returns
the netif's IP address, whereas the RST (I'm assuming it is sent from
tcp_slowtmr where lwIP decides it gives up sending SYN retries) uses the TCP
PCB's local address to send the RST. That assumes in the last 1.3 seconds of
the capture, something has corrupted that PCB.
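
One way to look for a hint in the bogus address is to view it as the 32-bit word the CPU would see. Assuming a little-endian target (not stated in this thread), the bytes 56.7.0.32 form the word 0x20000738, which looks more like an on-chip RAM address than an IP address. That could mean a pointer was stored over the PCB's local address, but this is only a guess. The helper below is hypothetical, not an lwIP function.

```c
#include <stdint.h>

/* Hypothetical helper (not part of lwIP): pack the dotted quad a.b.c.d,
 * stored in memory in that byte order, into the 32-bit word a
 * little-endian CPU would read from the same location. */
static uint32_t dotted_quad_to_word(uint8_t a, uint8_t b,
                                    uint8_t c, uint8_t d)
{
    return (uint32_t)a
         | ((uint32_t)b << 8)
         | ((uint32_t)c << 16)
         | ((uint32_t)d << 24);
}
```

dotted_quad_to_word(56, 7, 0, 32) evaluates to 0x20000738, which can then be looked up in the .map file to see what lives at or near that address.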
Unfortunately, that doesn't tell you who's corrupting the PCB memory... :-(
Simon
_______________________________________________
lwip-users mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/lwip-users