I think I may have found the problem: I was using the second parameter of 
dns_gethostbyname() incorrectly, and data was probably being written to wherever it pointed.  


Anyway, another problem.  I was experimenting with reducing PBUF_POOL_SIZE to 
save RAM.  I reduced it from 15 to 13, and now I sometimes get processor 
exceptions.  Things were quite stable before; I just can't leave well enough 
alone, I guess.  


The source of the issue seems to be in memp.c, line 325.  It looks like this:

  memp = memp_tab[type];
  
  if (memp != NULL) {
    memp_tab[type] = memp->next;                        //LINE 325
#if MEMP_OVERFLOW_CHECK
    memp->next = NULL;
    memp->file = file;
    memp->line = line;
#endif /* MEMP_OVERFLOW_CHECK */
    MEMP_STATS_INC_USED(used, type);
    LWIP_ASSERT("memp_malloc: memp properly aligned",
                ((mem_ptr_t)memp % MEM_ALIGNMENT) == 0);
    memp = (struct memp*)((u8_t*)memp + MEMP_SIZE);
  }



This was compiled with optimization, so I can't entirely trust the reported line 
number.  Anyway, memp_tab[0] = 0x0 and memp_tab[1] = 0xb000a8c0, while elements 
2-7 contain actual RAM addresses, which are in the format 0x200xxxx and lie 
in .bss.memp_memory according to my .map file.  0xb000a8c0 is not a valid 
address on this chip at all.  In the above code, "type" is equal to 
MEMP_TCP_PCB, although I cannot find that declaration anywhere.  It must be 
generated by some preprocessor magic.  

The chain of events that seemed to get me here was a call to tcp_new(), which 
calls tcp_alloc(), which in turn calls memp_malloc().  Any ideas why I'm getting 
this invalid address in memp_tab?  This is lwIP v1.3.2 using the raw API.  My 
options are below for reference.

#define TCP_MSS             1460
#define PBUF_POOL_BUFSIZE   512
#define PBUF_POOL_SIZE      13
#define TCP_WND             (TCP_MSS*4)
#define TCP_SND_BUF         (TCP_MSS*10)
#define MEM_SIZE            1024
#define MEMP_NUM_PBUF       20
#define MEMP_NUM_TCP_SEG    20
#define TCP_SND_QUEUELEN    20





________________________________
 From: Simon Goldschmidt <[email protected]>
To: JM <[email protected]>; Mailing list for lwIP users <[email protected]> 
Sent: Wednesday, December 21, 2011 8:02 AM
Subject: Re: [lwip-users] Tracking down source of corruption
 
JM <[email protected]> wrote:
> After establishing a TCP connection with a remote host (108.61.35.91, a
> radio station), disconnecting, then trying to reconnect, SYN packets are
> being sent, but the remote host doesn't respond.  It appears this is happening
> because lwIP isn't responding to ARP requests from the router,
> 192.168.0.1.  When I reset the unit it works again.

To see why it isn't responding to ARP requests, my first idea would be to 
enable lwIP's stats and have a look at the various 'err' or 'drop' members to 
see why packets are dropped (or the 'rx' members to see how many packets it 
thinks it has been receiving).
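For anyone following this suggestion, the statistics are switched on in lwipopts.h. The fragment below uses option names I believe are defined in opt.h for the lwIP 1.3 series, so please check them against your copy. With LWIP_STATS_DISPLAY enabled, stats_display() prints the counters via LWIP_PLATFORM_DIAG, and the per-protocol structures in lwip_stats (for example lwip_stats.link and lwip_stats.etharp) hold the err and drop counters mentioned above; the receive counter is named recv. Link counters are only meaningful if the port's Ethernet driver updates them.

```c
/* lwipopts.h fragment: statistics (names as I understand lwIP 1.3.x opt.h) */
#define LWIP_STATS          1   /* master switch for lwip_stats */
#define LWIP_STATS_DISPLAY  1   /* build stats_display() */
#define LINK_STATS          1   /* frames received/dropped at the netif */
#define ETHARP_STATS        1   /* ARP counters: the unanswered requests */
#define IP_STATS            1
#define TCP_STATS           1
#define MEMP_STATS          1   /* per-pool used/max/err counters */
```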

> But the weirdest thing is the device, IP address 192.168.0.176 which was
> assigned with DHCP, suddenly decides its IP is 56.7.0.32 when it sends a
> RST.  Its MAC is staying intact, and lwIPLocalIPAddrGet() is still reporting
> 192.168.0.176.

That's not too weird: I'm guessing your lwIPLocalIPAddrGet() function returns 
the netif's IP address, whereas the RST (I'm assuming it is sent from 
tcp_slowtmr, where lwIP decides to give up sending SYN retries) uses the TCP 
PCB's local address as its source. That would mean something corrupted that 
PCB during the last 1.3 seconds of the capture.

Unfortunately, that doesn't tell you who's corrupting the PCB memory... :-(

Simon

_______________________________________________
lwip-users mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/lwip-users
