Re: [lwip-users] Infinite hang in tcp_slowtmr()

Dinesh Pandey Thu, 29 Oct 2015 07:46:45 -0700

Looks like I found the cause of 'my' loop.

I was calling tcp_close twice on a TCP PCB.


The memp_free routine simply puts the TCP PCB at the head of the linked
list. If memp_free is called twice with the same TCP PCB, the first element
starts to point back to itself.

When a new TCP connection is created, memp_alloc returns this looped
member and you end up with a looped PCB linked list.
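
To illustrate the mechanism, here is a minimal sketch of a head-insert free
list. It is not lwIP's actual memp code: struct elem, pool_free, pool_alloc
and demo_double_free are hypothetical names made up for this example, and the
pool holds a single element to keep it short. It only shows how freeing the
same element twice makes it point to itself, so that the allocator then hands
out the same element on every call.

```c
#include <stddef.h>

/* Hypothetical stand-in for a pool element; not lwIP's real memp struct. */
struct elem {
    struct elem *next;
};

static struct elem *free_list = NULL;

/* Push an element onto the head of the free list (no double-free check). */
static void pool_free(struct elem *e)
{
    e->next = free_list;
    free_list = e;
}

/* Pop the head of the free list, or return NULL if the list is empty. */
static struct elem *pool_alloc(void)
{
    struct elem *e = free_list;
    if (e != NULL) {
        free_list = e->next;
    }
    return e;
}

/* Returns 1 if the double free produced a self-loop and a duplicate
 * allocation, 0 otherwise. */
int demo_double_free(void)
{
    static struct elem a;
    struct elem *first;
    struct elem *second;

    free_list = NULL;
    pool_free(&a);          /* a.next = NULL, free_list = &a           */
    pool_free(&a);          /* a.next = &a: the element loops on itself */

    first = pool_alloc();   /* returns &a, free_list stays &a           */
    second = pool_alloc();  /* returns &a again                         */

    return (first == second) && (first->next == first);
}
```

In this sketch demo_double_free() returns 1. Once two "connections" share
one element, linking it into another list a second time would make it point
to itself there as well, which matches the pcb->next != pcb assertion quoted
below.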


On Fri, Oct 23, 2015 at 7:34 PM, Dinesh Pandey <[email protected]>
wrote:

> Seeing a similar problem:
>
> Assertion "tcp_input: pcb->next != pcb (before cache)" failed at line 182
> in <...>/core/tcp_in.c
>
> I have two machines, one ARM and another i386 running the same code. I can
> reproduce it consistently on the ARM. Don't see it on i386.
>
> The LWIP task is running with NO_SYS=1 (as one task in a multitasking
> environment).
>
> Will investigate over the next few days. Any hints welcome.
>
>
>
> On Wed, Oct 14, 2015 at 11:03 PM, Sylvain Rochet <[email protected]>
> wrote:
>
>> Hi Stephen,
>>
>> On Wed, Oct 14, 2015 at 09:13:59AM -0500, Stephen Cowell wrote:
>> > Hey Enrico,
>> > I'm using GNU toolchain/compiler, supplied with Atmel Studio 6.1.
>> > Since I've added the code I've had no other problems; I really don't
>> > have much time to research this, what with other pressures at work.
>> >
>> > It seems the issue is not unknown... sometimes the pdb ends up pointing
>> > to itself.  These times appear to be correlated to high-stress I/O.
>> >
>> > Obviously the last pdb should point to null... and it should never point
>> > to itself.  It is easy enough to catch it pointing to itself and make
>> > that null.  I verified that this was the first pdb, that we weren't
>> > going to have a memory leak when we just terminated the list.  I did
>> > not have the resources to chase down when the pointer to self happened...
>> > I only know that it does, and that the pdb that this happens to is
>> > at the first allocated pdb address.  The obvious thing to do was to
>> > correct the pointer to break the endless loop... seems to work.
>> >
>> > As Sylvain wrote, the Atmel port has some serious differences from
>> > what he's used to seeing... I'm assuming this has something to do
>> > with it.  As I get more time (the product ships soon) I'll be able to
>> > spend some more time on this issue.  I'm just glad to get it out there
>> > and let others know it's happening.
>>
>> A linked list corruption is a very serious problem, you really must not
>> ship your product with such a known bug. Your workaround only mitigates a
>> single common corruption pattern on linked lists, but that's only one of
>> them. It will break sooner or later with another pattern.
>>
>> If a linked list is corrupted it's because there is a reentrancy problem
>> in functions modifying the linked list, which really limits the scope
>> where reentrancy can occur. We have critical sections for !NO_SYS
>> systems, you could use the critical sections hooks to check if
>> reentrancy constraints are respected,
>> SYS_ARCH_DECL_PROTECT/SYS_ARCH_PROTECT/SYS_ARCH_UNPROTECT.
>>
>> At least, if you want to ship your product very quickly, just define
>> those hooks to something appropriate (those are recursive locks so
>> you'll have to take care of that) and you should be safe, for now.
>>
>> Sylvain
>>
>> _______________________________________________
>> lwip-users mailing list
>> [email protected]
>> https://lists.nongnu.org/mailman/listinfo/lwip-users
>>
>
>
