Re: [lwip-users] Race condition in tcpip.c causing memory corruption

Stephane Lesage Wed, 27 Feb 2013 12:48:04 -0800

Hi,

Very interesting multithread tracing, but...


>In this scenario, a thread doing an outbound socket write results in a msg for 
>do_write getting posted to the mbox.
>This causes a context switch to the tcpip_thread(), which fetches the msg from 
>the mailbox and begins processing.
>This thread gets context switched out before getting to the TCPIP_APIMSG_ACK().
>Execution is passed to a thread that is passing packets into lwip.

OK

>This thread gets into tcpip_apimsg() and posts to the mbox.

If you're talking about your netif driver giving the packets to the stack, then 
I think this is wrong.
You should use tcpip_input().
This function will create a TCPIP_MSG_INPKT message and sys_mbox_trypost() it 
to the tcpip thread.
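
To illustrate the driver side, here is a minimal sketch of an RX thread handing 
a frame to the stack. The typedefs and the tcpip_input()/pbuf_free() bodies are 
stand-ins I wrote so the snippet compiles on its own (a real driver would 
include the lwIP headers instead), and my_driver_rx_frame() is a hypothetical 
name, not something from lwIP.

```c
/* Stand-ins so this sketch builds alone. In a real driver, drop them and
 * #include "lwip/tcpip.h", "lwip/pbuf.h" and "lwip/netif.h" instead. */
#include <stddef.h>

typedef signed char err_t;
#define ERR_OK   0
#define ERR_MEM -1

struct pbuf  { size_t len; int freed; };             /* simplified stand-in */
struct netif { int mbox_free_slots; int queued; };   /* simplified stand-in */

/* Stub that mimics the non-blocking post done by tcpip_input():
 * it fails with ERR_MEM when the tcpip thread's mbox is full. */
static err_t tcpip_input(struct pbuf *p, struct netif *inp)
{
    (void)p;
    if (inp->mbox_free_slots == 0) {
        return ERR_MEM;
    }
    inp->mbox_free_slots--;
    inp->queued++;
    return ERR_OK;
}

/* Stub: only records that the buffer was released. */
static unsigned char pbuf_free(struct pbuf *p)
{
    p->freed = 1;
    return 1;
}

/* Hypothetical driver hook, called from the RX thread for each frame.
 * It never touches a netconn or its op_completed semaphore. */
static err_t my_driver_rx_frame(struct netif *netif, struct pbuf *p)
{
    err_t err = tcpip_input(p, netif);
    if (err != ERR_OK) {
        /* The stack did not take the packet, so the driver frees it. */
        pbuf_free(p);
    }
    return err;
}
```

The point is that the RX thread only posts the pbuf and returns; it never 
blocks waiting for the tcpip thread, so it cannot steal a signal meant for a 
socket writer.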

>No context switch occurs (because tcpip_thread() is not currently waiting in 
>the fetch call)
>so this receive thread makes it to the 
>sys_arch_sem_wait(&apimsg->msg.conn->op_completed, 0) call and blocks.

To be clear, passing a packet to the stack works at the lowest level: you hand 
over a pbuf from your netif.
It cannot involve a PCB or a netconn and its semaphore...

>Now a context switch occurs back to the outbound thread, which finally makes it 
>to the same sys_arch_sem_wait() call and blocks.
>Now context is switched to the tcpip_thread, which finishes the do_write() 
>execution and calls TCPIP_APIMSG_ACK().
>This should have unblocked the outbound thread; however, the first one to block 
>on that sem was the inbound thread
>(which still has its message posted in the mbox), so the inbound thread 
>receives the signal.
>Now the tcpip_thread() grabs the inbound msg (whose container was on the 
>inbound thread's stack, which has been popped)
>and starts processing the message.  That container can now be corrupted since 
>the stack has been popped.
>Bad things happen after this.....

Of course, and this is why lwIP does not support multiple threads using the 
same socket (without the core locking option).

>I'm wondering if I'm somehow using the interfaces wrong to cause this to 
>happen.
>I fixed this by protecting the tcpip_apimsg() call with a semaphore to stop 
>reentrancy.
>Am I doing something wrong or is this a real bug?

If I understand correctly, then you just need to use tcpip_input(pbuf, netif) 
in your driver RX thread.


PS: I personally do not like the overhead of using an RX thread and/or the 
tcpip_input() function, which dynamically allocates a message.

My init function allocates a static message: rxmsg = 
tcpip_callbackmsg_new(rx_callback, netif);
My interrupts do fast/minimal DMA queue processing and call 
tcpip_trycallback(rxmsg) (only if necessary).
Then my rx_callback() does the actual job in the tcpip thread context:
- loop to extract pbuf from the "completed" DMA descriptors queue
- snmp/statistics update
- call ethernet_input(pbuf, netif)
- try to reallocate a new pbuf to reuse the now free DMA descriptor
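
Roughly, the pattern above could look like the sketch below. It is a 
single-threaded simulation, not my actual driver: the lwIP types and the 
bodies of tcpip_callbackmsg_new(), tcpip_trycallback() and ethernet_input() 
are simplified stand-ins so it compiles alone, and the DMA ring, 
my_driver_init(), my_rx_isr() and stub_tcpip_thread_poll() are hypothetical 
names.

```c
/* Stand-ins so this sketch builds alone; a real driver would use the
 * lwIP headers ("lwip/tcpip.h", "netif/ethernet.h") instead. */
#include <stddef.h>

typedef signed char err_t;
#define ERR_OK 0

struct pbuf  { int id; };                          /* simplified stand-in */
struct netif { int rx_packets; int delivered; };   /* simplified stand-in */

typedef void (*tcpip_callback_fn)(void *ctx);
struct tcpip_callback_msg { tcpip_callback_fn fn; void *ctx; int pending; };

static struct tcpip_callback_msg stub_msg;  /* stub storage for the one message */

static struct tcpip_callback_msg *tcpip_callbackmsg_new(tcpip_callback_fn fn, void *ctx)
{
    stub_msg.fn = fn;
    stub_msg.ctx = ctx;
    stub_msg.pending = 0;
    return &stub_msg;
}

/* Stub: marks the preallocated message as posted; nothing is allocated. */
static err_t tcpip_trycallback(struct tcpip_callback_msg *msg)
{
    msg->pending = 1;
    return ERR_OK;
}

static err_t ethernet_input(struct pbuf *p, struct netif *netif)
{
    (void)p;
    netif->delivered++;
    return ERR_OK;
}

/* Stub for what the tcpip thread does when it fetches the message. */
static void stub_tcpip_thread_poll(void)
{
    if (stub_msg.pending) {
        stub_msg.pending = 0;
        stub_msg.fn(stub_msg.ctx);
    }
}

/* Hypothetical "completed" DMA descriptor queue. A real driver must
 * protect it against concurrent access from the ISR. */
#define RX_RING 4
static struct pbuf rx_pool[RX_RING];
static struct pbuf *completed[RX_RING];
static int n_completed;
static struct tcpip_callback_msg *rxmsg;

/* Runs in the tcpip thread context. */
static void rx_callback(void *ctx)
{
    struct netif *netif = ctx;
    while (n_completed > 0) {
        struct pbuf *p = completed[--n_completed];  /* extract pbuf */
        netif->rx_packets++;                        /* statistics update */
        ethernet_input(p, netif);                   /* hand to the stack */
        /* A real driver would allocate a fresh pbuf here and re-arm
         * the now free DMA descriptor. */
    }
}

static void my_driver_init(struct netif *netif)
{
    rxmsg = tcpip_callbackmsg_new(rx_callback, netif);  /* allocated once */
}

/* Pretends the DMA engine completed one frame, then wakes the stack. */
static void my_rx_isr(void)
{
    if (n_completed < RX_RING) {
        completed[n_completed] = &rx_pool[n_completed];
        n_completed++;
    }
    tcpip_trycallback(rxmsg);
}
```

Calling my_rx_isr() twice and then stub_tcpip_thread_poll() once drains both 
frames in a single callback, which is the reason the interrupt only needs to 
post the message when one is not already pending.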

-- 
Stephane Lesage


