Hi,

Very interesting multithread tracing, but...

>In this scenario a thread doing an outbound socket write results in a msg for
>do_write getting posted to the mbox.
>This causes a context switch to the tcpip_thread() which fetches the msg from
>the mailbox and begins processing.
>This thread gets context switched out before getting to the TCPIP_APIMSG_ACK().
>Execution is passed to a thread that is passing packets into lwip.

OK

>This thread gets into tcpip_apimsg() and posts to the mbox.

If you're talking about your netif driver giving the packets to the stack, then I think this is wrong.
You should use tcpip_input(). This function creates a TCPIP_MSG_INPKT message and posts it to the tcpip thread with sys_mbox_trypost().

>No context switch occurs (because tcpip_thread() is not currently waiting in
>the fetch call)
>so this receive thread makes it to the
>sys_arch_sem_wait(&apimsg->msg.conn->op_completed, 0) call and blocks.

To be clear, passing a packet to the stack works at the lowest level: you give a pbuf from your netif.
It cannot involve a PCB or a netconn and its semaphore...

>Now a context switch occurs back to the outbound thread which finally makes it
>to the same sys_arch_sem_wait() call and blocks.
>Now context is switched to the tcpip_thread which finish the do_write()
>execution and calls TCPIP_APIMSG_ACK().
>This should have unblocked the outbound thread however the first one to block
>on that sem was the inbound thread
>(which still has it's message posted in the mbox) so the inbound thread
>receives the signal.
>Now the tcpip_thread() grabs the inbound msg (which container was on the
>inbound thread's stack which has been popped)
>and starts processing the message. That container can now be corrupted since
>the stack has been popped.
>Bad things happen after this.....

Of course, and this is why lwIP does not support multiple threads using the same socket (without the core locking option).

>I'm wondering if I'm somehow using the interfaces wrong to cause this to
>happen.
>I fixed this by protecting the tcpip_apimsg() call with a semaphore to stop
>reentrancy.
>I'm I doing something wrong or is this a real bug?

If I understand correctly, then you just need to use tcpip_input(pbuf, netif) in your driver RX thread.

PS: I personally do not like the overhead of using an RX thread and/or the tcpip_input() function, which dynamically allocates a message.
My init function allocates a static rxmsg = tcpip_callbackmsg_new(rx_callback, netif);
My interrupts do fast/minimal DMA queue processing and call tcpip_trycallback(rxmsg) (only if necessary).
Then my rx_callback() does the actual job in the tcpip thread context:
- loop to extract pbuf from the "completed" DMA descriptors queue
- snmp/statistics update
- call ethernet_input(pbuf, netif)
- try to reallocate a new pbuf to reuse the now free DMA descriptor

--
Stephane Lesage
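
To make the PS a bit more concrete, here is a small standalone model of that pattern. It is only a sketch and deliberately does not include any lwIP header: the descriptor ring, the `queued` flag and the names `rx_isr`, `rx_callback`, `ring`, `posts` and `delivered` are all hypothetical stand-ins. The flag models "call tcpip_trycallback(rxmsg) only if necessary", and the `delivered[]` array stands in for handing each pbuf to ethernet_input().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RX_RING_SIZE 4      /* hypothetical number of DMA descriptors */
#define MAX_DELIVERED 16    /* capacity of the log used by this model */

/* Stand-in for one DMA descriptor; 'done' means a frame has landed. */
struct rx_desc { bool done; int frame_id; };

/* Stand-in for the single message preallocated at init time
 * (the role played by tcpip_callbackmsg_new() in the email). */
struct rx_msg { bool queued; };

static struct rx_desc ring[RX_RING_SIZE];
static size_t isr_head;     /* next descriptor the "hardware" fills   */
static size_t cb_tail;      /* next descriptor the callback consumes  */
static struct rx_msg rxmsg;
static unsigned posts;      /* how many times the message was posted  */
static int delivered[MAX_DELIVERED];
static size_t delivered_count;

/* Interrupt side: minimal work, no allocation.
 * Returns false if the ring is full and the frame is dropped. */
bool rx_isr(int frame_id)
{
    struct rx_desc *d = &ring[isr_head];
    if (d->done) {
        return false;                   /* descriptor not recycled yet */
    }
    d->frame_id = frame_id;
    d->done = true;
    isr_head = (isr_head + 1) % RX_RING_SIZE;

    if (!rxmsg.queued) {                /* post "only if necessary"    */
        rxmsg.queued = true;            /* models tcpip_trycallback()  */
        posts++;
    }
    return true;
}

/* Runs in the stack's thread context: drains every completed descriptor. */
void rx_callback(void)
{
    /* Clear the flag first so a frame arriving during the drain re-posts. */
    rxmsg.queued = false;

    while (ring[cb_tail].done) {
        assert(delivered_count < MAX_DELIVERED);
        /* In a real driver: statistics update, then ethernet_input(). */
        delivered[delivered_count++] = ring[cb_tail].frame_id;
        ring[cb_tail].done = false;     /* re-arm descriptor for reuse */
        cb_tail = (cb_tail + 1) % RX_RING_SIZE;
    }
}
```

Two frames arriving back to back cause a single post, and one run of rx_callback() delivers both in order. In a real driver the flag and the ring indices are shared between interrupt and thread context, so they would need whatever protection your platform requires; this model ignores that.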
