fjpanag commented on PR #6956:
URL: https://github.com/apache/incubator-nuttx/pull/6956#issuecomment-1232200183

   I managed to reproduce the stuck-thread issue. Here is how I did it.
   
   My setup is as follows:
   
   ```mermaid
   graph LR
       A[Device] -->|Ethernet A| B
       B[Switch] -->|Ethernet B| C
       C[LAN]
   ```
   
   Initially, everything works normally.  
   Two threads each open a TCP socket to a different server and happily exchange data.
   
   I cause a disruption on the network by unplugging Ethernet cable **B**.  
   Communication is disrupted, but the device still sees the link as UP.
   
   The failure to communicate with the servers causes both connections to be closed.  
   This leads to two calls to `close()`, one in each thread.
   
   I connect Ethernet cable **B** back. Communication is now possible.  
   The two threads call `socket()`, followed by `connect()` etc...
   
   Thread A manages to connect to the server. It exchanges data normally.
   **Thread B is stuck in `connect()` again!**
   
   The system does not seem to be able to recover. After a couple of minutes I 
repeat the experiment.  
   Once again I disconnect and reconnect Ethernet cable **B**.
   
   **Now Thread A also gets stuck in `connect()`!**
   Thread B remains stuck.
   
   The system cannot recover in any way; it needs a full reboot.
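   
   The per-thread sequence described above can be sketched roughly as follows.
   This is only an illustration, not the actual application code:
   `open_and_connect()`, `run_cycle()` and the `exchange` callback are
   hypothetical names, and IPv4 with a blocking socket is assumed.
   
   ```c
   #include <arpa/inet.h>
   #include <assert.h>
   #include <netinet/in.h>
   #include <stdint.h>
   #include <string.h>
   #include <sys/socket.h>
   #include <unistd.h>

   /* Hypothetical helper: one socket() + connect() attempt, as each thread
    * performs after its previous connection has been closed.
    * Returns a connected descriptor, or -1 on any failure.
    */

   int open_and_connect(const char *ip, uint16_t port)
   {
     struct sockaddr_in addr;
     int fd;

     assert(ip != NULL);

     memset(&addr, 0, sizeof(addr));
     addr.sin_family = AF_INET;
     addr.sin_port = htons(port);
     if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
       {
         return -1; /* Not a valid IPv4 address string */
       }

     fd = socket(AF_INET, SOCK_STREAM, 0);
     if (fd < 0)
       {
         return -1;
       }

     /* Blocking connect(): the call in which the threads end up stuck */

     if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
       {
         close(fd);
         return -1;
       }

     return fd;
   }

   /* Hypothetical per-thread cycle: connect, exchange data until the
    * caller-supplied callback returns, then close() the socket.
    * Returns -1 if the connection could not be established, 0 otherwise.
    */

   int run_cycle(const char *ip, uint16_t port, int (*exchange)(int fd))
   {
     int fd = open_and_connect(ip, port);
     if (fd < 0)
       {
         return -1;
       }

     if (exchange != NULL)
       {
         (void)exchange(fd); /* Returns once communication fails */
       }

     close(fd);
     return 0;
   }
   ```
   
   Each of the two threads would call something like `run_cycle()` in a loop
   against its own server. In the failure described here, the second pass
   through the loop never returns from `connect()`.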
   
   ---
   
   While the issue was manifesting, I looked around with my debugger and saw:
   * There was no deadlock, i.e. no two threads waiting for each other.
   * All other threads were running normally. Typically not much to do, so they 
mostly `sleep()`.
   * Both workers in the low-priority queue were mostly `sleep()`ing.
   * Another part of the system did use the workers successfully.
   * The workers never executed anything related to the network while in this 
state.
   * While in this state I unplugged Ethernet cable **A**. Nothing happened. 
(Expected, thread is stuck).
   * I re-connected Ethernet cable **A**. Again, nothing happened. (Expected)
   * `devif_dev_event(dev, NETDEV_DOWN);` (file netdev_ioctl.c, line 1946) cannot be called, because the thread is stuck.
   
   Both stuck threads have the same call trace:
   
   
![image](https://user-images.githubusercontent.com/46975045/187548884-20751ca9-2821-47d6-8a8e-700d0eb6f8bb.png)
   
   
![image](https://user-images.githubusercontent.com/46975045/187548929-ce29a4b0-7845-46f4-afc1-0c44ea7eed24.png)
   


