Hello,

It turns out that the assumption that the "transmit timed out" issue was related to pause frames/flow control was incorrect. I have recently started seeing the error again, this time with flow control disabled. However, unlike last time, I am now able to trigger the issue reliably.

The timeout seems to be triggered by connectivity problems between MT7621-based routers (I am not sure whether it applies to other devices with the MT7530 switch) and the next hop. I checked each client connected to some of the routers exhibiting this issue, and it turns out that some had bad cables, etc.

To check the theory in a more controlled fashion, I set up the following small testbed:

NUC (192.168.1.1) <-> (192.168.1.2) ZBT 3526 (192.168.2.1) <-> (192.168.2.2) ZBT 2626 (192.168.3.1) <-> (192.168.3.2) Client

I then configured port forwarding from the 3526 all the way to the client and hammered the client with small UDP packets. Then, at random points, I intentionally hung the kernel on the 2626 by triggering an RCU error that causes a stall. The link (L2) was still up, but the 2626 did not reply to any packets, including ARP (so the neighbor-table entry for 192.168.2.2 was quickly lost).

More or less as soon as the kernel hung, the "transmit timed out" error message started showing up on the 3526. If I restart networking or disable and re-enable the ports, everything works fine for a bit (for example, I can ping 192.168.1.1 from 192.168.1.2), but after some time the error appears again.

I have been trying to solve this myself for a couple of days, but I am starting to run out of ideas. Could it be that some traffic destined for the client (via the 2626) gets stuck in the TX queue on the 3526? Any help, pointers on where to look, or ideas about what could be wrong would be much appreciated.

Thanks in advance for the help,
Kristian

_______________________________________________
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev
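
For anyone who wants to reproduce the load, the "hammer the client with small UDP packets" step can be approximated with a short sender along these lines. This is only a sketch and not necessarily the tool used in the testbed above: the function name `send_small_udp`, the 18-byte payload, and the port number in the usage comment are all assumptions chosen for illustration.

```python
import socket
import time


def send_small_udp(host, port, count, payload_size=18, interval=0.0):
    """Send `count` small UDP datagrams to host:port.

    payload_size and interval are illustrative defaults (assumptions),
    not values taken from the original test. Returns the number of
    datagrams handed to the kernel for transmission.
    """
    payload = bytes(payload_size)  # zero-filled payload of the given size
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _ in range(count):
            sock.sendto(payload, (host, port))
            sent += 1
            if interval > 0:
                time.sleep(interval)  # optional pacing between datagrams
    return sent


# Hypothetical usage from the NUC, aimed at the forwarded port on the 3526
# (5000 is an arbitrary port, not one named in the report):
#   send_small_udp("192.168.1.2", 5000, count=1_000_000)
```

Sending to the 3526's address (192.168.1.2) relies on the port forwarding described above to carry the datagrams through the 2626 to the client. UDP is convenient here because the sender keeps transmitting even after the 2626 stops responding.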