Hello,

On Thu, Nov 9, 2017 at 8:42 PM, Kristian Evensen
<kristian.even...@gmail.com> wrote:
> I replaced the 3526 with other devices containing the mt7530 switch
> (both mt7621 and mt7623-based boards), and the issues seems to be
> related to the switch rather than the SoC. I am able to reliably
> trigger the timeout on all devices I have tested, both running
> proprietary drivers/firmware and LEDE. I guess this points to that
> there is some traffic pattern or network behavior that triggers an
> error in the MT7530 and causes TX to freeze. Restarting the ports
> makes the switch work again, but as long as the "bad" device is
> connected to the mt7530 then it is just a matter of time before the
> timeout is back.

I think I am ready to conclude on this issue. First of all, I have
discovered that I made an incorrect statement earlier: I have not seen
the problem with flow control disabled.

After finding a network tap and a device that passes frames such as
pause frames up to the driver, so that I can see them with tcpdump, I
think I finally see what is going on. I connected the tap between
router #1 and router #2, and performed the test described earlier with
flow control enabled and disabled.

When triggering the RCU stall, I see a continuous flood of pause frames
coming from router #2. This flood happens regardless of whether flow
control is enabled. However, with flow control enabled, I see that the
RxPause and TxPause counters increase. With flow control disabled, they
remain at 0. In other words, it seems that the switch filters out pause
frames if the bit is unset in the feature register (it would be great
if someone could confirm/deny this).

The MT7530 switch seems to use one buffer for all ports, so what I have
seen all along is head-of-line blocking. Since I use iperf in UDP mode,
the traffic destined for router #2 never slows down, and it fills up the
buffer. Thus, all other traffic is blocked. When disabling the port used
by router #2, the buffer is cleared and packets can flow as normal
again.

With flow control disabled, I do not see the head-of-line blocking. If
I am connected to router #1, I can always reach it. If flow control is
enabled, router #1 stops replying to, for example, ping when the pause
flood starts.

I don't know what the correct "solution" for this problem is. I asked
Piotr to mark my patch for always disabling flow control as not
applicable, but perhaps it should be brought back if everyone agrees
that disabling flow control is ok.
If not, then perhaps the following patch should be accepted so that it
is possible to switch flow control on/off:

https://lists.openwrt.org/pipermail/openwrt-devel/2016-April/040705.html

BR,
Kristian

_______________________________________________
Lede-dev mailing list
Lede-dev@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/lede-dev
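
To make the head-of-line blocking described above more concrete, here is a toy model in Python. It is only a sketch under assumptions of my own: a single shared buffer counted in packets, two egress ports named "r1" and "r2", and made-up rates and buffer size. It does not reflect the real MT7530 internals. The function name `simulate` and all parameters are hypothetical.

```python
from collections import deque


def simulate(honor_pause, ticks=50, capacity=16):
    """Toy model of one shared packet buffer serving two egress ports.

    Port "r2" receives a constant UDP-like stream while its link partner
    floods pause frames. Port "r1" receives one ping-like packet per tick.
    Returns the number of packets delivered on port "r1".
    All numbers are illustrative assumptions, not MT7530 values.
    """
    queues = {"r1": deque(), "r2": deque()}
    delivered_r1 = 0

    for _ in range(ticks):
        # Ingress: two packets for r2 and one for r1 arrive each tick.
        # A packet is admitted only if the shared buffer has room.
        for port in ("r2", "r2", "r1"):
            used = sum(len(q) for q in queues.values())
            if used < capacity:
                queues[port].append(1)
            # Otherwise the packet is dropped because the buffer is full.

        # Egress: each port may transmit up to two packets per tick,
        # unless it honors the pause frames from its link partner.
        for port, q in queues.items():
            if honor_pause and port == "r2":
                continue  # r2 is paused, so its packets stay in the buffer
            for _ in range(min(2, len(q))):
                q.popleft()
                if port == "r1":
                    delivered_r1 += 1

    return delivered_r1


if __name__ == "__main__":
    print("flow control off:", simulate(honor_pause=False))
    print("flow control on: ", simulate(honor_pause=True))
```

In this model, when the pause frames are ignored, the r2 queue drains every tick and r1 traffic is never affected. When they are honored, the r2 queue grows until it occupies the whole shared buffer, after which packets for r1 are dropped on arrival, which matches the symptom of router #1 no longer answering ping.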