Hi Fotis,

Are you in sync with mainline?
If you can create a host application that induces the issue, it will be
easier for us to test.

BR,

Alan

On 8/9/22, Fotis Panagiotopoulos <f.j.pa...@gmail.com> wrote:
> Hello,
>
> still trying to make the network work reliably.
> After fixing another issue of my application, I hit another problem.
>
> The following sequence causes NuttX to crash:
>
> 1. My application is creating a TCP socket and communicates with a server.
> 2. At one point the server stops responding (unrelated to NuttX / network
>    issue).
> 3. The application detects the timeout, and calls close() on the socket.
> 4. A new socket is created, and it is connected to the server.
> 5. At this point, the server decides to send a FIN message for the previous
>    connection.
> 6. I get a failed assertion in devif_callback.c at line 85.
>
> Note that I haven't managed to manually reproduce this issue.
> No matter what I do manually, everything seems to be working correctly.
> I just have to wait for it to happen.
> It seems that it is only triggered if a FIN arrives **after** a SYN.
>
> I am sure that this is only happening with CONFIG_NET_TCP_WRITE_BUFFERS
> enabled.
> I have no problems without buffering.
>
> The assertion seems right to fire.
> When a FIN is received for a closed connection, the same callback is free'd
> both by tcp_lost_connection() and later on by tcp_close_eventhandler().
> All these are happening within the same execution of tcp_input().
>
> Any ideas?
>
> On Tue, Jul 26, 2022 at 3:44 PM Sebastien Lorquet <sebast...@lorquet.fr>
> wrote:
>
>> Hi,
>>
>> good find but
>>
>> -I don't think any usual application tinkers with PHY regs during its
>> lifetime except the ethernet monitor
>>
>> -the fix is certainly a lock somewhere but global or fine grained I don't
>> know.
>>
>> Not all calls need to be locked, eg the one that returns the PHY
>> address. Probably not needed by default, but a PHY access lock would
>> prevent any issue you describe.
>>
>> I will wait for people with more expertise about this.
>>
>> Just a note, don't forget that not all PHY have an interrupt, the one on
>> the nucleo stm32h743zi[2] board does not have one.
>>
>> Sebastien
>>
>> On 26/07/2022 at 11:05, Fotis Panagiotopoulos wrote:
>> > Hello,
>> >
>> > I have eventually found 2 issues regarding networking in my
>> > application.
>> > I would like to discuss the first one.
>> >
>> > My code contains something like this:
>> >
>> > int sd = socket(AF_INET, SOCK_DGRAM, 0);
>> >
>> > struct ifreq ifr;
>> > memset(&ifr, 0, sizeof(struct ifreq));
>> > strncpy(ifr.ifr_name, CONFIG_NETIF_DEV_NAME, IFNAMSIZ);
>> > ifr.ifr_mii_phy_id = CONFIG_STM32_PHYADDR;
>> > ifr.ifr_mii_reg_num = MII_LAN8720_SECR;
>> > ifr.ifr_mii_val_out = 0;
>> > ioctl(sd, SIOCGMIIREG, (unsigned long)&ifr);
>> >
>> > // Do stuff with ifr.ifr_mii_val_out.
>> >
>> > close(sd);
>> >
>> > I realized that this type of ioctl will directly access the hardware,
>> > without any locking.
>> > That is, if any other task needs to use the PHY in any other way, it
>> > will eventually corrupt its register data.
>> >
>> > Two questions on this:
>> > 1. Is there any good reason for this?
>> > 2. What is the best way to fix it? Shall I add a driver level lock, or
>> >    should net_lock() be used in any higher layer?
>> >
>> > On Tue, Jul 19, 2022 at 10:30 PM Fotis Panagiotopoulos
>> > <f.j.pa...@gmail.com> wrote:
>> >
>> >> Hello,
>> >>
>> >>> We have deployed hundreds of boards with stm32f427 and ethernet, they
>> >>> have all been working reliably for months without stopping, we know it
>> >>> because they critically depend on network functionality and we have
>> >>> reports if a card becomes unreachable. None has so far outside of
>> >>> dedicated tests.
>> >>> So I believe that there is no obvious hard bug in these drivers.
>> >> Good to hear that!
>> >> Although, I may be using a feature or protocol that you are not.
>> >> Of course, I don't believe that NuttX is broken per se, but a minor bug
>> >> may lurk somewhere...
>> >>
>> >>> I have seen that when I enable the network debugging features, it seems
>> >>> to hit an assertion failure before getting to nsh prompt at startup.
>> >>> This was on a quite recent master. I haven't had a chance to diagnose
>> >>> this further.
>> >>> Have you tried enabling these and if so, do they work?
>> >> If you refer to CONFIG_DEBUG_NET, then yes I have enabled it and it
>> >> works.
>> >> I have some devices under test, waiting to reproduce the issue to see if
>> >> this option provides any useful information.
>> >>
>> >>> Also, out of curiosity, have you tried running ostest on your board?
>> >> I just tried.
>> >> It passed all the tests.
>> >>
>> >> On Tue, Jul 19, 2022 at 4:44 PM Sebastien Lorquet <sebast...@lorquet.fr>
>> >> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> We have deployed hundreds of boards with stm32f427 and ethernet, they
>> >>> have all been working reliably for months without stopping, we know it
>> >>> because they critically depend on network functionality and we have
>> >>> reports if a card becomes unreachable. None has so far outside of
>> >>> dedicated tests.
>> >>>
>> >>> So I believe that there is no obvious hard bug in these drivers.
>> >>>
>> >>> Most certainly a build option on your particular config. debug is a
>> >>> possible issue, thread problems is another possibility.
>> >>>
>> >>> Sebastien
>> >>>
>> >>> On 7/19/22 13:47, Fotis Panagiotopoulos wrote:
>> >>>> Hello!
>> >>>>
>> >>>> I am using Ethernet on an STM32F427 target, but I am facing some
>> >>>> issues.
>> >>>>
>> >>>> Initially the device works correctly. After some hours of continuous
>> >>>> operation I completely lose all network communications.
>> >>>> Trying to troubleshoot the issue, I enabled assertions and various
>> >>>> other debug features.
>> >>>>
>> >>>> Again the device works correctly for some hours, and then I get a
>> >>>> failed assertion at stm32_eth.c, line 1372:
>> >>>>
>> >>>> DEBUGASSERT(dev->d_len == 0 && dev->d_buf == NULL);
>> >>>>
>> >>>> No other errors are reported (e.g. stack overflows etc).
>> >>>>
>> >>>> I have observed that this issue usually manifests itself when there is
>> >>>> insufficient stack on a task.
>> >>>> But in my case, all tasks have oversized stacks. Typically they do not
>> >>>> exceed 50% utilization.
>> >>>> I have plenty of room available in the heap too (> 100kB).
>> >>>>
>> >>>> Regarding the rest of the firmware, I cannot see any other misbehaviour
>> >>>> or problem.
>> >>>> I haven't ever seen any other unexplained problem, assertion fail,
>> >>>> hard-fault etc.
>> >>>> The application code passes all of our tests.
>> >>>> In fact, even when this issue happens, although I lose network
>> >>>> connectivity, the rest of the system works perfectly.
>> >>>>
>> >>>> Please note that I have checked the contents of dev->d_len and
>> >>>> dev->d_buf, and they seem to contain valid data.
>> >>>> The address lies within the normal address space of the MCU, and the
>> >>>> size is sane.
>> >>>> So it doesn't look like any kind of memory corruption.
>> >>>>
>> >>>> At this point I believe that this is an actual bug either on the STM32
>> >>>> MAC driver, or at the TCP/IP stack itself.
>> >>>> I had a look at the driver code, but I didn't see anything suspicious.
>> >>>>
>> >>>> Has anyone observed the same issue before?
>> >>>> Can it be affected in any way with my configuration?
>> >>>> Or maybe, do you have any recommendations on what to test next?
>> >>>>
>> >>>> Thank you!
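
For reference, here is a minimal host-side sketch of the double free described at the top of this thread (the same callback released by two cleanup paths during one pass). It assumes nothing about the real devif_callback.c code: `cb_slot_s`, `g_cb_pool`, `cb_alloc()` and `cb_free()` are hypothetical names for illustration, not NuttX APIs. It shows one possible way to make a second release harmless: the free routine clears the owner's pointer, so a later path finds NULL and skips the release.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a callback slot; not the NuttX structure. */

struct cb_slot_s
{
  bool in_use;
};

#define CB_POOL_SIZE 4

static struct cb_slot_s g_cb_pool[CB_POOL_SIZE];

/* Take a free slot from the pool, or return NULL if none is left. */

struct cb_slot_s *cb_alloc(void)
{
  for (size_t i = 0; i < CB_POOL_SIZE; i++)
    {
      if (!g_cb_pool[i].in_use)
        {
          g_cb_pool[i].in_use = true;
          return &g_cb_pool[i];
        }
    }

  return NULL;
}

/* Release *cb and clear the owner's pointer.
 * Returns true if a slot was released, false if there was nothing to free.
 */

bool cb_free(struct cb_slot_s **cb)
{
  if (cb == NULL || *cb == NULL)
    {
      return false;          /* Already released by an earlier path */
    }

  assert((*cb)->in_use);     /* A stale second pointer would trip this */
  (*cb)->in_use = false;
  *cb = NULL;
  return true;
}
```

With this shape, a second `cb_free(&cb)` from a later cleanup path returns false instead of asserting. It only helps if both paths release the callback through the same owner pointer; whether that holds for tcp_lost_connection() and tcp_close_eventhandler() would have to be checked against the actual source.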