Hello,

I am still trying to make the network work reliably.
After fixing another issue in my application, I hit a new problem.

The following sequence causes NuttX to crash:

1. My application creates a TCP socket and communicates with a server.
2. At one point the server stops responding (unrelated to NuttX / network
issue).
3. The application detects the timeout, and calls close() on the socket.
4. A new socket is created, and it is connected to the server.
5. At this point, the server decides to send a FIN message for the previous
connection.
6. I get a failed assertion in devif_callback.c at line 85.
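
For illustration, the application side of steps 1 to 4 can be sketched
roughly as follows. This is not the actual application code: the helper
names connect_with_timeout() and recv_or_reconnect() are hypothetical, and
using SO_RCVTIMEO to detect the timeout is an assumption (the real
application may detect it differently).

```c
#include <assert.h>
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket with a receive timeout and
 * connect it to the server.  Returns the socket descriptor, or -1.
 */

int connect_with_timeout(const struct sockaddr_in *addr, int timeout_ms)
{
  struct timeval tv;
  int sd;

  assert(addr != NULL);

  sd = socket(AF_INET, SOCK_STREAM, 0);
  if (sd < 0)
    {
      return -1;
    }

  tv.tv_sec  = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;

  if (setsockopt(sd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
      connect(sd, (const struct sockaddr *)addr, sizeof(*addr)) < 0)
    {
      close(sd);
      return -1;
    }

  return sd;
}

/* Hypothetical helper for steps 3 and 4: try to receive; if the server did
 * not answer within the timeout, close the old socket and open a new one.
 * Returns the descriptor to keep using (old or new), or -1 on failure.
 */

int recv_or_reconnect(int sd, const struct sockaddr_in *addr,
                      int timeout_ms, void *buf, size_t len,
                      ssize_t *nread)
{
  *nread = recv(sd, buf, len, 0);
  if (*nread < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
    {
      /* Step 3: timeout detected, drop the old connection. */

      close(sd);

      /* Step 4: a new socket is created and connected to the server. */

      return connect_with_timeout(addr, timeout_ms);
    }

  return sd;
}
```

With this pattern, the old connection is closed locally while the server
may still consider it open, so a late FIN for it can arrive after the new
connection's SYN has been sent.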

Note that I haven't managed to manually reproduce this issue.
No matter what I do manually, everything seems to be working correctly.
I just have to wait for it to happen.
It seems that it is only triggered if the FIN for the old connection
arrives **after** the SYN of the new one.

I am sure that this is only happening with CONFIG_NET_TCP_WRITE_BUFFERS
enabled.
I have no problems without buffering.

The assertion appears to fire correctly.
When a FIN is received for a closed connection, the same callback is freed
twice: first by tcp_lost_connection() and later by tcp_close_eventhandler().
Both calls happen within the same execution of tcp_input().
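
To make the suspected double free concrete, here is a minimal standalone
sketch. It is not NuttX code: struct cb_s, cb_init(), cb_alloc(), cb_free()
and the static pool are hypothetical stand-ins for a callback free list,
and the inuse flag is just one possible way to make a second free
detectable instead of corrupting the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a callback entry (not the NuttX structure). */

struct cb_s
{
  struct cb_s *next;   /* Link in the free list */
  bool inuse;          /* True while the entry is allocated */
};

#define NCB 4

static struct cb_s g_pool[NCB];
static struct cb_s *g_free;

/* Put every pool entry on the free list. */

void cb_init(void)
{
  int i;

  g_free = NULL;
  for (i = 0; i < NCB; i++)
    {
      g_pool[i].inuse = false;
      g_pool[i].next  = g_free;
      g_free          = &g_pool[i];
    }
}

/* Take one entry from the free list, or return NULL if none is left. */

struct cb_s *cb_alloc(void)
{
  struct cb_s *cb = g_free;

  if (cb != NULL)
    {
      g_free    = cb->next;
      cb->next  = NULL;
      cb->inuse = true;
    }

  return cb;
}

/* Return an entry to the free list.  Returns false if the entry was
 * already free.  Without the inuse check, a second free of the entry at
 * the head of the list would make it point to itself.
 */

bool cb_free(struct cb_s *cb)
{
  if (cb == NULL || !cb->inuse)
    {
      return false;
    }

  cb->inuse = false;
  cb->next  = g_free;
  g_free    = cb;
  return true;
}

/* Count the free entries (bounded, so a corrupted list cannot loop). */

int cb_count_free(void)
{
  struct cb_s *cb;
  int n = 0;

  for (cb = g_free; cb != NULL && n <= NCB; cb = cb->next)
    {
      n++;
    }

  return n;
}
```

In this sketch, two code paths that both call cb_free() on the same entry
leave the list intact, and the second call simply reports failure. Whether
a guard of this kind or a change in the order of the two calls is the right
fix for the TCP code is an open question.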

Any ideas?

On Tue, Jul 26, 2022 at 3:44 PM Sebastien Lorquet <sebast...@lorquet.fr>
wrote:

> Hi,
>
> Good find, but:
>
> - I don't think any usual application tinkers with PHY registers during its
> lifetime, except the ethernet monitor.
>
> - The fix is certainly a lock somewhere, but whether global or fine-grained
> I don't know.
>
> Not all calls need to be locked, e.g. the one that returns the PHY
> address. Probably not needed by default, but a PHY access lock would
> prevent any issue you describe.
>
> I will wait for people with more expertise about this.
>
> Just a note: don't forget that not all PHYs have an interrupt; the one on
> the nucleo stm32h743zi[2] board does not have one.
>
> Sebastien
>
> On 26/07/2022 at 11:05, Fotis Panagiotopoulos wrote:
> > Hello,
> >
> > I have eventually found 2 issues regarding networking in my application.
> > I would like to discuss the first one.
> >
> >
> > My code contains something like this:
> >
> > int sd = socket(AF_INET, SOCK_DGRAM, 0);
> >
> > struct ifreq ifr;
> > memset(&ifr, 0, sizeof(struct ifreq));
> > strncpy(ifr.ifr_name, CONFIG_NETIF_DEV_NAME, IFNAMSIZ);
> > ifr.ifr_mii_phy_id = CONFIG_STM32_PHYADDR;
> > ifr.ifr_mii_reg_num = MII_LAN8720_SECR;
> > ifr.ifr_mii_val_out = 0;
> > ioctl(sd, SIOCGMIIREG, (unsigned long)&ifr);
> >
> > // Do stuff with ifr.ifr_mii_val_out.
> >
> > close(sd);
> >
> > I realized that this type of ioctl will directly access the hardware,
> > without any locking.
> > That is, if any other task needs to use the PHY in any other way, it will
> > eventually corrupt its register data.
> >
> >
> > Two questions on this:
> > 1. Is there any good reason for this?
> > 2. What is the best way to fix it? Shall I add a driver level lock, or
> > should net_lock() be used in any higher layer?
> >
> >
> >
> > On Tue, Jul 19, 2022 at 10:30 PM Fotis Panagiotopoulos <f.j.pa...@gmail.com>
> > wrote:
> >
> >> Hello,
> >>
> >>> We have deployed hundreds of boards with stm32f427 and ethernet, they
> >>> have all been working reliably for months without stopping, we know it
> >>> because they critically depend on network functionality and we have
> >>> reports if a card becomes unreachable. None has so far outside of
> >>> dedicated tests.
> >>> So I believe that there is no obvious hard bug in these drivers.
> >> Good to hear that!
> >> Although, I may be using a feature or protocol that you are not.
> >> Of course, I don't believe that NuttX is broken per se, but a minor bug
> >> may lurk somewhere...
> >>
> >>
> >>> I have seen that when I enable the network debugging features, it seems to
> >>> hit an assertion failure before getting to nsh prompt at startup. This was
> >>> on a quite recent master. I haven't had a chance to diagnose this further.
> >>> Have you tried enabling these and if so, do they work?
> >> If you refer to CONFIG_DEBUG_NET, then yes, I have enabled it and it works.
> >> I have some devices under test, waiting to reproduce the issue to see if
> >> this option provides any useful information.
> >>
> >>
> >>> Also, out of curiosity, have you tried running ostest on your board?
> >> I just tried.
> >> It passed all the tests.
> >>
> >> On Tue, Jul 19, 2022 at 4:44 PM Sebastien Lorquet <sebast...@lorquet.fr>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> We have deployed hundreds of boards with stm32f427 and ethernet, they
> >>> have all been working reliably for months without stopping, we know it
> >>> because they critically depend on network functionality and we have
> >>> reports if a card becomes unreachable. None has so far outside of
> >>> dedicated tests.
> >>>
> >>> So I believe that there is no obvious hard bug in these drivers.
> >>>
> >>> Most likely it is a build option in your particular config. Debug is a
> >>> possible cause; thread problems are another possibility.
> >>>
> >>> Sebastien
> >>>
> >>>
> >>> On 7/19/22 13:47, Fotis Panagiotopoulos wrote:
> >>>> Hello!
> >>>>
> >>>> I am using Ethernet on an STM32F427 target, but I am facing some issues.
> >>>>
> >>>> Initially the device works correctly. After some hours of continuous
> >>>> operation I completely lose all network communications.
> >>>> Trying to troubleshoot the issue, I enabled assertions and various other
> >>>> debug features.
> >>>>
> >>>> Again the device works correctly for some hours, and then I get a failed
> >>>> assertion at stm32_eth.c, line 1372:
> >>>>
> >>>> DEBUGASSERT(dev->d_len == 0 && dev->d_buf == NULL);
> >>>>
> >>>> No other errors are reported (e.g. stack overflows etc).
> >>>>
> >>>>
> >>>> I have observed that this issue usually manifests itself when there is
> >>>> insufficient stack on a task.
> >>>> But in my case, all tasks have oversized stacks. Typically they do not
> >>>> exceed 50% utilization.
> >>>> I have plenty of room available in the heap too (> 100kB).
> >>>>
> >>>> Regarding the rest of the firmware, I cannot see any other misbehaviour
> >>>> or problem.
> >>>> I haven't ever seen any other unexplained problem, assertion fail,
> >>>> hard-fault etc.
> >>>> The application code passes all of our tests.
> >>>> In fact, even when this issue happens, although I lose network
> >>>> connectivity, the rest of the system works perfectly.
> >>>>
> >>>> Please note that I have checked the contents of dev->d_len and
> >>>> dev->d_buf, and they seem to contain valid data.
> >>>> The address lies within the normal address space of the MCU, and the
> >>>> size is sane.
> >>>> So it doesn't look like any kind of memory corruption.
> >>>>
> >>>>
> >>>> At this point I believe that this is an actual bug either in the STM32
> >>>> MAC driver, or in the TCP/IP stack itself.
> >>>> I had a look at the driver code, but I didn't see anything suspicious.
> >>>>
> >>>>
> >>>> Has anyone observed the same issue before?
> >>>> Can it be affected in any way with my configuration?
> >>>> Or maybe, do you have any recommendations on what to test next?
> >>>>
> >>>>
> >>>> Thank you!
> >>>>
>
