Hi Fotis,

Are you in sync with mainline?

If you can create a host application that induces the issue, it will be
easier for us to test.
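
One possible shape for such a host-side helper is sketched below, assuming
"host application" means a small TCP server run on a PC that the board
connects to. It stays silent on the first connection, waits for the
reconnect, and only then ends the old connection, which is the ordering
described in the report. The names open_listener() and serve_late_fin() are
made up for this sketch, and it has not been run against NuttX.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP listener bound to the given IPv4 address and port.
 * Port 0 asks the kernel for an ephemeral port.
 * Returns the listening descriptor, or -1 on error.
 */

int open_listener(const char *ip, unsigned short port)
{
  struct sockaddr_in addr;
  int one = 1;
  int fd = socket(AF_INET, SOCK_STREAM, 0);

  if (fd < 0)
    {
      return -1;
    }

  setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

  memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_port = htons(port);

  if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||
      bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
      listen(fd, 4) < 0)
    {
      close(fd);
      return -1;
    }

  return fd;
}

/* Accept two connections and never send anything on the first one (the
 * "silent server"). The first connection is ended only after the second
 * has been accepted, so its FIN follows the peer's new SYN.
 * Returns the second connection's descriptor, or -1 on error.
 */

int serve_late_fin(int listen_fd)
{
  int old_fd;
  int new_fd;

  assert(listen_fd >= 0);

  old_fd = accept(listen_fd, NULL, NULL);
  if (old_fd < 0)
    {
      return -1;
    }

  new_fd = accept(listen_fd, NULL, NULL);

  /* shutdown() emits the FIN even if unread data is still queued; a plain
   * close() with unread data may send a RST instead.
   */

  shutdown(old_fd, SHUT_WR);
  close(old_fd);
  return new_fd;
}
```

A small main() could call open_listener("0.0.0.0", port) with whatever port
the board connects to, and then call serve_late_fin() in a loop.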

BR,

Alan

On 8/9/22, Fotis Panagiotopoulos <f.j.pa...@gmail.com> wrote:
> Hello,
>
> still trying to make the network work reliably.
> After fixing another issue of my application, I hit another problem.
>
> The following sequence causes NuttX to crash:
>
> 1. My application is creating a TCP socket and communicates with a server.
> 2. At one point the server stops responding (unrelated to NuttX / network
> issue).
> 3. The application detects the timeout, and calls close() on the socket.
> 4. A new socket is created, and it is connected to the server.
> 5. At this point, the server decides to send a FIN message for the previous
> connection.
> 6. I get a failed assertion in devif_callback.c at line 85.
>
> Note that I haven't managed to manually reproduce this issue.
> No matter what I do manually, everything seems to be working correctly.
> I just have to wait for it to happen.
> It seems that it is only triggered if a FIN arrives **after** a SYN.
>
> I am sure that this is only happening with CONFIG_NET_TCP_WRITE_BUFFERS
> enabled.
> I have no problems without buffering.
>
> The assertion seems to fire correctly.
> When a FIN is received for a closed connection, the same callback is freed
> both by tcp_lost_connection() and later on by tcp_close_eventhandler().
> All of this happens within the same execution of tcp_input().
>
> Any ideas?
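
The double release described above can be modelled in isolation. The sketch
below is not NuttX code: demo_conn, demo_callback and
demo_release_callback() are hypothetical stand-ins. It only illustrates one
common guard, assuming both paths go through a single release helper:
clear the owner's pointer on the first release so that a second release in
the same pass becomes a no-op.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these names are NuttX APIs. */

struct demo_callback
{
  int in_use;
};

struct demo_conn
{
  struct demo_callback *cb;   /* NULL once the callback has been released */
};

static int g_frees;           /* Counts real releases, for illustration */

/* Release the connection's callback at most once. Clearing the pointer
 * makes any later call on the same connection a harmless no-op.
 */

void demo_release_callback(struct demo_conn *conn)
{
  if (conn->cb == NULL)
    {
      return;                 /* Already released by an earlier path */
    }

  assert(conn->cb->in_use);   /* Sanity check on the callback state */
  conn->cb->in_use = 0;
  free(conn->cb);
  conn->cb = NULL;
  g_frees++;
}

/* Model two release paths running within one input pass.
 * Returns the number of real releases, or -1 if allocation failed.
 */

int demo_late_fin(void)
{
  struct demo_conn conn;

  conn.cb = malloc(sizeof(*conn.cb));
  if (conn.cb == NULL)
    {
      return -1;
    }

  conn.cb->in_use = 1;
  g_frees = 0;

  demo_release_callback(&conn);   /* Models the "connection lost" path */
  demo_release_callback(&conn);   /* Models the "close event" path */
  return g_frees;
}
```

Whether the real fix belongs in the lost-connection path, the close handler,
or the callback allocator itself is a separate question that this model does
not answer.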
>
> On Tue, Jul 26, 2022 at 3:44 PM Sebastien Lorquet <sebast...@lorquet.fr>
> wrote:
>
>> Hi,
>>
>> good find but
>>
>> -I don't think any usual application tinkers with PHY regs during its
>> lifetime except the ethernet monitor
>>
>> -the fix is certainly a lock somewhere, but global or fine-grained I don't
>> know.
>>
>> Not all calls need to be locked, e.g. the one that returns the PHY
>> address. Probably not needed by default, but a PHY access lock would
>> prevent any issue you describe.
>>
>> I will wait for people with more expertise about this.
>>
>> Just a note, don't forget that not all PHYs have an interrupt, the one on
>> the nucleo stm32h743zi[2] board does not have one.
>>
>> Sebastien
>>
>> Le 26/07/2022 à 11:05, Fotis Panagiotopoulos a écrit :
>> > Hello,
>> >
>> > I have eventually found 2 issues regarding networking in my application.
>> > I would like to discuss the first one.
>> >
>> >
>> > My code contains something like this:
>> >
>> > /* Any socket will do; it only serves as a handle for the ioctl. */
>> > int sd = socket(AF_INET, SOCK_DGRAM, 0);
>> >
>> > struct ifreq ifr;
>> > memset(&ifr, 0, sizeof(struct ifreq));
>> > strncpy(ifr.ifr_name, CONFIG_NETIF_DEV_NAME, IFNAMSIZ - 1);
>> > ifr.ifr_mii_phy_id = CONFIG_STM32_PHYADDR;
>> > ifr.ifr_mii_reg_num = MII_LAN8720_SECR;
>> > ifr.ifr_mii_val_out = 0;
>> >
>> > /* Read the PHY register; the value is only valid on success. */
>> > if (ioctl(sd, SIOCGMIIREG, (unsigned long)&ifr) == 0)
>> >   {
>> >     // Do stuff with ifr.ifr_mii_val_out.
>> >   }
>> >
>> > close(sd);
>> >
>> > I realized that this type of ioctl will directly access the hardware,
>> > without any locking.
>> > That is, if any other task needs to use the PHY in any other way, it
>> > will eventually corrupt its register data.
>> >
>> >
>> > Two questions on this:
>> > 1. Is there any good reason for this?
>> > 2. What is the best way to fix it? Shall I add a driver level lock, or
>> > should net_lock() be used in any higher layer?
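
For reference, the driver-level option could look roughly like the sketch
below. It is a host-side model, not the STM32 driver: the register array
stands in for the MDIO bus, a pthread mutex stands in for whatever lock
primitive the driver would actually use, and all demo_phy_* names are
hypothetical. The point it illustrates is that every access, including a
whole read-modify-write, happens under one lock hold.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PHY_NREGS 32

/* Hypothetical model of a PHY register file; not the STM32 driver. */

static uint16_t g_phy_regs[DEMO_PHY_NREGS];
static pthread_mutex_t g_phy_lock = PTHREAD_MUTEX_INITIALIZER;

/* Read one register under the lock. Returns 0 on success, -1 on bad args. */

int demo_phy_read(uint8_t reg, uint16_t *val)
{
  if (reg >= DEMO_PHY_NREGS || val == NULL)
    {
      return -1;
    }

  pthread_mutex_lock(&g_phy_lock);
  *val = g_phy_regs[reg];
  pthread_mutex_unlock(&g_phy_lock);
  return 0;
}

/* Write one register under the lock. Returns 0 on success, -1 on bad args. */

int demo_phy_write(uint8_t reg, uint16_t val)
{
  if (reg >= DEMO_PHY_NREGS)
    {
      return -1;
    }

  pthread_mutex_lock(&g_phy_lock);
  g_phy_regs[reg] = val;
  pthread_mutex_unlock(&g_phy_lock);
  return 0;
}

/* Read-modify-write with the lock held across both steps, so no other
 * task can touch the register between the read and the write.
 */

int demo_phy_update(uint8_t reg, uint16_t clear, uint16_t set)
{
  if (reg >= DEMO_PHY_NREGS)
    {
      return -1;
    }

  pthread_mutex_lock(&g_phy_lock);
  g_phy_regs[reg] = (uint16_t)((g_phy_regs[reg] & ~clear) | set);
  pthread_mutex_unlock(&g_phy_lock);
  return 0;
}
```

A fine-grained lock like this keeps PHY accesses from blocking the rest of
the network stack, whereas taking a global lock in a higher layer would
serialize them against all other network activity.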
>> >
>> >
>> >
>> > On Tue, Jul 19, 2022 at 10:30 PM Fotis Panagiotopoulos
>> > <f.j.pa...@gmail.com> wrote:
>> >
>> >> Hello,
>> >>
>> >>> We have deployed hundreds of boards with stm32f427 and ethernet, they
>> >>> have all been working reliably for months without stopping, we know it
>> >>> because they critically depend on network functionality and we have
>> >>> reports if a card becomes unreachable. None has so far outside of
>> >>> dedicated tests.
>> >>> So I believe that there is no obvious hard bug in these drivers.
>> >> Good to hear that!
>> >> Although, I may be using a feature or protocol that you are not.
>> >> Of course, I don't believe that NuttX is broken per se, but a minor bug
>> >> may lurk somewhere...
>> >>
>> >>
>> >>> I have seen that when I enable the network debugging features, it seems
>> >>> to hit an assertion failure before getting to nsh prompt at startup. This
>> >>> was on a quite recent master. I haven't had a chance to diagnose this
>> >>> further.
>> >>> Have you tried enabling these and if so, do they work?
>> >> If you refer to CONFIG_DEBUG_NET, then yes I have enabled it and it
>> >> works.
>> >> I have some devices under test, waiting to reproduce the issue to see if
>> >> this option provides any useful information.
>> >>
>> >>
>> >>> Also, out of curiosity, have you tried running ostest on your board?
>> >> I just tried.
>> >> It passed all the tests.
>> >>
>> >> On Tue, Jul 19, 2022 at 4:44 PM Sebastien Lorquet
>> >> <sebast...@lorquet.fr> wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> We have deployed hundreds of boards with stm32f427 and ethernet, they
>> >>> have all been working reliably for months without stopping, we know it
>> >>> because they critically depend on network functionality and we have
>> >>> reports if a card becomes unreachable. None has so far outside of
>> >>> dedicated tests.
>> >>>
>> >>> So I believe that there is no obvious hard bug in these drivers.
>> >>>
>> >>> Most certainly a build option on your particular config. debug is a
>> >>> possible issue, thread problems is another possibility.
>> >>>
>> >>> Sebastien
>> >>>
>> >>>
>> >>> On 7/19/22 13:47, Fotis Panagiotopoulos wrote:
>> >>>> Hello!
>> >>>>
>> >>>> I am using Ethernet on an STM32F427 target, but I am facing some
>> >>>> issues.
>> >>>>
>> >>>> Initially the device works correctly. After some hours of continuous
>> >>>> operation I completely lose all network communications.
>> >>>> Trying to troubleshoot the issue, I enabled assertions and various
>> >>>> other debug features.
>> >>>>
>> >>>> Again the device works correctly for some hours, and then I get a
>> >>>> failed assertion at stm32_eth.c, line 1372:
>> >>>>
>> >>>> DEBUGASSERT(dev->d_len == 0 && dev->d_buf == NULL);
>> >>>>
>> >>>> No other errors are reported (e.g. stack overflows etc).
>> >>>>
>> >>>>
>> >>>> I have observed that this issue usually manifests itself when there is
>> >>>> insufficient stack on a task.
>> >>>> But in my case, all tasks have oversized stacks. Typically they do not
>> >>>> exceed 50% utilization.
>> >>>> I have plenty of room available in the heap too (> 100kB).
>> >>>>
>> >>>> Regarding the rest of the firmware, I cannot see any other
>> >>>> misbehaviour or problem.
>> >>>> I haven't ever seen any other unexplained problem, assertion fail,
>> >>>> hard-fault etc.
>> >>>> The application code passes all of our tests.
>> >>>> In fact, even when this issue happens, although I lose network
>> >>>> connectivity, the rest of the system works perfectly.
>> >>>>
>> >>>> Please note that I have checked the contents of dev->d_len and
>> >>>> dev->d_buf, and they seem to contain valid data.
>> >>>> The address lies within the normal address space of the MCU, and the
>> >>>> size is sane.
>> >>>> So it doesn't look like any kind of memory corruption.
>> >>>>
>> >>>>
>> >>>> At this point I believe that this is an actual bug either in the STM32
>> >>>> MAC driver or in the TCP/IP stack itself.
>> >>>> I had a look at the driver code, but I didn't see anything suspicious.
>> >>>>
>> >>>>
>> >>>> Has anyone observed the same issue before?
>> >>>> Can it be affected in any way by my configuration?
>> >>>> Or maybe, do you have any recommendations on what to test next?
>> >>>>
>> >>>>
>> >>>> Thank you!
>> >>>>
>>
>
