Hi,

I was finally able to identify the root cause of the problem. Thank you, Alan!

I ran the same app on my SAMv7-based board (which shows the original issue) and on an ESP32-S3, and the issue did not reproduce on the ESP32-S3. So I dug into the SAMv7 Ethernet driver and found a problem in how GMAC register writes interact with the D-Cache in write-through mode. Sometimes the TCP stack placed a packet in the GMAC TX buffer, but the packet was not actually transmitted, because TXGO was set in the register while the corresponding TX descriptor had not yet been flushed to memory.

I ended up with a generic cache fix, https://github.com/apache/nuttx/pull/10536, which I think should benefit all chips based on armv7-m and armv8-m.
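To make the ordering problem concrete, here is a toy model in C. It is only a sketch: it is not the driver code and not the contents of the PR, and every name in it (tx_desc, DESC_USED, flush_desc, set_txgo, send_packet) is made up for illustration. It assumes a descriptor with a "used" ownership bit that the DMA engine checks when transmission is started.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: all names are hypothetical, not NuttX or SAMv7 identifiers. */

#define DESC_USED (1u << 31)  /* Set: owned by software, the DMA skips it */

struct tx_desc
{
  uint32_t addr;    /* Buffer address */
  uint32_t status;  /* Length plus ownership bit */
};

/* CPU-side view: written by software, not yet visible to the DMA */

static struct tx_desc g_pending;

/* Memory view: the only thing the modelled DMA engine reads */

static struct tx_desc g_memory = { 0, DESC_USED };

static unsigned int g_sent;  /* Number of packets the model transmitted */

/* Software fills in the descriptor and hands it to the hardware */

static void cpu_write_desc(uint32_t addr, uint32_t len)
{
  g_pending.addr   = addr;
  g_pending.status = len & ~DESC_USED;
}

/* Stand-in for "make the descriptor visible in memory" */

static void flush_desc(void)
{
  g_memory = g_pending;
}

/* Stand-in for setting TXGO: the DMA looks at memory exactly once */

static void set_txgo(void)
{
  if ((g_memory.status & DESC_USED) == 0)
    {
      g_sent++;
      g_memory.status |= DESC_USED;  /* Hardware returns the descriptor */
    }
}

/* Returns true if the packet actually went out in this model */

static bool send_packet(uint32_t addr, uint32_t len, bool flush_first)
{
  unsigned int before = g_sent;

  cpu_write_desc(addr, len);
  if (flush_first)
    {
      flush_desc();
    }

  set_txgo();
  return g_sent != before;
}
```

In this model, send_packet(0x1000, 64, false) returns false because the DMA still sees the stale descriptor, while the same call with flush_first set to true returns true. In a real driver the flush step would correspond to whatever makes the descriptor write reach memory before the register write, for example a cache clean followed by a barrier.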

Best regards,
Petro

On Thu, Sep 7, 2023 at 13:32, Alan C. Assis <acas...@gmail.com> wrote:

> Hi Petro,
>
> I don't remember seeing those Retransmits when I was debugging the
> issue with MQTT, see attached image.
>
> I saved all the wireshark dumps, I can share them with you in case you
> want to compare.
>
> BR,
>
> Alan
>
> On 9/7/23, Petro Karashchenko <petro.karashche...@gmail.com> wrote:
> > Hi,
> >
> > I got back to investigating the issue and, based on the wireshark logs,
> > quite often the SYN message from PC to device is not getting
> > acknowledged and is retransmitted after 1s.
> >
> > The communication log looks like
> > [image: Screenshot 2023-09-07 at 11.10.33.png]
> >
> > And at the start there is an example of a "good" transaction, or I
> > would even say a semi-good one, because I can't explain the
> > "[TCP Retransmission] 59996 - 80 [FIN, ACK] Seq=101 Ack=374 Win=65535
> > Len=0" from PC to device.
> >
> > Right after that a new transaction is started, and the time diff between
> > "59997 - 80 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=64 TSval=4107001238
> > TSecr=0 SACK_PERM" and "[TCP Retransmission] 59997 - 80 [SYN] Seq=0
> > Win=65535 Len=0 MSS=1460 WS=64 TSval=4107002239 TSecr=0 SACK_PERM" is
> > exactly 1s.
> >
> > Has anybody met such a situation? Does it depend on TCP stack
> > configuration options, or maybe I should look for a bug in the ethernet
> > driver, something like a buffer overrun or similar?
> >
> > I'm using a SAMv7 based board and my code trunk is at least 6 months
> > behind the latest master.
> >
> > Best regards,
> > Petro
> >
> > On Fri, Sep 1, 2023 at 01:09, Petro Karashchenko
> > <petro.karashche...@gmail.com> wrote:
> >
> >> The delayed ACK option is selected. I will try to deselect it and do
> >> end-to-end testing before going further with the investigation.
> >> Thank you very much for the hint.
> >>
> >> On Fri, Sep 1, 2023 at 01:04, Gregory Nutt <spudan...@gmail.com> wrote:
> >>
> >>> On 8/31/2023 3:39 PM, Petro Karashchenko wrote:
> >>> > Hello,
> >>> >
> >>> > I'm having an issue with a network based application on NuttX.
> >>> > I have an HTTP server that is built with the help of the
> >>> > "netlib_server" interface. When I'm trying to access my server
> >>> > with curl multiple times in a row I see that there is a gap close
> >>> > to 500ms on the PC side when accessing the device.
> >>> >
> >>> > I think maybe this is somehow related to a case where each time a
> >>> > new request is accepted the client socket is created, handled and
> >>> > closed? I will dig into this of course and will analyze wireshark
> >>> > logs, but maybe someone has any guesses or met similar issues in
> >>> > the past? I mean maybe there is some kind of "blacklist" or
> >>> > TCP/socket configuration that prevents new connections from being
> >>> > established for a certain period of time?
> >>> >
> >>> > Best regards,
> >>> > Petro
> >>>
> >>> You will have to use wireshark to get to the bottom. It sounds like
> >>> there is some delay or timeout that is causing the issue. Perhaps in
> >>> the 3-way handshake. It takes 500ms for a missing ACK to be detected.
> >>> Are you using delayed ACKs? That delay is 500ms too.