[lwip-users] Device crashes while connected via TCP and Serial simultaneously

Julio Cesar Aguilar Zerpa Thu, 26 Jan 2017 02:49:15 -0800

Hi there guys,

I don't know how to approach this problem. Hopefully, you can give me some tips.

I am working with lwip 1.4.1 and the Texas Instruments HDK RM57 without an OS.

Sorry for the long email. I need you to get a good overview of my problem :-)


_How my program works:_

 * sensor data comes in through a serial interface in asynchronous mode
 * once all data is received, it is sent to a client on a PC via TCP
   (the server is the board)

_Problem:_

 * after some random time (normally around an hour), the program
   crashes with a "data fetch" error

_Things I noticed:_

 * The L4, L4_ABT, L4_USR registers point towards a problem with the
   serial interface (bad address). I know that the instruction the L4
   register points to is not necessarily where the problem lies. The L4
   register is set at the time the debugger or board notices the
   problem, but the problem itself could have occurred several
   instructions earlier. However, I also noticed that when the problem
   occurs, the buffers (in the double buffer) I use in the receive
   interrupt routine of the serial interface point to an address
   outside the allowed region. These buffers are part of my
   application, not system-level buffers.
 * I use another serial interface to send some debug information to my
   development PC (in synchronous mode). I noticed that when I
   increased the amount of debug data and the frequency at which it is
   sent, the program crashed sooner. It normally takes about an hour to
   crash; with more debug info it took about 15 minutes or less.
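
A bounds-checked double buffer for a receive interrupt could be sketched as follows. This is not the application's actual code: the frame length `FRAME_SIZE`, the type `rx_double_buffer_t` and the functions `rx_init`, `rx_isr_byte` and `rx_take_frame` are all hypothetical names chosen for illustration. The point is only that the interrupt routine checks its indices before every write and counts any violation, so a corrupted index shows up in a counter instead of as a write to a bad address.

```c
#include <stddef.h>
#include <stdint.h>

#define FRAME_SIZE 64u   /* hypothetical frame length, not the real one */

typedef struct {
    uint8_t           data[2][FRAME_SIZE]; /* the two halves of the double buffer */
    volatile uint8_t  fill;      /* half the ISR is currently writing to */
    volatile size_t   pos;       /* next write position inside that half */
    volatile uint8_t  ready;     /* 1 = the other half holds a complete frame */
    volatile uint32_t overruns;  /* lost frames and out-of-range indices */
} rx_double_buffer_t;

void rx_init(rx_double_buffer_t *rx)
{
    rx->fill = 0u;
    rx->pos = 0u;
    rx->ready = 0u;
    rx->overruns = 0u;
}

/* Called from the receive interrupt for every byte. */
void rx_isr_byte(rx_double_buffer_t *rx, uint8_t byte)
{
    /* Guard: never write outside the buffer, even if the state is corrupted. */
    if (rx->fill > 1u || rx->pos >= FRAME_SIZE) {
        rx->overruns++;
        rx->fill = 0u;
        rx->pos = 0u;
    }
    rx->data[rx->fill][rx->pos] = byte;
    rx->pos++;

    if (rx->pos == FRAME_SIZE) {
        if (rx->ready) {
            rx->overruns++;   /* main loop has not taken the previous frame */
        }
        rx->fill ^= 1u;       /* switch to the other half */
        rx->pos = 0u;
        rx->ready = 1u;
    }
}

/* Called from the main loop; returns the finished frame or NULL. */
const uint8_t *rx_take_frame(rx_double_buffer_t *rx)
{
    if (!rx->ready) {
        return NULL;
    }
    rx->ready = 0u;
    return rx->data[rx->fill ^ 1u];
}
```

On a real target, `rx_take_frame` would presumably need to run with the receive interrupt briefly masked, and the returned frame has to be consumed before the next frame completes. A non-zero `overruns` counter would hint that the indices are being corrupted from outside the interrupt routine.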

_Tests I did:_

 * I left the program running *without* the TCP server for a whole
   day, with an increased amount and frequency of data sent over the
   debug serial interface. Sensor data was being received in
   asynchronous mode. The program did *NOT* crash.
 * I left the program running with the TCP server for a whole day
   with a single static buffer which was initialized once and never
   changed. The serial interface to the sensor was not active. The
   debug serial interface was active but only sent data once in a
   while. The program did *NOT* crash.
 * I left the program running with the TCP server for 4 hours with a
   double static buffer which was updated every 60 ms with dummy
   values. The serial interface to the sensor was not active. The debug
   serial interface was active but only sent data once in a while. The
   program did *NOT* crash. (I did this to test whether my copy
   function was somehow faulty.)
 * I ran the TCP server and the sensor serial interface at the same
   time, but without copying the serial buffer to the TCP buffer. In
   one test the TCP server was sending the single static buffer that
   never changes, and in the other test the double static buffer
   (updated every 60 ms with dummy values). In *both* tests, the
   program *crashed*.

The problem only occurs when both the TCP and the asynchronous serial communication are active at the same time.

(_Maybe related_) The TCP client on the PC is actually a GUI that displays my sensor data. When connected, the "image" of the sensor data "jumps" once every ~2 seconds. The data is wrong. I thought this could be a copy error from the serial buffer to the TCP buffer. But before the board sends the data over TCP, it processes it and checks whether the data is wrong or corrupted. If it is, it sends several error signals (LEDs and serial debug data). When the image in the GUI jumps, I also get the error signals from the board (which means the data is actually wrong). When the server is not connected to the GUI, I do *NOT* get those error signals from the board.

This looks as if the TCP server somehow affects the interrupt routine of the serial interface (or the same memory area?) and somehow corrupts its buffers. I saw in my port (which I got from a Texas Instruments tutorial on LwIP and the board I am using) that the functions below are called all the time because of SYS_LIGHTWEIGHT_PROT = 1. I thought that maybe the serial interface doesn't like it when its interrupt is enabled and disabled that fast.


   sys_prot_t
   sys_arch_protect(void)
   {
      sys_prot_t status;
      status = (IntMasterStatusGet() & 0xFF);

      IntMasterIRQDisable();
      return status;
   }

   void
   sys_arch_unprotect(sys_prot_t lev)
   {
      /* Only turn interrupts back on if they were originally on when
         the matching sys_arch_protect() call was made. */
      if((lev & 0x80) == 0) {
        IntMasterIRQEnable();
      }
   }

   void IntMasterIRQEnable(void)
   {
        _enable_IRQ();
        return;
   }

   void IntMasterIRQDisable(void)
   {
        _disable_IRQ();
        return;
   }

I then changed that define to SYS_LIGHTWEIGHT_PROT = 0. These functions were no longer called, but the program still crashes.
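
The nesting behaviour that the protect/unprotect pair above is expected to have can be checked on a PC with a small simulation. This is only a host-side sketch under stated assumptions: `sim_status` and the `sim_*` helpers are hypothetical stand-ins for the CPU status and the `IntMaster*` functions of the port, and the `sys_prot_t` typedef is assumed here (on the target it comes from the port's own headers). Bit 0x80 is the same bit the port code tests.

```c
#include <stdint.h>

typedef uint32_t sys_prot_t;    /* assumption: normally defined by the port */

#define SIM_IRQ_MASK_BIT 0x80u  /* same bit the port code tests */

/* Stand-in for the CPU status; bit 0x80 set means IRQs are masked. */
static uint32_t sim_status = 0u;

static uint32_t sim_status_get(void)  { return sim_status; }
static void     sim_irq_disable(void) { sim_status |= SIM_IRQ_MASK_BIT; }
static void     sim_irq_enable(void)  { sim_status &= ~SIM_IRQ_MASK_BIT; }

/* Helper for checking the simulated state from outside. */
int sim_irq_masked(void)
{
    return (sim_status & SIM_IRQ_MASK_BIT) != 0u;
}

/* Remember whether IRQs were masked, then mask them. */
sys_prot_t sys_arch_protect(void)
{
    sys_prot_t status = sim_status_get() & 0xFFu;
    sim_irq_disable();
    return status;
}

/* Re-enable IRQs only if they were enabled at the matching protect call. */
void sys_arch_unprotect(sys_prot_t lev)
{
    if ((lev & SIM_IRQ_MASK_BIT) == 0u) {
        sim_irq_enable();
    }
}
```

With this model, a nested protect/unprotect pair leaves interrupts masked until the outermost unprotect runs. It says nothing about how the real `_enable_IRQ()` and `_disable_IRQ()` behave on the RM57.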

(_Maybe related_) I have another connection to another part of the GUI. Both connections work at the same time. With both active, the board crashes sooner than with one TCP connection.

(_Unrelated_) I checked both connections with Wireshark. I noticed that the protocol of connection A (52 bytes data size) is TCP and the description shows "PSH", "ACK" and similar things. However, the protocol of connection B (sensor data: 1056 bytes) says ECHO and the description just shows "response". I use two servers with the same structure. Why would one have a different protocol type? What does that ECHO mean?

By the way, I had the lwip stats active when the crash occurred, but they don't show any errors.

I don't have much experience in embedded programming. I don't know how to investigate this more deeply. What else can I check? Maybe my port is not right? Does someone have a port to the HDK RM57 that I can compare against?


I really appreciate any help.

Best regards,

Julio
