Hi there guys,
I don't know how to approach this problem. Hopefully, you can give me
some tips.
I am working with lwIP 1.4.1 and the Texas Instruments HDK RM57 without
an OS.
Sorry for the long email. I need you to get a good overview of my
problem :-)
_How my program works:_
* sensor data comes in through a serial interface in asynchronous mode
* once all data is received, it is sent to a client on a PC via TCP
(the server is the board)
_Problem:_
* after some random time (normally around an hour), the program
crashes with a "data fetch" error
_Things I noticed:_
* The L4, L4_ABT, L4_USR registers point towards a problem with the
serial interface (bad address). I know that the instruction the L4
register points to is not necessarily where the problem lies: the
register is set when the debugger or board notices the problem, but
the problem itself could have occurred several instructions earlier.
However, I also notice that when the problem occurs, the buffer
pointers (of the double buffer) I use in the receive interrupt
routine of the serial interface point to an address outside the
allowed region. These buffers are part of my application, not
system-level buffers.
* I use another serial interface to send some debug information to my
development PC (in synchronous mode). I noticed that when I increased
the amount of debug data and the frequency at which it is sent, the
program crashed sooner. It normally takes about an hour to crash;
with more debug info it took about 15 minutes or less.
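To make the buffer observation concrete, here is a minimal sketch of a double buffer whose interrupt-side write is bounds-checked, so that a corrupted index is counted instead of being used for an out-of-range write. This is not my actual code: the names rx_double_buffer_t, rx_isr_push and rx_take_frame are hypothetical, and the frame size of 1056 bytes is only assumed from the size of my sensor data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_FRAME_SIZE 1056u /* assumed frame size in bytes */

typedef struct {
    uint8_t data[2][RX_FRAME_SIZE]; /* the two halves of the double buffer */
    volatile size_t write_idx;      /* next free byte in the active half */
    volatile uint8_t active;        /* half the ISR is filling: 0 or 1 */
    volatile bool frame_ready;      /* the inactive half holds a full frame */
    volatile uint32_t bad_state;    /* times the bounds check caught corruption */
} rx_double_buffer_t;

/* Hypothetical: called from the receive interrupt for every byte. */
static void rx_isr_push(rx_double_buffer_t *b, uint8_t byte)
{
    /* Never trust the stored state: verify it before writing. */
    if (b->active > 1u || b->write_idx >= RX_FRAME_SIZE) {
        b->bad_state++;    /* record the corruption for later inspection */
        b->active &= 1u;   /* force the selector back into range */
        b->write_idx = 0u; /* restart the current frame; this byte is dropped */
        return;
    }

    b->data[b->active][b->write_idx] = byte;
    b->write_idx++;

    if (b->write_idx == RX_FRAME_SIZE) {
        b->active ^= 1u;   /* swap halves */
        b->write_idx = 0u;
        b->frame_ready = true;
    }
}

/* Hypothetical: called from the main loop. Returns the completed half,
   or NULL if no full frame has arrived since the last call. */
static const uint8_t *rx_take_frame(rx_double_buffer_t *b)
{
    if (!b->frame_ready) {
        return NULL;
    }
    b->frame_ready = false;
    return b->data[b->active ^ 1u];
}
```

With something like this, a non-zero bad_state counter would show that the index or selector was overwritten from outside the interrupt routine before the crash happens.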
_Tests I did:_
* I left the program running *without* the TCP server for a whole
day. I increased the amount and frequency of data sent over the debug
serial interface. Sensor data was being received in asynchronous
mode. The program did *NOT* crash.
* I left the program running with the TCP server for a whole day
with a single static buffer which was initialized once and never
changed. The serial interface to the sensor was not active. The debug
serial interface was active but only sending data once in a while.
The program did *NOT* crash.
* I left the program running with the TCP server for 4 hours with a
double static buffer which was being updated every 60ms with dummy
values. The serial interface to the sensor was not active. The debug
serial interface was active but only sending data once in a while.
The program did *NOT* crash. (I did this to test whether my copy
function was somehow faulty.)
* I tried running the TCP server and the sensor serial interface at
the same time but without copying the serial buffer to the TCP
buffer. The TCP server was sending, in one test, the single static
buffer that is never changed, and in the other test, the double
static buffer (being updated every 60ms with dummy values). In
*both* tests, the program *crashed*.
The problem only occurs when both the TCP and the asynchronous serial
communication are active at the same time.
(_Maybe related_) The TCP client on the PC is actually a GUI that
displays my sensor data. When connected, the "image" of the sensor data
"jumps" once every ~2 seconds. The data is wrong. I thought this could
be a copy error from the serial buffer to the tcp buffer. But, before
the board sends the data over TCP, it processes it and checks if the
data is wrong or corrupted. If it is, it sends several error signals
(LEDs and serial debug data). When the image in the GUI jumps, I also
get the error signals from the board (which means, the data is actually
wrong). When the server is not connected to the GUI, I do *NOT* get
those error signals from the board.
This looks as if the TCP server somehow affects the interrupt routine of
the serial interface (or uses the same memory area?) and somehow corrupts
its buffers. So, I looked at my port (which I got from a Texas
Instruments tutorial on lwIP for the board I am using) and saw that the
functions below are called constantly because SYS_LIGHTWEIGHT_PROT = 1.
I thought that maybe the serial interface doesn't like it when its
interrupt is being enabled and disabled that fast.
sys_prot_t
sys_arch_protect(void)
{
    sys_prot_t status;

    /* Save the current interrupt status, then mask IRQs. */
    status = (IntMasterStatusGet() & 0xFF);
    IntMasterIRQDisable();
    return status;
}

void
sys_arch_unprotect(sys_prot_t lev)
{
    /* Only turn interrupts back on if they were originally on when the
       matching sys_arch_protect() call was made. */
    if((lev & 0x80) == 0) {
        IntMasterIRQEnable();
    }
}

void IntMasterIRQEnable(void)
{
    _enable_IRQ();
    return;
}

void IntMasterIRQDisable(void)
{
    _disable_IRQ();
    return;
}
I then changed that define to SYS_LIGHTWEIGHT_PROT = 0. These functions
are no longer called, but the program still crashes.
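To rule out the protect/unprotect logic itself, its nesting behaviour can be modelled on a PC. The sketch below is only a host-side model, not the port code: sim_cpsr, the sim_irq_* helpers and sketch_protect/sketch_unprotect are hypothetical stand-ins for the real IntMaster* functions, and it assumes that bit 7 (0x80) of the status is the IRQ mask flag, as the check in sys_arch_unprotect() suggests.

```c
#include <stdint.h>

typedef uint32_t sys_prot_t;

#define IRQ_MASK_BIT 0x80u /* assumed: set means IRQs are masked */

/* Simulated status register; bit 7 clear means IRQs are enabled. */
static uint32_t sim_cpsr = 0u;

static uint32_t sim_int_status(void) { return sim_cpsr & 0xFFu; }
static void sim_irq_disable(void)    { sim_cpsr |= IRQ_MASK_BIT; }
static void sim_irq_enable(void)     { sim_cpsr &= ~IRQ_MASK_BIT; }

/* Same logic as sys_arch_protect(): remember the old state, then mask. */
static sys_prot_t sketch_protect(void)
{
    sys_prot_t status = sim_int_status();
    sim_irq_disable();
    return status;
}

/* Same logic as sys_arch_unprotect(): re-enable only if IRQs were
   enabled when the matching protect call was made. */
static void sketch_unprotect(sys_prot_t lev)
{
    if ((lev & IRQ_MASK_BIT) == 0u) {
        sim_irq_enable();
    }
}
```

In this model an inner unprotect leaves IRQs masked and only the outermost one re-enables them, so nested calls by themselves should not re-enable interrupts too early.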
(_Maybe related_) I have another connection to another part of the GUI.
Both connections work at the same time. With both active, the board
crashes sooner than with a single TCP connection.
(_Unrelated_) I checked both connections with Wireshark. I noticed that
the protocol of connection A (52 bytes data size) is TCP and the
description shows "PSH", "ACK" and similar things. However, the protocol
of connection B (sensor data: 1056 bytes) says ECHO and the description
just shows "response". I use two servers with the same structure. Why
would one have another protocol type? What does that ECHO mean?
By the way, I had the lwIP stats enabled when the crash occurred, but
they don't show any errors.
I don't have much experience in embedded programming. I don't know how
to investigate this deeper. What else can I check? Maybe my port is not
right? Does anyone have a port to the HDK RM57 that I can compare with?
I really appreciate any help.
Best regards,
Julio
_______________________________________________
lwip-users mailing list
[email protected]
https://lists.nongnu.org/mailman/listinfo/lwip-users