Hi,

We run lwIP 1.4.0 on a PPC440 and see rare, seemingly random TCP hangs. I
was able to create a minimal working example that reproduces the hang. Set
up a TCP server on the PPC:

    int socketId =  socket(AF_INET, SOCK_STREAM, 0);
    if(socketId == -1){...return;}


    struct sockaddr_in server;
    server.sin_family = AF_INET;
    server.sin_port = htons(12121);
    server.sin_addr.s_addr = INADDR_ANY;

    int err = bind(socketId, (struct sockaddr *) &server, sizeof (server));
    if (err < 0) {... return;}

    err = listen(socketId, 1);
    if (err < 0) { .... return;}

    while(1) {

        int socketConn = accept(socketId, NULL, NULL);
        if (socketConn < 0) continue;  /* do not spawn a thread for a failed accept */
        sys_thread_t thread = sys_thread_new("tcip_server",
                                             processConnection,
                                             (void*)socketConn,
                                             2*THREAD_STACKSIZE,
                                             DEFAULT_THREAD_PRIO);
    }

spawning a thread for each accepted connection:

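void processConnection(void *p) {

    int sd = (int)p;
    uint8_t *buffer = new uint8_t[CMD_MAX_SIZE];
    uint32_t bufOffset = 0;
    int n = 0;  // signed, so that a -1 error return from read() ends the loop

    // Stop when the peer closes, on error, or once the buffer is full.
    while(bufOffset < CMD_MAX_SIZE &&
          (n = read(sd, buffer+bufOffset, CMD_MAX_SIZE-bufOffset)) > 0 ) {
        bufOffset += n;
    }
    ......
    delete[] buffer;  // array form matches new[]

    close(sd);
}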
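
One detail of the receive loop that is easy to get wrong is the return
value of read(): it is -1 on error, so it has to be kept in a signed type
and the loop has to stop once the buffer is full. The following is only a
minimal sketch of that idea, assuming POSIX-style read() semantics; read_all
is a hypothetical helper name, not an lwIP function, and it does not retry
interrupted calls.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not part of lwIP): reads from sd until the peer
// closes the connection, an error occurs, or the buffer is full.
// Returns the number of bytes stored, or -1 if read() reported an error.
static ssize_t read_all(int sd, uint8_t *buf, size_t cap) {
    size_t off = 0;
    while (off < cap) {
        // Keep the result signed so that -1 is not mistaken for a byte count.
        ssize_t n = read(sd, buf + off, cap - off);
        if (n < 0) return -1;   // error, e.g. connection reset by peer
        if (n == 0) break;      // peer closed the connection
        off += (size_t)n;
    }
    return (ssize_t)off;
}
```

In processConnection this would replace the while loop, e.g.
read_all(sd, buffer, CMD_MAX_SIZE), with a negative result treated as a
failed connection.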

Then repeatedly send the contents of a small file (~30 characters) to the
server with

    for i in {1..n}; do cat some_file > /dev/tcp/DEVICE_IP/12121; done

from two shells at the same time. For a while I see the occasional

cat: write error: Connection reset by peer

However, after some time this message is printed after every command from
either of the two shells. At this point TCP is broken. I admit that this
example is rather aggressive, but it lets me get the system into a
problematic state similar to the one we see in production.


I found that after TCP breaks down, UDP communication still works, so I am
able to check the state of the system. For example, lwip_stats.tcp still
counts incoming TCP packets correctly. However, no new TCP socket can be
created, I no longer see TCPIP_MSG_API messages in tcpip_thread, etc.
Adding printouts or usleep(1000) calls in some places makes the problem go
away (which suggests a race condition), but also slows the system down.

Any advice on how to move forward in debugging this would be very much
appreciated. opt.h and tcp_impl.h are attached. I tried blindly changing a
few parameters (TCP_TMR_INTERVAL, TCP_SLOW_INTERVAL, MEM_ALIGNMENT,
MEMP_OVERFLOW_CHECK, MEMP_NUM_TCP_PCB, MEMP_NUM_TCP_PCB_LISTEN) with no
success.

Best,
Oldrich

Attachment: tcp_impl.h
Description: Binary data

Attachment: opt.h
Description: Binary data

_______________________________________________
lwip-users mailing list
lwip-users@nongnu.org
https://lists.nongnu.org/mailman/listinfo/lwip-users