Re: [lwip-users] lwip lock

Martin Persich Mon, 18 Apr 2011 12:06:07 -0700

Hi Davide,
I see one very important piece of information in your message today: "AVR32"!
I think the problem is not in lwIP but in Atmel's port files and Atmel's MACB 
driver. (Many thanks to Kieran for the stable version of lwIP ...)
I work with the AVR32 too, and there were (are?) a great many bugs in Atmel's 
MACB driver and Atmel's port files for lwIP!
I don't have time to study your problem right now, but it looks like the 
problems I had one or two years ago.
You can look at my messages here:
http://lists.nongnu.org/archive/html/lwip-users/2010-04/msg00038.html
http://lists.nongnu.org/archive/html/lwip-users/2010-06/msg00053.html
...
I had a problem with reconnecting the Ethernet cable too ...  :-(
I can send my port files for lwIP 1.4.0 (I advise upgrading to 1.4.0) and my 
working version of the MACB driver to your private address.

Martin Persich

  ----- Original Message -----
  From: Tazzari Davide
  To: Mailing list for lwIP users
  Sent: Monday, April 18, 2011 5:18 PM
  Subject: Re: [lwip-users] lwip lock

  I agree with you, Kieran, but the problem is that I don't know where to look.

  I used the lwIP 1.3.2 port for AVR32 and changed almost nothing.

  In one of my long (and boring) previous posts I added a description of all 
  the tasks that use lwIP through the netconn API. I can post it again if you 
  wish.

  About the rest... I looked for timers and saw that the lwIP core has a lot 
  of them, which I suppose are correct. I say "I suppose" because I don't 
  really know how to investigate.

  Can you please suggest where to look?

  Test 1:
  I connected the device to my computer with a crossover Ethernet cable so 
  that there is no wireless link, switch, ... in between.

  The situation is pretty much the same, except that the lock is harder to 
  trigger. It takes a lot of F5 reloads before everything locks, whereas in 
  the normal setup I need only 5-10 fast reloads.

  This could suggest that the heavy traffic handled by lwIP itself interferes 
  with normal operation. I don't know if it is really a timer; it is probably 
  something related to the MAC itself but, as you said, at interrupt level. 
  But I don't know where.

  What I have seen in this test is that the key really is TCP_SEG: as long as 
  at least one block is free, communication is possible even if the lfree RAM 
  pointer is not at the top of the area; otherwise there is the lock.

  About SYS_TIMEOUT: every time I request a page (or at least open a 
  connection) a timeout is created. I have configured 6 SYS_TIMEOUT entries. 
  If I reload the page 5 times and wait, no error occurs. With 6 or more, the 
  error counter increases. This seems to be unrelated to TCP_SEG. Anyway, 
  even after a lot of errors, lwIP continues to function. So let's forget it 
  for the moment.
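
For reference, the pool size and the counters mentioned above are normally configured in the port's lwipopts.h. The fragment below is only a sketch using the standard lwIP option names; the value 6 is the one quoted in the message, and whether the Atmel port already sets any of these options is not known here.

```c
/* lwipopts.h fragment (sketch, not taken from the Atmel port) */
#define MEMP_NUM_SYS_TIMEOUT  6   /* simultaneously active timeouts; value quoted in the post */

#define LWIP_STATS            1   /* collect statistics at all */
#define MEM_STATS             1   /* heap usage and error counters */
#define MEMP_STATS            1   /* per-pool counters, e.g. for TCP_SEG and SYS_TIMEOUT */
#define LWIP_STATS_DISPLAY    1   /* compile in the stats_display() helpers */
```

With the statistics enabled, the err field of a pool's counters is the one that increases when an allocation from that pool fails, which matches the error counter described above.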

  Test 2:

  I have put a relay toggle in the web server task.

  WebServer task
  ...
      for (;;)
      {
          iRestartBinding = 0;
          pxHTTPListener = netconn_new( NETCONN_TCP );
          netconn_bind(pxHTTPListener, NULL, webHTTP_PORT );
          netconn_listen( pxHTTPListener );
          int iTimeout = 1000;

          // In the real code the loop below is bounded by:
          //   for( ; (iRestartBinding < 10) && (gucRestartWebServer == FALSE); iRestartBinding++)
          for( ; ; ) // <<-- for this test only
          {
              REL_TGL; // <<-- for this test purpose
              xLastFocusTime = xTaskGetTickCount();
              vTaskDelayUntil( &xLastFocusTime, xDelayLength );
              if (iGlobalWtdBomb == FALSE) // TRUE: I am waiting for a WDT suicide
              {
                  // Wait for a first connection.
                  #if LWIP_SO_RCVTIMEO
                  pxHTTPListener->recv_timeout = iTimeout;
                  #endif

                  pxNewConnection = netconn_accept(pxHTTPListener);
                  if (xTaskCreate(WebServerAnswerTask,
                          ( signed portCHAR * ) "WebServerAnswer",
                          WEB_SERVER_STACK_SIZE,
                          pxNewConnection,
                          ethWEBSERVER_PRIORITY,
                          ( xTaskHandle * ) NULL ) != pdPASS)
                  {
                     // Task not correctly created!!!
                     netconn_write( pxNewConnection,
                                    (char *) webHTTP_HTM_INTERNAL_ERROR,
                                    (u16_t) strlen( webHTTP_HTM_INTERNAL_ERROR ),
                                    NETCONN_COPY ); // error HTTP 500

                     netconn_close( pxNewConnection );
                     netconn_delete( pxNewConnection );
                  }
                  iRestartBinding = 0;
                  iTimeout = 5000;
              }
          }   // end acquisition loop
          gucRestartWebServer = FALSE;
          netconn_close(pxHTTPListener);
          while(netconn_delete(pxHTTPListener) != 0)
          {
              vTaskDelay(20);
          }
          pxHTTPListener = NULL;
      }
  ...
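
One detail that may be worth checking in the loop above: with LWIP_SO_RCVTIMEO enabled, netconn_accept() in lwIP 1.3.2 returns NULL when the receive timeout expires, and the loop as posted passes that result straight to xTaskCreate. Below is a minimal sketch of the decision, written as a plain function so it can be tried on a host. The name classify_accept and the enum are hypothetical and are not part of lwIP, FreeRTOS or the Atmel port.

```c
#include <stddef.h>

/* Hypothetical outcome codes for one pass of the accept loop. */
enum accept_outcome {
    ACCEPT_TIMED_OUT,   /* no connection arrived: nothing to hand off or free */
    ACCEPT_HANDED_OFF,  /* worker task created and now owns the connection */
    ACCEPT_REJECTED     /* task creation failed: caller must close and delete */
};

/*
 * conn:         value returned by netconn_accept() (NULL on timeout in 1.3.2)
 * task_created: nonzero if xTaskCreate() returned pdPASS; ignored when conn is NULL
 */
enum accept_outcome classify_accept(const void *conn, int task_created)
{
    if (conn == NULL) {
        return ACCEPT_TIMED_OUT;
    }
    return task_created ? ACCEPT_HANDED_OFF : ACCEPT_REJECTED;
}
```

On the target, the NULL test would come before the xTaskCreate call, so that no worker task is started without a connection. Under lwIP 1.4.0 the call becomes netconn_accept(listener, &new_conn) and returns an err_t, so the test turns into a check of the return value.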

  Result...
  When I reload the page slowly, everything is OK almost forever.
  When I reload the page faster, I see that both Firefox and Explorer set up 
  the TCP connection, send the GET request and immediately afterwards send 
  [RST, ACK] to close the connection, except for the last one, which waits 
  for the device's answer. I suppose that, because the browser has not 
  received any answer and the user requests a reload, it wants only the last 
  request to be processed.

  After every netconn_accept (timed out or not) I can hear the relay toggle. 
  If I press F5 5 times I hear 5 toggles. That's what I expect.

  Sometimes one toggle is missing (5 presses of F5, 4 toggles!). Exactly in 
  this case, I lose a TCP_SEG block and a portion of the mem area.

  One lost toggle also means that netconn_accept does not see the connection 
  and, from the web server task's point of view, I cannot see the problem.

  Again, this happens if there are lots of requests (connection, GET, 
  [RST, ACK] from the browser, close connection) before a (connection, GET, 
  answer, [RST, ACK], close connection).

  Sometimes I have seen this exchange in the middle of a reload:
  (Firefox) Connection [SYN]
  (device) Connection [SYN, ACK]
  (Firefox) Connection [ACK]
  (Firefox) GET request
  (Firefox) [TCP Retransmission] of the GET request
  (device) [ACK] of the HTTP request
  (Firefox) [RST, ACK]   without any answer from the device

  It seems that this is one case where a TCP_SEG is lost. It is not easy to 
  say, because I don't know exactly when the loss happens or how to relate it 
  to the Wireshark capture.
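
One way to narrow down when the loss happens is to sample the pool's usage count regularly and note the tick at which it last changed, so that the change can be lined up with the capture timestamps. The helper below is a minimal host-testable sketch with hypothetical names. On the target, the sample would presumably be read from lwip_stats.memp[MEMP_TCP_SEG].used (this assumes LWIP_STATS and MEMP_STATS are enabled) and the tick from xTaskGetTickCount().

```c
#include <stddef.h>

/* Hypothetical helper: tracks the 'used' count of one memory pool over time. */
struct pool_watch {
    unsigned      baseline;          /* used count while the stack is idle */
    unsigned      last_used;         /* most recent sample */
    unsigned long last_change_tick;  /* tick at which the count last changed */
};

/* Start watching, taking the current idle count as the baseline. */
void pool_watch_init(struct pool_watch *w, unsigned idle_used, unsigned long now)
{
    w->baseline = idle_used;
    w->last_used = idle_used;
    w->last_change_tick = now;
}

/* Record a sample; returns 1 if the count changed since the previous sample. */
int pool_watch_sample(struct pool_watch *w, unsigned used, unsigned long now)
{
    if (used == w->last_used) {
        return 0;
    }
    w->last_used = used;
    w->last_change_tick = now;
    return 1;
}

/* Blocks still held above the idle baseline (0 if at or below it). */
unsigned pool_watch_leaked(const struct pool_watch *w)
{
    return (w->last_used > w->baseline) ? (w->last_used - w->baseline) : 0u;
}
```

If pool_watch_leaked() is still nonzero after all connections have closed, the value of last_change_tick gives a time window to look for in the capture.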

  It also seems that the loss often (but not always) happens when a [TCP 
  Retransmission] is present.
  Anyway, it seems that the cause lies somewhere in the internal handling of 
  the [RST, ACK], the retransmission or something like that, and is probably 
  not related to the code I have written.

  How can I handle this? Where do I have to look? I have no idea at the 
  moment.
  My starting assumption was that the lwIP port is correct, but at this point 
  I am not so sure. I still hope that the error is in the code I wrote but, 
  as I have said, I have no idea where to look.

  I hope my new analysis can help.

  Best regards
  Davide

------------------------------------------------------------------------------

  _______________________________________________
  lwip-users mailing list
  [email protected]
  http://lists.nongnu.org/mailman/listinfo/lwip-users
