Re: [lwip-users] lwip and/or general tcp problems

Paul Butler Mon, 26 Mar 2007 09:05:17 -0800

Thomas, 

Thanks for your response.  I have added additional information below your 
responses.  I sent a response with a first round of logfiles and my header file 
included, but it appears they were stripped out or the email was blocked.  Is there 
a way to send attachments to the list, or may I send them to your personal 
address instead of the list?

Paul
  ----- Original Message ----- 
  From: Taranowski, Thomas (SWCOE) 
  To: Mailing list for lwIP users 
  Sent: March 23, 2007 7:39 PM
  Subject: RE: [lwip-users] lwip and/or general tcp problems

  I am working on a data acquisition system using an Analog Devices' Blackfin 
BF537, which has a 100Mb/s MAC and utilizes a port of lwip.  The lwip port 
appears to be derived from STABLE-0_6_3.  My application requires high 
throughput on the ethernet interface (~20Mb/s), so I have been creating very 
simple applications to run on the embedded processor with lwip to test the 
throughput and reliability of the setup.  The sample application on the BF537 
simply creates, binds, and listens on a socket, and then in an infinite loop 
accepts a single connection and then while that connection is open sends large 
packets (1460 bytes) on the connection.  I have a simple LabVIEW application 
that receives the data, and I have also been using the Wireshark analyzer to 
look at the transfers.  In this configuration, I am experiencing the following 
that I would really appreciate some insight on:

  1) When lwip is configured to use DHCP, it is very difficult to maintain a 
high throughput.  In fact, the connection very frequently times out after 
transferring just a few packets.  I don't see much other traffic related to 
having the DHCP server on the LAN, and I use a switch to isolate the 
transmitting device and the receiving PC.

  [TT] This could be a function of the configuration of your DHCP server, and 
the length of lease that is granted during the initial dhcp negotiation.

  I will confirm this.  I have attached a log file showing a case that timed 
out after a few transfers (070323 DHCP Startup Failed, some data.pcap), and one 
that failed with no data transferred (070323 DHCP Startup Failed.pcap).

  2) When not using DHCP, in general the connection is more reliable.  However, 
there appears to be a "cold start" issue, where when the devices on the LAN 
(transmitter, switch, and receiving PC) are powered on for the first time the 
connection has trouble establishing itself.  A few packets will transfer 
successfully, followed by a dropped packet with no successful retransmissions 
over 30 seconds.

  [TT] This is pretty hard to diagnose.  To my mind, it sounds like it could be 
a problem with the way the application is designed to behave at system startup.  
To diagnose this more closely, sniffer logs would be needed.

  I have attached a log file showing this failure (070323 Startup 
Failure.pcap).  Do you have a recommendation for the way the system should 
start up?  Is a delay between accepting the connection and transmitting data 
likely to improve this issue?  There is already a considerable delay between 
when I power the switch and when I make the connection.
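
  To make the startup question concrete: the sender described in the original 
post amounts to an accept followed by a tight send loop.  Below is a minimal 
sketch of the inner send step, assuming a POSIX-style sockets API (the socket 
layer in a given lwIP port may behave differently).  send_all is a hypothetical 
helper name, not something from lwIP or from my application.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: call send() repeatedly until all len bytes have been
 * handed to the stack.  Returns 0 on success, -1 on error. */
static int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted before sending anything: retry */
            return -1;         /* real error, e.g. connection reset */
        }
        if (n == 0)
            return -1;         /* no progress; bail out rather than spin */
        p   += n;              /* send() may accept fewer bytes than asked */
        len -= (size_t)n;
    }
    return 0;
}
```

  A start-up delay, if one is tried, would go between accept() returning and 
the first send_all() call on the new connection.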

  3) Again without DHCP, I can observe stalls in the transmitted data stream.  
Normally, packets are transmitted more than once a millisecond (up to eight or 
ten per millisecond), but occasionally there are periods of ~150ms where no data 
is transmitted.  The receive window has not closed, and there is no indication 
of dropped packets or retransmission in the log file.

  [TT] It could be that the transmit window (assuming TCP) is full.  It could 
also be something to do with the multitude of #defines that tune the 
performance/space in opt.h.  Some sniffer logs may shed some light on the 
issue.  What window size does the remote end advertise?

  The remote end advertises a 64k window size.  I wasn't clear on a lot of the 
#defines - I've attached my option header file, could you comment? Is there 
somewhere I have limited my transmit window to just a few segments?
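
  To frame the transmit-window question, here is a rough sketch of how the 
sender-side options bound the data in flight.  The macro names TCP_MSS, 
TCP_SND_BUF and TCP_SND_QUEUELEN are the usual lwIP option names, but please 
check that the port's opt.h uses them.  The values are made up for illustration 
and are NOT the ones in my header file; the helper functions are hypothetical.

```c
#include <stdint.h>

/* Illustrative values only; not taken from any real configuration. */
#define TCP_MSS          1460u           /* max segment payload, bytes   */
#define TCP_SND_BUF      (2u * TCP_MSS)  /* sender buffer space, bytes   */
#define TCP_SND_QUEUELEN 4u              /* sender queue limit, in pbufs */

static uint32_t min_u32(uint32_t a, uint32_t b)
{
    return a < b ? a : b;
}

/* Bytes that may be sent but not yet acknowledged: the smaller of the local
 * send buffer and the window advertised by the peer.  The congestion window
 * is ignored here for simplicity. */
static uint32_t max_unacked_bytes(uint32_t snd_buf, uint32_t peer_wnd)
{
    return min_u32(snd_buf, peer_wnd);
}

/* Full-size segments that fit in that budget, assuming (a simplification)
 * that each queued segment occupies one pbuf. */
static uint32_t max_full_segments(uint32_t unacked_bytes, uint32_t mss,
                                  uint32_t queuelen)
{
    return min_u32(unacked_bytes / mss, queuelen);
}

/* Upper bound on throughput in bits per second: at most one budget's worth
 * of data can be delivered per round trip. */
static uint64_t throughput_ceiling_bps(uint32_t unacked_bytes, uint32_t rtt_us)
{
    return (uint64_t)unacked_bytes * 8u * 1000000u / rtt_us;
}
```

  With these made-up numbers, the 64k window advertised by the PC is not the 
limit: max_unacked_bytes(TCP_SND_BUF, 65535) is 2920 bytes, or two full 
segments, and throughput_ceiling_bps(2920, 1000) gives about 23.4 Mb/s at a 
1 ms RTT, which leaves little margin over a 20 Mb/s target.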

  4) Still without DHCP, I observe ~2s stalls.  These appear to be caused by >1 
dropped packet, which results in the first dropped packet being resent by fast 
retransmission, and all other packets being resent by the retransmission timers.

  [TT] This sounds like half-duplex Ethernet operation to me.  Make sure you 
don't have any half-duplex hubs floating around on your network.  These will 
cause random wait times on the order you mentioned.

  I confirmed that the 3 devices comprising my LAN (embedded device, hp switch, 
and ibm laptop) are all at least 10/100 auto negotiate half/full duplex, and 
the ibm laptop is a 1Gb device.  Other than forcing the devices to 100Mb Full 
duplex, is there a way to confirm that nobody is operating at half duplex?  Can 
you clarify why a half-duplex hub would cause random waits?

  Can anyone confirm that any or all of these behaviors is unexpected in a LAN 
environment (RTT normally <1ms)?  Although I'm new to this, it seems surprising 
that my little LAN with <15' CAT5 cable segments is so likely to have corrupted 
or lost packets.  

  [TT] An old hub or faulty connector can cause all sorts of issues.  I'd 
revert back to as simple a network as possible, and proceed from there, adding 
segments until some bad behavior is exhibited.

  I can try this with just a crossover cable, but there's not much room to go 
simpler.  For the DHCP problems, can you recommend a simple way to add a DHCP 
server without connecting into my full office network?

  Can anyone give me some guidance on what to expect regarding lost packets?  

  [TT] An analysis I did some time back for an avionics platform concluded that 
I could expect that the phy, at a minimum, would cause one lost/corrupt packet 
per 24 hour period on a 3 in. long peer to peer link.  It seems to me that a 
dozen a day on a small network would not be unusual.

  A dozen a day doesn't sound unreasonable.  I'm currently able to generate 
what I assume are lost/corrupt packets within a 20 or 30 second log file.

  Are the recovery processes I've observed correct behavior?  Should only a 
single packet be resent using fast retransmission?  Is there anything inherent 
in the stack that could cause brief pauses in the data stream?  Why does using 
DHCP apparently make it so difficult to establish and maintain a 
high-throughput connection, particularly since there doesn't seem to be any 
other traffic on the LAN?
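
  On the fast-retransmission question, my understanding of the general rule is 
that a sender retransmits the oldest unacknowledged segment once it has seen 
three duplicate ACKs for the same sequence number.  The sketch below shows only 
that counting rule.  It is not lwIP's code, the names are hypothetical, and it 
ignores sequence-number wraparound and congestion-window handling.

```c
#include <stdbool.h>
#include <stdint.h>

#define DUPACK_THRESHOLD 3u   /* the conventional three-duplicate-ACK rule */

struct dupack_state {
    uint32_t last_ack;   /* highest cumulative ACK number seen so far */
    unsigned dupacks;    /* duplicates of last_ack seen since then    */
};

/* Feed one received ACK number into the counter.  Returns true exactly when
 * the duplicate count reaches the threshold, i.e. when the segment starting
 * at ackno should be fast-retransmitted.  Assumes a differing ackno always
 * acknowledges new data. */
static bool on_ack(struct dupack_state *s, uint32_t ackno)
{
    if (ackno == s->last_ack) {
        s->dupacks++;
        return s->dupacks == DUPACK_THRESHOLD;
    }
    s->last_ack = ackno;   /* new data acknowledged: restart the count */
    s->dupacks = 0;
    return false;
}
```

  If that is roughly what the stack does, and there is no selective 
acknowledgement, then with two segments lost the first retransmission only 
advances the ACK to the second hole and the count starts again from zero.  
Unless at least three more segments arrive behind that hole, nothing triggers 
a second fast retransmit, and the second segment waits for the retransmission 
timer.  That would match what I see in the log.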

  Apologies for the multiple questions, but I needed to start somewhere, and 
I've already reached the limit of what the Analog Devices' support engineers 
can help with.  I can provide the log files from Wireshark if that would be 
helpful, but some are very large (tens of megabytes).  I'd also be interested 
if anyone can suggest other resources to further my understanding of networking 
and TCP/IP issues.

  [TT] You'd start by locating the portions of the capture logs that show 
aberrant behavior. 

  I'll follow up with those logfiles shortly.  Is there an easier way to cut 
them down to size than using the editcap command-line utility? 

  Thanks,

  Paul Butler

------------------------------------------------------------------------------

  _______________________________________________
  lwip-users mailing list
  [email protected]
  http://lists.nongnu.org/mailman/listinfo/lwip-users
Development Engineer
Vtech Engineering Corporation
978-974-9944
