________________________________

From: Paul Butler [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, April 03, 2007 12:04 PM
To: Taranowski, Thomas (SWCOE)
Subject: Re: [lwip-users] lwip and/or general tcp problems

 

Thomas,

 

Thanks for allowing me to use your personal email address.  I've
attached a log file I've also sent to Kieran, but he has yet to respond
to it.  My initial problems appear to have been tracked to two sources.
The first is a problem with the Nagle algorithm implementation from
1.1.0, and the second is a problem where my transmitting app (lwip on an
Analog Devices' DSP) changes the MAC address without changing the IP
address.  For the first problem, ADI has already identified the cause and
provided a fix.  For the second problem, they have not yet identified the
cause.
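For reference, the rule a Nagle implementation is meant to enforce (RFC 896, summarized in RFC 1122 section 4.2.3.4) is short enough to write down. The sketch below is a simplified, hypothetical helper, not lwIP 1.1.0's code and not ADI's fix: new data may go out at once if it fills a full-sized segment, if nothing sent earlier is still unacknowledged, or if Nagle is disabled (TCP_NODELAY); otherwise it is held until an ACK arrives or a full segment can be built.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not lwIP's code): may `queued_bytes` of new data be
 * sent now under Nagle's rule?  The send-window and congestion-window checks
 * that a real stack also applies are deliberately left out. */
static bool nagle_may_send(uint32_t queued_bytes, uint32_t mss,
                           uint32_t unacked_bytes, bool nodelay)
{
    if (nodelay)                 /* Nagle disabled, e.g. via TCP_NODELAY */
        return true;
    if (queued_bytes >= mss)     /* enough data for a full-sized segment */
        return true;
    return unacked_bytes == 0;   /* small segment only if nothing is in flight */
}
```

Assuming TCP_MSS is 1460, every 1460-byte write from the test application is a full-sized segment, so under this rule a correct Nagle implementation should never hold that data back.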

 

If you compare segments 49 and 50 in the Wireshark (Ethereal) log I've
attached, you can see that although the destination IP is still
192.168.16.36 (correct), the destination MAC address changes from
00:06:1b:c5:d5:06 (correct) to 00:10:24:28:d5:06 (incorrect).  Presumably,
the receiving PC's interface sees a destination MAC address that is not its
own and discards the frame.  In this case,
the transmitter retransmits at segment 55 using the correct MAC address
and the system recovers.  Later in the same file, segments 133 and 134
are sent to the same incorrect MAC address, but the retransmissions at
segments 151, 153, 155, and 157 are sent to a new incorrect MAC address
(00:11:43:ea:d5:06).  
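To put a number on how far the incorrect addresses are from the correct one, here is a small C sketch (mac_bit_distance and mac_diff_bytes are hypothetical helpers, not lwIP functions) using the three addresses from the log. By my count the first incorrect address differs from the correct one in 15 bit positions and the second in 12, and in both cases only bytes 1 through 3 differ; the leading 00 and the trailing d5:06 are intact.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of bit positions in which two 6-byte
 * Ethernet MAC addresses differ (their Hamming distance). */
static int mac_bit_distance(const uint8_t a[6], const uint8_t b[6])
{
    int bits = 0;
    for (size_t i = 0; i < 6; i++) {
        uint8_t diff = (uint8_t)(a[i] ^ b[i]);
        while (diff) {                     /* clear the lowest set bit */
            diff &= (uint8_t)(diff - 1);
            bits++;
        }
    }
    return bits;
}

/* Hypothetical helper: bitmask of differing byte positions
 * (bit i set means byte i differs). */
static unsigned mac_diff_bytes(const uint8_t a[6], const uint8_t b[6])
{
    unsigned mask = 0;
    for (size_t i = 0; i < 6; i++)
        if (a[i] != b[i])
            mask |= 1u << i;
    return mask;
}

/* Addresses copied from the Wireshark log discussed above. */
static const uint8_t mac_good[6] = {0x00, 0x06, 0x1b, 0xc5, 0xd5, 0x06};
static const uint8_t mac_bad1[6] = {0x00, 0x10, 0x24, 0x28, 0xd5, 0x06};
static const uint8_t mac_bad2[6] = {0x00, 0x11, 0x43, 0xea, 0xd5, 0x06};
```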

 

If you could comment on this problem of the MAC address getting
corrupted momentarily, it would be a big help.  Given that several
identical segments are sent and it is a multi-bit error, I don't think the
problem is introduced on the wire.  Please let me know if the raw pcap
file is stripped out, and I will resend it in a password-protected zip.

 

Thanks again,

Paul

        ----- Original Message ----- 

        From: Taranowski, Thomas (SWCOE)
<mailto:[EMAIL PROTECTED]>  

        To: Mailing list for lwIP users <mailto:[email protected]>  

        Sent: April 3, 2007 2:30 PM

        Subject: RE: [lwip-users] lwip and/or general tcp problems

         

        You can send them to my personal address, but any zip files need
to have a password, otherwise they get stripped out by the firewall.  

         

        
________________________________


        From:
[EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
On Behalf Of Paul Butler
        Sent: Monday, March 26, 2007 11:01 AM
        To: Mailing list for lwIP users
        Subject: Re: [lwip-users] lwip and/or general tcp problems

         

        Thomas, 

         

        Thanks for your response.  I have added additional information
below your responses.  I sent a response with a first round of logfiles
and my header file included, but it appears they got stripped out or the
email blocked.  Is there a way to send attachments to the list, or may I
send them to your personal address instead of the list?

         

        Paul

                ----- Original Message ----- 

                From: Taranowski, Thomas (SWCOE)
<mailto:[EMAIL PROTECTED]>  

                To: Mailing list for lwIP users
<mailto:[email protected]>  

                Sent: March 23, 2007 7:39 PM

                Subject: RE: [lwip-users] lwip and/or general tcp
problems

                 

                I am working on a data acquisition system using an
Analog Devices' Blackfin BF537, which has a 100Mb/s MAC and utilizes a
port of lwip.  The lwip port appears to be derived from STABLE-0_6_3.
My application requires high throughput on the Ethernet interface
(~20Mb/s), so I have been creating very simple applications to run on
the embedded processor with lwip to test the throughput and reliability
of the setup.  The sample application on the BF537 simply creates,
binds, and listens on a socket, and then, in an infinite loop, accepts a
single connection and sends large packets (1460 bytes) on it for as long
as that connection is open.  I have a simple LabVIEW
application that receives the data, and I have also been using the
Wireshark analyzer to look at the transfers.  In this configuration, I
am experiencing the following that I would really appreciate some
insight on:

                 

                1) When lwip is configured to use DHCP, it is very
difficult to maintain a high throughput.  In fact, the connection very
frequently times out after transferring just a few packets.  I don't see
much other traffic related to having the DHCP server on the LAN, and I
use a switch to isolate the transmitting device and the receiving PC.

                 

                [TT] This could be a function of the configuration of
your DHCP server, and the length of lease that is granted during the
initial dhcp negotiation.

                I will confirm this.  I have attached a log file showing
a case that timed out after a few transfers (070323 DHCP Startup Failed,
some data.pcap), and one that failed with no data transferred (070323
DHCP Startup Failed.pcap).

                 

                2) When not using DHCP, in general the connection is
more reliable.  However, there appears to be a "cold start" issue, where
when the devices on the LAN (transmitter, switch, and receiving PC) are
powered on for the first time the connection has trouble establishing
itself.  A few packets will transfer successfully, followed by a dropped
packet with no successful retransmissions over 30 seconds.

                [TT] This is pretty hard to diagnose.  To my mind, it
sounds like it could be a problem with the way the application is
designed to behave at system startup.  To diagnose this more closely,
sniffer logs would be needed.

                I have attached a log file showing this failure (070323
Startup Failure.pcap).  Do you have a recommendation for the way the
system should startup?

                [TT] Not really.  I might try to isolate the problem by
trying different startup orderings; then maybe a fix would present
itself.

                Is a delay between accepting the connection and
transmitting data likely to improve this issue?  There is already a
considerable delay between when I power the switch and when I make the
connection.

                [TT] It could.  If you try to send before the link has
been established, there could be some problems with dropped packets.
DHCP negotiation can involve fairly long waits, which would delay
establishment of any connections.  If your port incorrectly (as I just
found mine does) marks the netif as 'up' at interface open time when
DHCP is enabled, then there could be some issues.  The DHCP framework
marks the interface as 'up', via netif_set_up(), once the DHCP bind
occurs.
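                The startup advice above can be sketched as a small gate that the sending task checks before accepting a connection or sending. This is a hypothetical helper, not an lwIP API: how link state is read is port-specific, settle_ms is an invented tunable standing in for the extra delay asked about above, and for DHCP "bound" means the point where lwIP's DHCP code calls netif_set_up(). It lets data flow only once the link has been up for settle_ms and, when DHCP is used, the lease is bound, and it restarts the wait whenever the link drops.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical startup gate for the sending task (not an lwIP API). */
typedef struct {
    bool     use_dhcp;    /* true if the address comes from DHCP */
    uint32_t settle_ms;   /* hypothetical extra wait after link-up */
    bool     link_seen;   /* link was up at the previous update */
    uint32_t link_up_at;  /* time (ms) when the link last came up */
} net_gate_t;

/* Call periodically with the current time and the latest observations:
 *   link_up    - the PHY/driver reports link (port-specific)
 *   addr_bound - DHCP lease bound; ignored when use_dhcp is false
 * Returns true once all gate conditions are met. */
static bool net_gate_ready(net_gate_t *g, uint32_t now_ms,
                           bool link_up, bool addr_bound)
{
    if (!link_up) {
        g->link_seen = false;        /* restart the settle timer on link loss */
        return false;
    }
    if (!g->link_seen) {             /* link just came up: start timing */
        g->link_seen = true;
        g->link_up_at = now_ms;
    }
    if ((uint32_t)(now_ms - g->link_up_at) < g->settle_ms)
        return false;                /* link is up but still settling */
    if (g->use_dhcp && !addr_bound)
        return false;                /* no lease yet: do not send */
    return true;
}
```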

                3) Again without DHCP, I can observe stalls in the
transmitted data stream.  Normally, packets are transmitted more than
once a millisecond (up to 8 or 10 per millisecond), but occasionally
there are periods of ~150ms where no data is transmitted.  The receive
window has not closed, and there is no indication of dropped packets or
retransmissions in the log file.

                 

                [TT] It could be that the transmit window (assuming TCP)
is full.  It could also be something to do with the multitude of
#defines that tune the performance/space in opt.h.  Some sniffer logs
may shed some light on the issue.  What window size does the remote end
advertise?

                The remote end advertises a 64k window size.  I wasn't
clear on a lot of the #defines - I've attached my option header file,
could you comment? Is there somewhere I have limited my transmit window
to just a few segments?

                [TT] Yes, in the sniffer log I see your lwIP TCP window
is limited to 8192 bytes, which is pretty small.  This is governed via the
TCP_WND #define in your lwipopts.h.
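                A rough way to see what a small window costs is the bandwidth-delay product: with at most W bytes unacknowledged, throughput cannot exceed W / RTT. The sketch below (hypothetical helper names) applies that rule. The RTTs are assumptions for illustration: about 1 ms for this LAN, and 150 ms for a round trip as long as the ~150ms stalls described above. Only 5 full 1460-byte segments fit in 8192 bytes; at 1 ms the ceiling is about 65 Mb/s, comfortably above 20 Mb/s, but if an ACK takes 150 ms to come back the ceiling drops to roughly 0.44 Mb/s. Note also that TCP_WND sets the window lwIP advertises to its peer; depending on the lwIP version, opt.h also has send-side limits such as TCP_SND_BUF and TCP_SND_QUEUELEN that are worth checking in lwipopts.h.

```c
#include <assert.h>
#include <stdint.h>

/* Number of full-size segments that fit in a window. */
static uint32_t segments_per_window(uint32_t window_bytes, uint32_t mss)
{
    assert(mss > 0);
    return window_bytes / mss;
}

/* Throughput ceiling (bits/s) when at most window_bytes may be
 * unacknowledged and each ACK returns after rtt_s seconds: W / RTT. */
static double window_limited_bps(uint32_t window_bytes, double rtt_s)
{
    assert(rtt_s > 0.0);
    return (double)window_bytes * 8.0 / rtt_s;
}

/* Window (bytes) needed to sustain target_bps at a given RTT. */
static double window_needed_bytes(double target_bps, double rtt_s)
{
    return target_bps * rtt_s / 8.0;
}
```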

                4) Still without DHCP, I observe ~2s stalls.  These
appear to be caused by >1 dropped packet, which results in the first
dropped packet being resent by fast retransmission, and all other
packets being resent by the retransmission timers.
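                For comparison, the standard fast-retransmit rule (RFC 5681) has the sender count duplicate ACKs and, on the third, resend only the one segment starting at the oldest unacknowledged byte. If a second segment from the same window was also lost, a sender without SACK or NewReno-style partial-ACK handling generally has to wait for its retransmission timer, which would be consistent with the pattern described above. The sketch below is a simplified, hypothetical model of the duplicate-ACK counting only; it omits the congestion-window changes and the RFC's stricter definition of a duplicate ACK (no data, no window change, outstanding data present).

```c
#include <stdbool.h>
#include <stdint.h>

#define DUPACK_THRESHOLD 3

/* Simplified sender-side duplicate-ACK tracking (hypothetical, not lwIP code). */
typedef struct {
    uint32_t snd_una;   /* oldest unacknowledged sequence number */
    int      dupacks;   /* consecutive duplicate ACKs seen */
} dupack_state_t;

/* Process an incoming ACK number.  Returns true exactly once per loss
 * episode, when the sender should fast-retransmit the single segment
 * that starts at snd_una. */
static bool on_ack_received(dupack_state_t *s, uint32_t ackno)
{
    int32_t delta = (int32_t)(ackno - s->snd_una);  /* wraparound-safe compare */
    if (delta > 0) {              /* new data acknowledged */
        s->snd_una = ackno;
        s->dupacks = 0;
        return false;
    }
    if (delta == 0) {             /* duplicate ACK */
        s->dupacks++;
        return s->dupacks == DUPACK_THRESHOLD;
    }
    return false;                 /* stale ACK: ignore */
}
```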

                [TT] This sounds like half-duplex Ethernet operation to
me.  Make sure you don't have any half-duplex hubs floating around on
your network.  These will cause random wait times on the order you
mentioned.

                I confirmed that the 3 devices comprising my LAN
(embedded device, HP switch, and IBM laptop) are all at least 10/100
auto-negotiating half/full duplex, and the IBM laptop is a 1Gb device.
Other than forcing the devices to 100Mb full duplex, is there a way to
confirm that nobody is operating at half duplex?

                [TT] Not without some access to the driver statistics, or
a LAN analyzer.  If you have access to some driver statistics, and you
see any collisions, then you know there's a half-duplex device on that
segment.

                Can you clarify why a half-duplex hub would cause random
waits?

                [TT] It's due to the collision handling protocols of
CSMA/CD.  I'm having trouble viewing the 802.3 standard at the
moment, but the basic operation is as follows.  If a node starts to
send an Ethernet frame but detects a collision, it backs off for a
random interval before it attempts a retransmit, and the range of that
interval grows with each successive collision on the same frame.
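                The backoff rule described above is, as I understand 802.3, truncated binary exponential backoff: after the n-th collision on a frame, a station waits a random whole number of slot times between 0 and 2^min(n,10) - 1, where a slot is 512 bit times at 10 and 100 Mb/s, and it gives up on the frame after 16 attempts. The sketch below (helper names are hypothetical) computes those bounds. By this arithmetic the longest single wait is about 52 ms at 10 Mb/s (about 5 ms at 100 Mb/s), and the longest total backoff before the frame is discarded is about 366 ms at 10 Mb/s; a discarded frame then has to be recovered by TCP retransmission.

```c
#include <stdint.h>
#include <stdlib.h>

/* 802.3 half-duplex parameters for 10/100 Mb/s, as I understand them. */
#define SLOT_TIME_BITS 512u   /* slot time, in bit times */
#define BACKOFF_LIMIT  10u    /* exponent stops growing after 10 collisions */
#define ATTEMPT_LIMIT  16u    /* frame is discarded after 16 attempts */

/* Largest number of slot times a station may wait after its n-th
 * collision on the same frame (n >= 1): 2^min(n,10) - 1. */
static uint32_t backoff_max_slots(uint32_t n)
{
    uint32_t k = n < BACKOFF_LIMIT ? n : BACKOFF_LIMIT;
    return (1u << k) - 1u;
}

/* Worst-case single wait, in nanoseconds, after the n-th collision. */
static uint64_t backoff_max_ns(uint32_t n, uint64_t bitrate_bps)
{
    return (uint64_t)backoff_max_slots(n) * SLOT_TIME_BITS * 1000000000ull
           / bitrate_bps;
}

/* Worst-case total backoff before the frame is dropped: collisions
 * 1..15 are each followed by a backoff; the 16th ends the attempts. */
static uint64_t backoff_total_max_ns(uint64_t bitrate_bps)
{
    uint64_t total = 0;
    for (uint32_t n = 1; n < ATTEMPT_LIMIT; n++)
        total += backoff_max_ns(n, bitrate_bps);
    return total;
}

/* Random wait in slot times, uniform in [0, 2^min(n,10) - 1].
 * rand() is for illustration only; its modulo bias is ignored. */
static uint32_t backoff_pick_slots(uint32_t n)
{
    return (uint32_t)rand() % (backoff_max_slots(n) + 1u);
}
```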

                 

                Can anyone confirm whether any or all of these behaviors
are unexpected in a LAN environment (RTT normally <1ms)?  Although I'm
new to this, it seems surprising that my little LAN with <15' CAT5 cable
segments is so likely to have corrupted or lost packets.

                [TT] An old hub or faulty connector can cause all sorts
of issues.  I'd revert back to as simple a network as possible, and
proceed from there, adding segments until some bad behavior is
exhibited.

                I can try this with just a crossover cable, but there's
not much room to go simpler. For the DHCP problems, can you recommend a
simple way to add a DHCP server without connecting into my full office
network?

                [TT] I loaded up one of my targets with an Ubuntu
install, then installed the dhcpd3 server.  This gives me additional
visibility into what's going on with the DHCP negotiation, and I can try
out various options, etc.

                 

                Can anyone give me some guidance on what to expect
regarding lost packets?  

                [TT] An analysis I did some time back for an avionics
platform concluded that I could expect that the phy, at a minimum, would
cause one lost/corrupt packet per 24 hour period on a 3 in. long peer to
peer link.  It seems to me that a dozen a day on a small network would
not be unusual.

                A dozen a day doesn't sound unreasonable. I'm currently
able to generate what I assume are lost/corrupt packets within a 20 or
30 second log file.
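                To compare the two figures, here is a back-of-the-envelope sketch (hypothetical helper names). Its inputs are assumptions for illustration only: a steady 20 Mb/s of 1518-byte frames (a 1460-byte TCP payload plus TCP, IP, and Ethernet headers and the FCS), "a dozen a day" taken as 12, and one loss per 30-second capture. On those assumptions a day carries about 142 million frames (a loss rate near 8e-8), while one loss among the roughly 49,000 frames in 30 s is a rate near 2e-5, about 240 times higher.

```c
/* Frames sent in an interval at a given bit rate, assuming every frame
 * has the same size and ignoring preamble and inter-frame gap. */
static double frames_in_interval(double bitrate_bps, double frame_bytes,
                                 double seconds)
{
    return bitrate_bps * seconds / (frame_bytes * 8.0);
}

/* Loss rate: lost frames divided by frames sent. */
static double loss_rate(double lost, double sent)
{
    return lost / sent;
}
```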

                Are the recovery processes I've observed correct
behavior?  Should only a single packet be resent using fast
retransmission?  Is there anything inherent in the stack that could
cause brief pauses in the data stream?  Why does using DHCP apparently
make it so difficult to establish and maintain a high-throughput
connection, particularly since there doesn't seem to be any other
traffic on the LAN?

                 

                Apologies for the multiple questions, but I needed to
start somewhere, and I've already reached the limit of what the Analog
Devices' support engineers can help with.  I can provide the log files
from Wireshark if that would be helpful, but some are very large (tens
of megabytes).  I'd also be interested if anyone can suggest other
resources to further my understanding of networking and TCP/IP issues.

                [TT] You'd start by locating the portions of the capture
logs that show aberrant behavior. 

                I'll follow up with those logfiles shortly. Is there an
easier way to cut them down to size than using the editcap command-line
utility? 

                [TT] I sometimes use the GUI, highlight the sections I
want, then save the selection to a file.  I've never tried editcap,
but it sounds painful.

                Thanks,

                 

                Paul Butler

                 

                
________________________________


                _______________________________________________
                lwip-users mailing list
                [email protected]
                http://lists.nongnu.org/mailman/listinfo/lwip-users

        Development Engineer
        Vtech Engineering Corporation
        978-974-9944

        
