In article <[EMAIL PROTECTED]>,
Ramin Alidousti  <[EMAIL PROTECTED]> wrote:
>On Sat, Apr 13, 2002 at 02:05:56AM -0400, Zygo Blaxell wrote:
>> Well, yes, that is what I mean.  Routers tell TCP that bandwidth is scarce
>> by dropping packets (or using the ECN bit, but PPP-over-SSH doesn't set
>> the ECN bit either).
>
>Or with higher latency.

Higher latency does not imply that bandwidth is scarce.  TCP does not
consider latency when estimating available bandwidth, except when latency
increases suddenly, in which case TCP (mis)interprets the delay as
packet loss because its retransmission timer expires first.
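To illustrate the "sudden increase" case, here is a minimal Python sketch of the retransmission timer. It assumes the RFC 6298 estimator: gains 1/8 and 1/4, RTO = SRTT + 4*RTTVAR, and a 1 second floor. It ignores clock granularity, and the function names are hypothetical, not taken from any real stack. A gradual rise in RTT drags SRTT along with it. A jump past the current RTO fires the timer, and the merely delayed segment is treated as lost.

```python
# Sketch of the RFC 6298 RTO estimator (alpha = 1/8, beta = 1/4, K = 4,
# 1 s minimum). Clock granularity is ignored; real stacks differ in detail.

def rto_after_samples(samples, min_rto=1.0):
    """Return the RTO (seconds) after feeding RTT samples (seconds)."""
    srtt = None
    rttvar = None
    for r in samples:
        if srtt is None:
            # The first measurement initializes the estimator.
            srtt = r
            rttvar = r / 2
        else:
            # RTTVAR is updated with the old SRTT, before SRTT itself.
            rttvar = 0.75 * rttvar + 0.25 * abs(srtt - r)
            srtt = 0.875 * srtt + 0.125 * r
    if srtt is None:
        return 1.0  # initial RTO before any sample (RFC 6298)
    return max(min_rto, srtt + 4 * rttvar)


def spike_looks_like_loss(steady_rtt, spike_rtt, n_samples=50):
    """True if a sudden RTT jump outlasts the timer, so TCP retransmits
    a segment that was only delayed, not lost."""
    rto = rto_after_samples([steady_rtt] * n_samples)
    return spike_rtt > rto
```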

>The loss of a packet is not reported by the receiver but detected by the
>sender itself (timer).

Or by SACK or duplicate ACKs, which come from the receiver.  To be
strictly correct, losses are inferred by the sender from data supplied
or _not_ supplied by the receiver.
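Here is a sketch of the duplicate-ACK half of that inference, assuming the RFC 5681 fast retransmit threshold of three duplicates. It is simplified: a real sender also checks that the ACK carries no data and does not change the advertised window. The ACK numbers in the test are made-up byte offsets.

```python
# Simplified duplicate-ACK loss inference (fast retransmit threshold of 3,
# as in RFC 5681). Cumulative ACK numbers are byte offsets.

DUPACK_THRESHOLD = 3


def segments_inferred_lost(acks):
    """Return the ACK numbers at which fast retransmit would fire.

    Each entry in `acks` is a cumulative ACK number sent by the receiver.
    Three repeats of the same number (after its first appearance) suggest
    the segment starting at that byte never arrived.
    """
    lost = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUPACK_THRESHOLD:
                lost.append(ack)  # retransmit the segment at `ack`
        else:
            # New cumulative ACK: progress was made, reset the counter.
            last_ack = ack
            dup_count = 0
    return lost
```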

>Each TCP connection has its own set of variables which is not shared with
>the other instances. So, the underlaying TCP (SSH) might have different 
>window-size, timer-value and for that matter even different heuristics than
>the encapsulated TCP riding on top of it, completely independent.

Events on the underlying IP layer (packet loss, latency, etc.) that affect
the underlying TCP will cause variables that TCP assumes are independent
to become dependent in the encapsulated TCP.  Packet loss without latency
becomes latency without packet loss.  TCP heuristics optimize in one
direction to deal with high latency (send more packets to increase
throughput) and in the *opposite* direction to deal with packet loss (send
fewer packets to reduce congestion).  The Nagle algorithm does all kinds
of damage when it is applied to a stream that is itself TCP, especially
to the encapsulated TCP's slow start algorithm.  Retransmitted segments of
the encapsulated TCP reach the receiver as duplicates (the underlying TCP
delivers the originals too), so the receiver answers them with duplicate
ACKs, which abuse the congestion window in the other direction.
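The Nagle rule itself is short. Below is a sketch following the summary in RFC 1122, with a hypothetical function name. Under Nagle, a sender holds back a sub-MSS segment while any of its earlier data is unacknowledged. The encapsulated TCP's ACKs are exactly such small segments, so the underlying TCP can sit on them for a round trip. Slow start grows the congestion window per ACK received, so it grows more slowly as a result.

```python
# Sketch of Nagle's send decision (RFC 896, as summarized in RFC 1122).

def nagle_allows_send(pending_bytes, mss, unacked_bytes):
    """True if a sender using Nagle may transmit its pending data now."""
    if pending_bytes >= mss:
        return True  # a full-sized segment can always be sent
    # Small data waits until all outstanding data has been acknowledged.
    return unacked_bytes == 0
```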

If it is possible to set the parameters of the underlying TCP, then it
can be configured to work a little better, but in most cases the reason
why people attempt to do TCP over TCP is also the reason why they can't
change these parameters--the underlying TCP connection is some kind of
corporate HTTP proxy.
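Where you do control an endpoint, one such parameter is Nagle on the underlying socket. Here is a minimal Python sketch using the standard socket module's TCP_NODELAY option; the function name is hypothetical. Whether this helps a given tunnel is an assumption to test. It does nothing for the far side of a proxy you don't control.

```python
import socket


def make_tunnel_socket():
    """Create a TCP socket with the Nagle algorithm disabled, as the
    underlying connection of a TCP-over-TCP tunnel might be configured."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 sends small segments immediately instead of
    # holding them until outstanding data is acknowledged.
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s
```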

>This phenomenon can go on until the RTT of the encapsulated TCP is slightly
>larger than the RTT of the underlaying TCP. At this time they should stabilize.
>This delta is proportional to the SSH and PPP overhead.

This is not true at all.

The minimum RTT in the encapsulated TCP is slightly larger than the RTT
of the underlying TCP.  The maximum RTT of the encapsulated TCP is for
all practical purposes unlimited--it's the size of the TCP window plus
the buffers in use, divided by the minimum available bandwidth.
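The bound is simple arithmetic. As a sketch with hypothetical numbers, take a 64 KiB window plus 64 KiB of buffering, drained through a bottleneck that drops to 1000 bytes per second. That gives about 131 seconds of RTT.

```python
def max_queueing_rtt(window_bytes, buffer_bytes, min_bytes_per_sec):
    """Upper bound on the encapsulated TCP's RTT in seconds: everything
    that can be in flight or queued, divided by the slowest drain rate."""
    return (window_bytes + buffer_bytes) / min_bytes_per_sec


# Hypothetical example: 64 KiB window + 64 KiB buffers at 1000 bytes/s.
rtt = max_queueing_rtt(65536, 65536, 1000)
```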

If retransmission is required on the underlying TCP, it will add delay
to the encapsulated TCP.  This delay will accumulate and persist until
the next time the underlying TCP connection becomes idle, or until TCP
connections (at any level) fail due to timeout.  

Further, if the TCP implementations are similar, the delay added in the
underlying TCP by the retransmit will probably be slightly longer than
the time after which the encapsulated TCP assumes a packet has been lost.
This adds retransmissions to the encapsulated TCP stream at exactly the
time when the underlying TCP is retransmitting segments itself.  That
increases latency, decreases throughput, and wastes bandwidth all at
once, and it increases the amount of traffic that has to be cleared
before the underlying TCP can reach idle state.  Typically the
encapsulated TCP's throughput drops to zero before the underlying TCP
ever gets there.
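Here is a rough model of the timer race described above, with hypothetical names and numbers. It assumes both stacks run the same RTO algorithm on nearly the same RTT samples. When the underlying TCP loses a segment, the encapsulated segment inside it is held for about one underlying RTO before being resent. It then needs about one base RTT more before the inner sender sees its ACK.

```python
# Rough model, assuming both TCPs compute similar RTOs from nearly the
# same RTT samples (the inner one sees a little extra tunnel overhead).

def inner_rto_fires(outer_rto, inner_rto, base_rtt):
    """Return True if the encapsulated TCP times out on a segment that
    the underlying TCP had to retransmit.

    The segment waits roughly one outer RTO before the underlying TCP
    resends it, then about one base RTT more before the inner sender
    sees the ACK.
    """
    delayed_rtt = outer_rto + base_rtt
    return delayed_rtt > inner_rto
```

With outer_rto=1.0 and inner_rto=1.02 (similar stacks), any base RTT above 0.02 seconds triggers a spurious inner retransmission. Only an inner RTO well above the outer one avoids it.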

>> The failure modes are:

>>      - steadily increasing latency when bandwidth is in use, up to
>>      about 120 seconds
>> 
>>      - SSH blocks on input or output and it hangs.  Some SSH versions
>>      have fixed this problem.
>> 
>>      - SSH or PPP protocol timeout (trivially easy to avoid)
>> 
>>      - TCP timeout (75 second delay * 9 retransmits = TCP connection
>>      fails, IIRC)

>Case (2) and (3) are being considered as bugs or, for that matter, as
>deficiency/shortcomings and case (1) and (4) would affect both TCP sessions.

In a sane network configuration between two TCP peers, the case 1 failure
does not occur.  Latency increases to somewhere near the (packet queue
size) / (bandwidth) of the devices in the network path, then stays mostly
constant thereafter--once all the buffers are full, all further packets
will be lost, so they won't add to latency.  Real network devices don't
delay packets arbitrarily; whatever doesn't fit in their queues gets
dropped.
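A toy tail-drop queue shows the ceiling. This Python sketch uses hypothetical names, one-tick time steps, and a fixed service rate. It offers twice the link rate indefinitely. The worst queueing delay stops at queue_limit / service_per_tick, and every excess packet is dropped rather than delayed further.

```python
def tail_drop(arrivals, queue_limit, service_per_tick):
    """Simulate a tail-drop FIFO router queue, one tick at a time.

    arrivals[i] is the number of packets offered in tick i. Returns
    (worst_delay_ticks, dropped): the worst queueing delay any accepted
    packet sees, and how many packets were discarded for lack of space.
    """
    queued = 0
    dropped = 0
    worst_delay = 0.0
    for offered in arrivals:
        # Accept only what fits; the rest is dropped, not delayed.
        accepted = min(offered, queue_limit - queued)
        dropped += offered - accepted
        queued += accepted
        if accepted:
            # The last packet accepted waits behind everything queued.
            worst_delay = max(worst_delay, queued / service_per_tick)
        # The link drains up to `service_per_tick` packets per tick.
        queued = max(0, queued - service_per_tick)
    return worst_delay, dropped


delay, dropped = tail_drop([10] * 100, queue_limit=50, service_per_tick=5)
```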

Dialup and DSL modems have several seconds' worth of buffering inside
them, but they are typically used only at the extreme ends of a real
Internet network route.  Most TCP implementations won't actually use all
of the available queue space, because the probability of packet loss
increases as available queue space decreases, and TCP's estimate of
available bandwidth decreases geometrically when packet loss is
discovered.
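The geometric decrease is the multiplicative half of AIMD. Here is a simplified Reno-style sketch with a hypothetical function name: the window grows by one segment per loss-free round trip and halves on each loss. It leaves out slow start, fast recovery, and the collapse to one segment that a retransmission timeout causes.

```python
def aimd_window(events, cwnd):
    """Return the congestion window (in segments) after a sequence of
    per-round-trip events: "ok" (no loss) or "loss"."""
    for event in events:
        if event == "loss":
            cwnd = max(1.0, cwnd / 2)  # multiplicative decrease
        elif event == "ok":
            cwnd += 1.0  # additive increase: one segment per RTT
        else:
            raise ValueError(f"unknown event: {event!r}")
    return cwnd
```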

Case 4 is much more common in PPP-over-SSH and similar configurations
than in IP-over-packet-carrier configurations, because the RTT spirals
out of control.
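The timeout arithmetic in the quoted case 4 was from memory. Here is a sketch of binary exponential backoff as described in RFC 6298, where each timeout doubles the RTO up to a cap. The initial RTO of 1 second, the 64 second cap, and the 9 retries are hypothetical values; actual stacks choose their own. Once the underlying TCP's backlog delays every retransmission past its timer, each retry in the schedule times out in turn and the connection fails.

```python
def backoff_schedule(initial_rto, max_rto, retries):
    """Successive retransmission timeouts (seconds) under binary
    exponential backoff: double after each timeout, capped at max_rto."""
    waits = []
    rto = initial_rto
    for _ in range(retries):
        waits.append(rto)
        rto = min(rto * 2, max_rto)
    return waits
```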

Cases 2 and 3 are really symptoms of the other problems.

-- 
Zygo Blaxell (Laptop) <[EMAIL PROTECTED]>
GPG = D13D 6651 F446 9787 600B AD1E CCF3 6F93 2823 44AD
