[masq] Killing the MTU bug forever...

David A. Ranch Sun, 27 Dec 1998 15:29:10 -0500
        
In the spirit of trying to kill this bug
for once and for all, here are a few KEY emails 
I've saved about this issue.  I hope this helps
some of the coders on the group.


--#1

Resent-Date: Tue, 25 Mar 1997 03:20:52 -0800
From: Keith Owens <[EMAIL PROTECTED]>
To: Paul Bayer <[EMAIL PROTECTED]>
cc: [EMAIL PROTECTED]
Date: Tue, 25 Mar 1997 22:25:02 +1100
Resent-From: [EMAIL PROTECTED]
X-Mailing-List: <[EMAIL PROTECTED]> archive/latest/136
X-Loop: [EMAIL PROTECTED]
Resent-Sender: [EMAIL PROTECTED]
Subject: [masq] Re: Only limited access to WWW using IP-Masq

On Tue, 25 Mar 1997 10:21:10 +0100, 
Paul Bayer <[EMAIL PROTECTED]> wrote:
>The problem is: Netscape (and also other browsers I tried) cannot access
>WWW-sites like <www.yahoo.com>, <www.apple.com> and many others. It
>accesses fine some sites like <www.ibm.net>, Lycos, Sunsite et al.
> ...
>Guys, I don't have any more ideas, so I need your advice and help. I saw
>in the archive that there were questions of people with similar
>problems, but they didn't get answered. What about answering me?

Actually they have been answered but privately to cut down the
repetitive list traffic.  Obviously it is time to send this message to
the list again.  For those of you who have seen this before - sorry.


Classic symptoms of a path MTU problem.

When you start a TCP connection, each end announces the largest segment
it will accept (the MSS, derived from its own interface MTU).  From
this the sender assumes a path Maximum Transmission Unit (MTU),
basically the largest packet size that can be sent across the
connection.  Alas, somewhere between you and the failing site there is
a box with a link MTU which is smaller than the assumed path MTU.  This
does not cause a problem until one site tries to send a large packet to
the other and the packet is marked "do not fragment".

Although the packet is within the agreed path MTU, it is too big for
the intermediate link and it cannot be fragmented (the DF bit is set).
When a large packet hits the link with a smaller MTU, the link is
supposed to send an ICMP response "Unreachable, need to fragment" back
to the sender.  The response includes the original packet header so the
sender can see which packet failed.  The sender is supposed to look at
the ICMP response, get the size of the packet, pick a new path MTU
which is less than the failing packet size and send the data in a new,
smaller packet.  This is called "Path MTU Discovery".
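As a rough illustration of the loop described above, here is a minimal
Python sketch.  It assumes every router reports its next-hop MTU in the
ICMP message (the RFC 1191 style); the function name and the list of
link MTUs are made up for the example and do not come from any real
TCP/IP stack.

```python
def discover_path_mtu(link_mtus, initial_mtu):
    """Simulate a sender probing a path with DF set on every packet.

    link_mtus   -- MTU of each link along the route, in order (hypothetical)
    initial_mtu -- the sender's first guess, normally its own interface MTU
    Returns (path_mtu, icmp_errors_seen).
    """
    pmtu = initial_mtu
    icmp_errors = 0
    while True:
        # The first link too small for the packet drops it and reports back.
        too_small = next((mtu for mtu in link_mtus if mtu < pmtu), None)
        if too_small is None:
            return pmtu, icmp_errors   # packet crossed every link intact
        icmp_errors += 1               # "Unreachable, need to fragment"
        pmtu = too_small               # retry with a smaller packet


if __name__ == "__main__":
    # Ethernet, an older 1006-byte link, a 600-byte SLIP link, Ethernet again.
    print(discover_path_mtu([1500, 1006, 600, 1500], 1500))
```

If any of those ICMP errors is dropped or ignored, the loop above never
shrinks the packet, which is exactly the stall described in this message.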

Most of the time path MTU discovery works fine.  The link starts with a
large MTU then settles down to an MTU set by the smallest link in the
route.  When discovery fails you get partial transfers then nothing,
because all the big packets are getting lost.  There are several
possible causes of failure.

1) The intermediate link is not sending ICMP messages, it is just
   dropping the packets into the bit bucket.  There are some of these
   old routers out there but fortunately, not too many.
   
   Best solution is 4,000 volts up the router backplane.

2) The sender is not seeing the ICMP response.  Usually caused by some
   firewall being a bit too paranoid.

   The sender has to adjust their routers and/or firewall to let ICMP
   in.

3) The sender is seeing the ICMP response but is ignoring it.  Should
   not really happen.

   Sender has to correct their TCP/IP stack to handle ICMP "need to
   fragment".

4) The link with the low MTU is corrupting the packet header it sends
   back.  This problem is very common in routers based on BSD 4.2 and
   is known to affect Annex 4000's.  Read RFC 1191, section 5 for the
   gory details.  If the sender's TCP/IP stack is not aware of this
   common bug, it does not pick the correct MTU.  Windows NT (at least
   3.1) does not recover when faced with these routers.

   Solutions:

   a) Replace all buggy router code on the Internet (not going to
      happen anytime soon).  If the problem link is at your ISP this is
      the best solution for this bug.

   b) Upgrade the sender's TCP/IP stack to recognise the buggy routers
      and calculate the correct MTU (Microsoft, not going to happen
      anytime soon).
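To make cause 4 concrete, here is a hedged Python sketch of how a
sender could pick the next MTU when the ICMP message carries no usable
next-hop MTU.  It follows my reading of RFC 1191 sections 5 and 7.1:
the plateau values are copied from the RFC's table of common MTUs, and
the correction for 4.2BSD-derived routers subtracts the header length
that those routers wrongly add to the returned Total Length.  The
function and parameter names are invented for the example; check the
details against the RFC before relying on them.

```python
# Common MTU "plateaus" from RFC 1191, largest first.
PLATEAUS = (65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68)
MIN_MTU = 68  # smallest MTU an IPv4 link may have


def next_pmtu(current_pmtu, next_hop_mtu, returned_total_length, returned_ihl):
    """Choose a new path MTU after an ICMP "need to fragment" error.

    next_hop_mtu          -- MTU field of the ICMP message, 0 on old routers
    returned_total_length -- Total Length of the IP header echoed back
    returned_ihl          -- Header Length of that header, in 32-bit words
    """
    if next_hop_mtu:
        # Newer router: it tells us the limit directly.
        return max(MIN_MTU, min(current_pmtu, next_hop_mtu))

    total = returned_total_length
    if total >= current_pmtu:
        # Possibly a 4.2BSD-derived router that added the header length
        # to Total Length; be conservative and take it off again.
        total -= 4 * returned_ihl

    # Greatest plateau strictly below the size that failed.
    for plateau in PLATEAUS:
        if plateau < total:
            return plateau
    return MIN_MTU
```

A stack that skips the correction step can settle on an MTU that is
still too large, which matches the "does not pick the correct MTU"
behaviour described in item 4.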

The real problem with a failing path MTU discovery is that the fix has
to come from the sender's end, not from the receiver.  To get a
permanent fix, you have to explain the problem to the sender and get
them to diagnose and correct the problem from their end.

In some cases, the simplest option is for the sender to disable path
MTU discovery completely.  This turns off the "do not fragment" bit in
the packet header.  When a large packet hits the low MTU link, instead
of bouncing back to the sender, the packet is fragmented and forwarded
in pieces.  However, fragments can have their own problems, so this
option does not always work.

The only thing the receiver can do to bypass the problem is force the
original path MTU to a lower value so the offending link never sees big
packets.  If you have the application source code, setsockopt
TCP_MAXSEG will cap the TCP segment size (the MSS, roughly the MTU
minus 40 bytes of IP and TCP headers) for a single socket.  It must be
set after creating the socket and before connecting to the other end.
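For readers who want to see the per-socket approach spelled out, here
is a small Python sketch.  It assumes a Linux system where the socket
module exposes TCP_MAXSEG, and plain 20-byte IP and TCP headers with no
options; the helper names are hypothetical.

```python
import socket

IP_TCP_HEADERS = 40  # 20-byte IP header + 20-byte TCP header, no options


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IP_TCP_HEADERS


def open_clamped_socket(mtu):
    """Create an unconnected TCP socket whose MSS is capped for `mtu`.

    The option is set before connect() so that the lower MSS is the one
    announced to the other end when the connection is opened.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, mss_for_mtu(mtu))
    return sock
```

Calling open_clamped_socket(600) and then connect() should make the
socket announce an MSS of at most 560, so the remote site never sends
it a packet larger than 600 bytes.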

If changing the source is not an option, you can change the path MTU
by setting the MTU value on your network interface.  For example,

ifconfig <interface> mtu <number>

The interface MTU can be changed at any time, although it can cause
problems if you reduce the interface MTU while connections are routing
across the interface.  Best to set it at start up or when no
connections exist.

WARNING: Changing the interface MTU will affect *ALL* new connections,
         not just the one you are trying to fix.  Low MTU's can slow
         down connections.
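To put a rough number on that slowdown, the following Python sketch
compares how many packets a transfer needs at different MTUs.  It
assumes 40 bytes of IP and TCP headers per packet and ignores
acknowledgements and link-level framing, so treat it as an estimate
only.

```python
HEADERS = 40  # assumed IP + TCP header bytes per packet, no options


def payload_efficiency(mtu):
    """Fraction of each full-sized packet that is application data."""
    return (mtu - HEADERS) / mtu


def packets_needed(nbytes, mtu):
    """Number of full-or-partial packets needed to carry nbytes of data."""
    mss = mtu - HEADERS
    return -(-nbytes // mss)  # ceiling division


if __name__ == "__main__":
    for mtu in (1500, 576, 296):
        print(mtu, round(payload_efficiency(mtu), 3),
              packets_needed(100000, mtu))
```

Every extra packet also costs a routing decision and, on slow serial
links, extra per-packet latency, so the real penalty is usually larger
than the header overhead alone.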

Picking the MTU number is a problem: the receiver never sees the ICMP
messages, so you cannot tell what the limit is.  One option is just to
go down the list of common MTU values (see RFC 1191) until you hit a
value that works, starting a new connection each time.  Alternatively,
if you know how to read tcpdump output, run tcpdump while you "ping -s
<number> site", reducing <number> until you get a response which is not
fragmented.  The path MTU is then <number> + 28 (20 bytes of IP header
plus 8 bytes of ICMP header).
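The ping search can be sketched in Python as follows.  Note two
assumptions: `probe` is a hypothetical callable that you would have to
write yourself (it should return True when a ping with that payload
size comes back unfragmented), and the sketch uses a binary search
instead of the step-by-step reduction described above, purely to cut
down the number of pings.

```python
PING_OVERHEAD = 28  # 20-byte IP header + 8-byte ICMP echo header


def largest_working_payload(probe, low=0, high=1472):
    """Binary-search the biggest ping payload for which probe() is True.

    probe -- hypothetical callable: probe(size) -> bool
    low   -- a payload size expected to work
    high  -- largest payload worth trying (1472 fits a 1500-byte MTU)
    Returns the payload size, or None if even `low` fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid        # mid got through, look higher
        else:
            high = mid - 1   # mid was too big, look lower
    return low


def path_mtu_from_payload(payload):
    """Convert a ping payload size into the matching path MTU."""
    return payload + PING_OVERHEAD
```

For a path limited by a 600-byte link, the search should settle on a
payload of 572, which corresponds to a path MTU of 600.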
--

--#2
> I'm not convinced that your experiences and the masquerading bug are
> connected. The symptoms of the masquerading bug were:

>  * When ONE linux box was masquerading between two interfaces with
>    different MTUs, the ICMP Must Fragment packets were being incorrectly
>    addressed.

I might qualify, given that 'sl0' and 'eth0' on the SAME machine are
these two interfaces.

> As far as I can see, your masquerading box has two ethernet interfaces
> with 1500 MTUs. Have you got ICMP masquerading switched on?

That's not the case. I can see that I wasn't very clear. Below I've
explained the situation again, hopefully better.

My system setup is as follows:

    sl0    /---------------\        eth0         /--------------\
 /---------| 192.168.1.1   |---------------------| 192.168.1.2  |
 |    MTU: | Linux gateway | MTU:           MTU: | Linux client |
 |    600  \---------------/ 1500           1500 \--------------/
 |
Connected
to a dial-up.
(via a default gateway
routing table entry).

This doesn't work. Packets seem to be lost (details later).

When I tried figuring out what's going on (using tcpdump), I've discovered
that certain packets don't show up at all.

A typical situation was:

A connection is opened between the client and www5.yahoo.com (the same
applies to www.linuxos.org, www.linux.org, www.lycos.com and quite a few
more, not to mention various FTP sites; in short, practically everything).
A packet is sent from the client to yahoo ("GET / HTTP/1.0\r\n\r\n"). yahoo
acks the packet (or, at times, two packets, when there is a certain delay
between one CRLF pair and the other), and supposedly sends a reply. The
reply is never shown by tcpdump on either the server or the client (on the
server, neither on sl0 nor on eth0), but I certainly believe it reaches the
server (unless there is some really nasty bug somewhere along the way, which
I really can't imagine).
I can only assume that whatever the bug is, it prevents tcpdump from
capturing the packet (but I have no clue on how tcpdump works). It is quite
possible (but is only an assumption) that the packet missing was fragmented
along the way (possibly due to the 600 MTU on sl0), and never got
reconstructed properly, thus never reaching the stage in which tcpdump can
capture it.

Suddenly (oh, the suspense) a certain packet from yahoo is received. It
carries "future" data, jumping over what seems to be two 600-octet packets.
Of course, the client can't ack it (because it is missing quite a few octets
in between) so it re-acks octet 1. That's where the connection becomes
inactive and nothing further is sent. This packet is smaller than the MTU, by
the way (in the specific case, it was 588 octets long, including IP and TCP
headers).


When changing the MTU on the ethernet device in the client to 600, things
seem to work fine. The client then advertises an MSS of 560 (600 minus 40
octets of IP and TCP headers), which pretty much prevents fragmentation of
any responses reaching 192.168.1.2 through sl0.
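To show why the smaller MSS helps, here is a Python sketch of how a
router splits a packet that is too big for the next link when the
"do not fragment" bit is clear.  It assumes a plain 20-byte IP header
on every fragment; the function name is made up for the example.

```python
def fragment_lengths(total_len, link_mtu, ip_header=20):
    """Total length of each IP fragment produced for one packet.

    Every fragment except the last must carry a multiple of 8 data
    octets, because fragment offsets are counted in 8-octet units.
    """
    if total_len <= link_mtu:
        return [total_len]                      # fits, no fragmentation
    payload = total_len - ip_header
    per_fragment = (link_mtu - ip_header) // 8 * 8
    lengths = []
    while payload > 0:
        chunk = min(per_fragment, payload)
        lengths.append(chunk + ip_header)
        payload -= chunk
    return lengths


if __name__ == "__main__":
    print(fragment_lengths(1500, 600))  # full Ethernet-sized packet
    print(fragment_lengths(600, 600))   # packet built for an MSS of 560
```

With the client at MTU 1500, each full-sized reply has to be cut into
three pieces to cross sl0; with the MSS at 560 every reply already fits
in a single 600-octet packet.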


That pretty much wraps it up. It is important to mention that the server
copes pretty well with several Windows 3.11 machines (using Trumpet). For
the sake of these tests, I've taken them out of the picture (by turning the
machines off...).
Another possibly important fact is that the client doesn't have a physical
Linux installation. I boot it from a floppy, and use nfs to the server in
order to have the required files. I doubt this is the problem (and it
probably shouldn't be, anyway), but sadly enough I cannot have a stand-alone
Linux installation on that machine due to space constraints.


I do *NOT* have ICMP masquerading enabled in the kernel. Should I try that?

Could you suggest anything at all? I'd do almost anything (but will not
install a development kernel. I've heard too many horror stories. ;-) ).

                                                   Nimrod


--

--#3
It might be worth observing that over in comp.dcom.sys.cisco (and other
places as well) small battles occasionally rage about the effects of
filtering ICMP.  Paranoid router admins have started to experiment with
dropping all ICMP, thus breaking path MTU discovery (and other atomic
features of TCP/IP).  I have no idea if compiling MTU discovery OUT of
the linux kernel would fix the situation.  It is pretty clear, though,
that some TCP/IP features need to be redesigned/reengineered to increase
security and that the IETF policy of liberal acceptance, strict emission
is reaching even further down the stack than, perhaps, initially intended.
Boy, IPv6 should prove to be even more interesting (in the true vein of
the Chinese curse) once available for general consumption.
--
.----------------------------------------------------------------------------.
|  David A. Ranch - Linux/Networking/PC hardware         [EMAIL PROTECTED]  |
!----                                                                    ----!
`----- For more detailed info, see http://www.ecst.csuchico.edu/~dranch -----'
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
For daily digest info, email [EMAIL PROTECTED]