How NAT works and why translation timeouts occur (was: Re: connection reset by host)

kelsey hudson Sat, 20 May 2006 10:36:09 -0700

Carl Lowenstein wrote:

This seems to have cured the problem.  The next question is:  why are
these ssh configuration parameters not mentioned in ssh_config(5) or
in /etc/ssh/ssh_config?  Maybe there is other documentation for ssh.
(OpenSSH)?

I don't remember where I saw this. All I know is that I investigated
some way to send ssh keepalives. It was biting me with a translation
timeout when doing rsync of large (>50 GByte) files -- ssh was keeping
the socket open but during large block checksum no data was going over
the pipe -- the PIX firewall decided the connection was stale and
removed it.

Why does my Netgear WGR614 do this to me?  Looking at the Netgear user
forum, it appears that others have similar problems with a 10-minute
inactivity timeout.  It is reported that changing to a different
revision of firmware may or may not solve the problem.  That is such a
confidence-building statement.

Changing the firmware probably won't fix the problem. In order to
explain why your NAT appliance is doing this, I have to explain how
source NAT does its thing. SNAT is a hybrid layer 3/4 translation
technique. At layer 3 (network) we have the IP protocol, which controls
addressing and packet destination.


IP packets contain (typically) a 20-byte header. The fields, in order, are:

Version (4 bits -- always 0x4 for IPv4)
Header Length (4 bits -- number of 32 bit words, which is almost always 0x5)
ToS/DSCP (8 bits -- used in QoS et al)
Total Datagram Length (16 bits)
Identification (16 bits -- used for reassembly of fragmented packets)
Flags (3 bits -- controls fragmentation and a couple other settings)
Fragment Offset (13 bits)
Time to Live (8 bits -- number of hops a packet may traverse before
    being dropped)
Protocol (8 bits -- determines whether the packet contains ICMP, TCP,
    UDP, et al)
Header Checksum (16 bits -- detects a header that was mangled during
    delivery. Packets that fail the check are immediately dropped)
Source Address (32 bits)
Destination Address (32 bits)
Optional Other Information (arbitrary length up to ten 32 bit words --
    present only if the header length is greater than 5 words)
Data (arbitrary length up to 65,515 bytes with a 20-byte header -- this
    contains the data payload of the packet)
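To make the field layout above concrete, here is a minimal Python sketch
that unpacks the fixed 20-byte part of an IPv4 header. The function name
parse_ipv4_header and the sample addresses are made up for illustration,
and options (header length above 5 words) are ignored.

```python
import socket
import struct

def parse_ipv4_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte portion of an IPv4 header (options ignored)."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", raw[:20])
    return {
        "version": ver_ihl >> 4,              # upper 4 bits
        "header_words": ver_ihl & 0x0F,       # lower 4 bits, in 32 bit words
        "tos": tos,
        "total_length": total_len,
        "identification": ident,
        "flags": flags_frag >> 13,            # top 3 bits
        "fragment_offset": flags_frag & 0x1FFF,  # low 13 bits
        "ttl": ttl,
        "protocol": proto,                    # 1 = ICMP, 6 = TCP, 17 = UDP
        "header_checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }

# Hand-built sample header: TCP, TTL 64, Don't Fragment set, checksum left at 0.
sample_ip_header = struct.pack(
    "!BBHHHBBH4s4s", 0x45, 0, 40, 0x1234, 0x4000, 64, 6, 0,
    socket.inet_aton("192.168.1.10"), socket.inet_aton("203.0.113.5"))
ip_fields = parse_ipv4_header(sample_ip_header)
```

The source address that NAT rewrites sits at byte offset 12 and the
header checksum at byte offset 10.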

Furthermore, protocols like TCP and UDP sit at layer 4 (transport) and
provide more information about where a packet is going. They also have a
header within the data portion of the IP packet:


We'll use TCP as an example:

source port (16 bits)
destination port (16 bits)
sequence number (32 bits -- rolling counter of the data bytes sent)
ACK number (32 bits -- contains the sequence number of the next byte
    the sender expects to receive)
Data offset (4 bits -- number of 32 bit words in TCP header -- always at
    least 5)
Reserved/ECN (6 bits -- three Reserved bits must be zeroed, remaining
    three bits control Explicit Congestion Notification)
Flags (6 bits -- contains the fields URG, ACK, PSH, RST, SYN, FIN --
    these control TCP stateful operation)
Window Size (16 bits -- the number of bytes the sender is willing to
    accept back)
Checksum (16 bits -- covers a pseudo-header taken from the IP header
    (addresses, protocol, and length), the TCP header, and the data all
    in one. If this checksum is invalid, the segment is discarded and
    the sending station retransmits it when no ACK arrives. If this
    were UDP, the packet would simply be dropped)
Urgent Pointer (16 bits -- if URG is set in flags, this points to last
    byte in sequence of urgent data. Normally zeroed)
Optional Other Information (if the header size is greater than 5 words)
Data (up to 65,495 bytes with minimal IP and TCP headers)
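The same kind of sketch works for the TCP fields listed above. This is
only an illustration: parse_tcp_header and the sample values are
invented, and TCP options are ignored.

```python
import struct

TCP_FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 upward

def parse_tcp_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte portion of a TCP header (options ignored)."""
    (src_port, dst_port, seq, ack, off_flags,
     window, checksum, urgent) = struct.unpack("!HHIIHHHH", raw[:20])
    return {
        "source_port": src_port,
        "destination_port": dst_port,
        "sequence": seq,
        "ack": ack,
        "data_offset_words": off_flags >> 12,   # top 4 bits
        "flags": [name for bit, name in enumerate(TCP_FLAG_NAMES)
                  if off_flags & (1 << bit)],   # low 6 bits
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }

# Hand-built connection request: only SYN set, header is 5 words long.
sample_tcp_header = struct.pack(
    "!HHIIHHHH", 40000, 22, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
tcp_fields = parse_tcp_header(sample_tcp_header)
```

The source port, the one TCP field NAT may rewrite besides the
checksum, is the first two bytes of the header.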

OK, now that we know what a typical packet header contains, here's what
NAT does on a typical outbound packet.

First, NAT looks at the packet's source address. If the packet is
'interesting' (meaning, there is an entry in the appliance's translation
source table for that source address or range) it performs further
processing. It takes that source address and changes it to one of its
designated source addresses that it found in its translation source
table. Next, the appliance looks at the protocol field and decides
whether it can perform further translation. If it can, it recomputes the
IP header checksum and mangles the packet with the new source
information also.

Let's say our arbitrary packet contains TCP data, and can be further
translated. In the case of TCP, really only two fields may be touched.
The TCP source port is examined. The appliance looks in its TCP state
table to see if that source port is already in use on the new source
address. If it is, it picks the next available port. If it isn't, then
it leaves the port field intact. The packet is then mangled with a
recomputed checksum and (if necessary) the new source port.

If this is a connection establishment request (SYN set) an entry in the
appliance's translation state table (TST) is created. If not, the TST is
queried for an applicable translation. If one exists, it is used. If one
doesn't exist, a packet with RST set is sent to the originator. The
translation state table typically contains (at least) the old source
address, the new source address, the translation timer, and any L4
protocol information like old and new source port in the case of TCP.
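The checksum recomputation mentioned above can be sketched like this.
It covers only the IP header checksum (the standard ones'-complement
Internet checksum) on a 20-byte header without options. The helper
names are invented, and a real appliance would also have to update the
TCP checksum, since that one covers the addresses too.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """Ones'-complement sum of 16 bit words, as used by the IP header."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole word
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                        # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def rewrite_source(header: bytes, new_src: str) -> bytes:
    """Swap the source address of a 20-byte IPv4 header, then fix the checksum."""
    patched = bytearray(header[:20])
    patched[12:16] = socket.inet_aton(new_src)   # source address field
    patched[10:12] = b"\x00\x00"                 # checksum is computed over zero
    struct.pack_into("!H", patched, 10, internet_checksum(bytes(patched)))
    return bytes(patched)

# Sample inside header (checksum field left at zero), translated to a
# made-up outside address.
inside_header = struct.pack(
    "!BBHHHBBH4s4s", 0x45, 0, 40, 0x1234, 0x4000, 64, 6, 0,
    socket.inet_aton("192.168.1.10"), socket.inet_aton("198.51.100.7"))
translated_header = rewrite_source(inside_header, "203.0.113.1")
```

A receiver verifies the header by running the same sum over all 20
bytes, checksum field included. A good header comes out as zero.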

With inbound packets the process is a little bit different -- the
translation state table is queried first for a translation entry. If one
exists for the packet's source/destination, the reverse of the process
in outbound occurs.
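Here is a toy Python model of the outbound and inbound lookups
described above. It is a sketch under simplifying assumptions, not how
any particular appliance does it: the class and method names are
invented, there is a single outside address, "interesting" is decided
by a plain string prefix, and a return value of None stands in for
"send a RST".

```python
class SourceNat:
    """Toy source NAT for TCP, keyed on (inside address, inside port)."""

    def __init__(self, public_ip, inside_prefix):
        self.public_ip = public_ip
        self.inside_prefix = inside_prefix
        self.out_map = {}   # (inside ip, inside port) -> outside port
        self.in_map = {}    # outside port -> (inside ip, inside port)

    def _pick_port(self, wanted):
        # Keep the original port if it is free, else take the next free one.
        port = wanted
        while port in self.in_map:
            port += 1
            if port > 65535:
                raise RuntimeError("sketch ran out of ports")
        return port

    def outbound(self, src_ip, src_port, syn=False):
        if not src_ip.startswith(self.inside_prefix):
            return (src_ip, src_port)          # not 'interesting', pass through
        key = (src_ip, src_port)
        if key not in self.out_map:
            if not syn:
                return None                    # no translation: RST the sender
            port = self._pick_port(src_port)
            self.out_map[key] = port           # new entry on connection setup
            self.in_map[port] = key
        return (self.public_ip, self.out_map[key])

    def inbound(self, dst_port):
        # Reverse lookup; None means there is no translation for this packet.
        return self.in_map.get(dst_port)

nat = SourceNat("203.0.113.1", "192.168.1.")
first = nat.outbound("192.168.1.10", 40000, syn=True)
second = nat.outbound("192.168.1.11", 40000, syn=True)   # port clash
reply = nat.inbound(second[1])
stray = nat.outbound("192.168.1.12", 5000)               # mid-stream, no entry
```

The second host asks for the same source port as the first, so it gets
the next free one, and the reply to that port is mapped back to the
second host's original address and port.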

The problem exists in appliances with small amounts of memory. Their
translation state tables aren't very big, which means they need to time
out translations that haven't been used for a set amount of time, to
avoid filling the small amount of memory allocated to translation
tasks. In your case, this is 10 minutes. Upon initial entry into the
translation state table, a timer was set on the translation entry.
Subsequent packets reset this timer. If the timer is allowed to expire,
the translation entry is removed from the table.
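The timer behaviour can be sketched in a few lines of Python. The class
name, the injected clock, and the exact expiry check are assumptions
made for the example; the 600 second value is the 10-minute timeout
from the text.

```python
class TranslationTable:
    """Translation entries that expire after idle_timeout seconds of silence."""

    def __init__(self, idle_timeout, clock):
        self.idle_timeout = idle_timeout
        self.clock = clock            # callable returning the current time
        self.last_seen = {}           # entry key -> time of last packet

    def add(self, key):
        self.last_seen[key] = self.clock()     # timer starts on creation

    def expire(self):
        now = self.clock()
        stale = [k for k, t in self.last_seen.items()
                 if now - t >= self.idle_timeout]
        for k in stale:
            del self.last_seen[k]               # entry removed from the table

    def touch(self, key):
        """Called for each packet; returns False if the entry is gone."""
        self.expire()
        if key not in self.last_seen:
            return False
        self.last_seen[key] = self.clock()      # packet resets the timer
        return True

# Fake clock so the example does not have to wait ten minutes.
now = [0.0]
table = TranslationTable(idle_timeout=600, clock=lambda: now[0])
table.add(("192.168.1.10", 40000))

results = []
for t in (300, 850, 1500):            # gaps of 300, 550 and 650 seconds
    now[0] = t
    results.append(table.touch(("192.168.1.10", 40000)))
```

The first two packets arrive within 600 seconds of the previous one and
keep the entry alive. The third arrives after 650 seconds of silence
and finds the entry already removed.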

This brings up a question: Why wouldn't the appliance just recreate the
translation entry when subsequent data is transmitted? The answer is
twofold: a) there's no way of knowing whether the L4 information will
have changed with the new translation entry (the source port on the
translated address may be different, or the translated address itself
may be different) -- since TCP is a stateful protocol, this is
unacceptable. b) there's no way of knowing whether the other end has
tried to send data back to a bad translation and received a RST. In
either case, a RST is sent to the originator saying, basically,
'connection closed by remote host.'

NAT breaks the transparent end-to-end connectivity that most IP
protocols were designed with in mind. And it's true -- in some cases you
can get by without transparent E2E. But there are some times where it's
necessary -- connections which sit idle are but one example.
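For your own programs with idle connections, one common workaround is
to have the kernel send TCP keepalive probes more often than the NAT
timeout, so the translation timer keeps getting reset. Below is a
minimal Python sketch. The function name and the 300/60/3 values are
assumptions picked to stay under a 10-minute timeout, and the
per-socket TCP_KEEP* options are only applied on platforms where
Python exposes them (Linux does).

```python
import socket

def enable_keepalive(sock, idle_seconds=300, interval_seconds=60, probes=3):
    """Turn on TCP keepalive and, where supported, tighten its timing."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", idle_seconds),       # idle time before first probe
        ("TCP_KEEPINTVL", interval_seconds),  # gap between probes
        ("TCP_KEEPCNT", probes),              # failed probes before giving up
    )
    for name, value in tuning:
        option = getattr(socket, name, None)  # not defined on every platform
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)

# Example: configure a socket before connecting it.
keepalive_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(keepalive_sock)
```

Without the tuning, the usual default idle time before the first probe
is two hours, which is far too long to keep a 10-minute translation
alive.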

Hope this helps to explain how and why NAT does its thing. Let me know
if you need clarification on anything. :)


Talk to you later,
-Kelsey


--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-list
