I noticed in
http://www.linuxvirtualserver.org/Joseph.Mack/HOWTO/LVS-HOWTO-3.html#conntrack
reports that conntrack is a bottleneck.

==== section 1 ====
Here's a summary of some experiments that show this is true and
further suggest that the real expense is in creating new conntrack
records.  If this is a well known phenomenon, then I apologize for the
waste of bandwidth and invite you to skip to section 2.

These experiments were run on two 200MHz machines, one generating
packets and the other forwarding.  They're connected by a 100Mbit link.

First test:
 on router
 iptables -A PREROUTING -t mangle -j MARK --set-mark 2
 traffic generator sends 7000 SYNs in 2.4 sec (to an address that
  won't reply), all packets identical except for randomized destination ports
 result: all marked and forwarded

Second test:
 on router
 iptables -A PREROUTING -t mangle -m state --state new -j MARK --set-mark 1
 traffic generator sends 6131 SYNs in 2.06 sec, just like before
 result: 1766 marked new (and, I assume, forwarded; the total number sent
  out that interface was a little higher, I suppose due to arp, etc.)

This shows that conntrack is a bottleneck, at least when dealing with new
packets: fewer than a third of the SYNs sent were marked.

Third test:
 router as in second test
 traffic generator now sends identical SYN packets,
  8382 in 2.08 sec
 result: all marked new (and, I presume, forwarded; again the total
  number sent out that interface is a bit higher)

Since identical packets should all match a single conntrack record,
this shows, I think (please correct me if I'm wrong), that it's
creating new conntrack records (and perhaps garbage collecting old
ones) that is the real bottleneck.
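
As a rough check on the figures from the first three tests, the following
Python sketch converts them to per-second rates.  The numbers are copied
from the tests above; the function and variable names are only
illustrative.  It suggests the router marked roughly 860 new connections
per second in the second test while being offered nearly 3000.

```python
# Figures copied from the three SYN tests: (packets sent, seconds, packets marked).
tests = {
    "test 1: MARK only, random ports": (7000, 2.4, 7000),
    "test 2: state new, random ports": (6131, 2.06, 1766),
    "test 3: state new, identical SYNs": (8382, 2.08, 8382),
}


def rates(sent, seconds, marked):
    """Return (offered rate, marked rate, fraction of sent packets marked)."""
    return sent / seconds, marked / seconds, marked / sent


for name, (sent, seconds, marked) in tests.items():
    offered, handled, fraction = rates(sent, seconds, marked)
    print(f"{name}: offered {offered:.0f}/s, marked {handled:.0f}/s ({fraction:.0%})")
```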

Fourth and fifth tests:
 router as in first and second tests, respectively
 traffic generator now creates a real connection: it just does an rcp of
  a large file to a third machine on the other side of the router
 result: no difference, in each case 25MBytes copied in ~4 sec.

This shows that conntrack is not a bottleneck for packets in
established connections.

==== section 2 ====

So what?  You might argue that 
- in normal usage new connections are not generated very fast 
- if you're syn flooded then you're dead in any case

I argue that you're not necessarily dead in any case.
If the flood really fills up your bandwidth then you are.
But we see above that the SYN rate needed to attack a conntracking
machine is far less than the machine can otherwise forward.
If, in the case above, the attacker could send "only" 1000 SYNs/sec
then the attack would work.  But if conntrack could be improved, e.g.,
to make connection creation faster, the same attack would not work.
Also, even if the attacker really can fill the bandwidth between
himself and the firewall, an improved conntrack could maintain
communication between other interfaces. 
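
To put the 1000 SYNs/sec figure in perspective, here is a
back-of-the-envelope Python calculation of how much of the 100Mbit link
such an attack would occupy.  It assumes minimum-size Ethernet frames
(64 bytes, plus 20 bytes of preamble and inter-frame gap on the wire);
that per-packet size is my assumption, not something measured in the
tests.

```python
LINK_BITS_PER_SEC = 100_000_000  # the 100Mbit link from section 1
# Assumption: a SYN fits in a minimum-size 64-byte Ethernet frame, and each
# frame also costs 20 bytes of preamble and inter-frame gap on the wire.
WIRE_BYTES_PER_SYN = 64 + 20


def link_fraction(syns_per_sec):
    """Fraction of the link consumed by a SYN stream at the given rate."""
    return syns_per_sec * WIRE_BYTES_PER_SYN * 8 / LINK_BITS_PER_SEC


print(f"1000 SYNs/sec uses {link_fraction(1000):.2%} of the link")
```

Under those assumptions the attack needs well under 1% of the link, which
is the gap between "bandwidth filled" and "conntrack overwhelmed".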

I conclude that if the code for creating new connections could be made
faster (it would have to be MUCH faster) there would be some benefit
in doing so.  I've not looked at that.  I presume whoever wrote it was
at least trying to make it efficient.  If anyone thinks this is a
promising approach, please feel free to say so.

What I'm really interested in is a defense that, e.g., rate limits
syns.  The problem I see in the current implementation (again, correct
me if I'm wrong) is that the very first thing that happens when a
packet arrives is that conntrack processes it, and this includes the
code for creating a conntrack record in the case where there is none.

Below is my proposal to fix this.  I hope for replies that address
the following:
- Does this seem like a good approach?
- What would go wrong?
- Any better ideas?

I propose that the code for creating the conntrack record be postponed
until after the point where a defense would have a chance to drop the
packet.
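
The effect of the ordering can be illustrated with a toy model in Python.
This is only a sketch, not netfilter code: the token bucket, the flood
rate, and the limit of 100/sec are all hypothetical, and "creating a
record" is reduced to incrementing a counter.

```python
class TokenBucket:
    """Allow `rate` packets per second with bursts of up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def records_created(arrival_times, limiter, create_before_limit):
    """Count conntrack records created for a stream of distinct new SYNs."""
    created = 0
    for now in arrival_times:
        if create_before_limit:
            created += 1        # current order: the record is made first
            limiter.allow(now)  # any drop happens after the work is done
        elif limiter.allow(now):
            created += 1        # proposed order: only survivors cost a record
    return created


# Hypothetical flood: 3000 SYNs in one second, limited to 100/sec, burst 10.
flood = [i / 3000 for i in range(3000)]
print(records_created(flood, TokenBucket(100, 10), create_before_limit=True))
print(records_created(flood, TokenBucket(100, 10), create_before_limit=False))
```

In this model the current order creates a record for every packet in the
flood, while the proposed order creates records only for the packets the
limiter lets through.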

If you recall my posting in April, you won't be surprised that I want
to put the connection creation code well after the current postrouting
hook.  After that hook the packet is enqueued for traffic control, and
it can be dropped between then and when it is dequeued to be sent.  So
I'd like to create a new hook after that dequeue, and this is where I
propose to create the connection record.

It occurs to me that there may be code now that assumes that the
record will exist, e.g., when you try to do something like the
 -m state --state new
above.  If so, that code would have to be fixed, e.g., to use a
default conntrack record if none exists.  But I'd expect that
conntrack records might be garbage collected at arbitrary times,
including between the time of arrival of the packet that created them
and the time that packet goes through some later hook.  So perhaps
this is not a problem.

Another possible problem I see is that creating the connection record
might require data that is no longer available at the later point,
such as the IP addresses in the packet when it arrived (which might by
now have been altered by NAT).  I recall this also arose in the April
exchange.  I proposed then that it would be useful to save a little
"historical" data per skbuff for use by later netfilter hooks.  I
propose now that this should specifically include source/dest IP
addresses and ports.  I think I mentioned before that it should
include the device where the packet arrived.
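
To make the "historical" data idea concrete, here is a small Python
model.  It is a sketch only: Packet stands in for an skbuff, and
ArrivalInfo, on_arrival, apply_snat and late_record_key are hypothetical
names, not existing kernel structures or functions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ArrivalInfo:
    """Hypothetical per-packet history, saved when the packet first arrives."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    in_device: str


@dataclass
class Packet:
    """Toy stand-in for an skbuff; `history` is the proposed addition."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    history: Optional[ArrivalInfo] = None


def on_arrival(pkt, in_device):
    # Snapshot addresses, ports and arrival device before anything rewrites them.
    pkt.history = ArrivalInfo(pkt.src_ip, pkt.dst_ip,
                              pkt.src_port, pkt.dst_port, in_device)


def apply_snat(pkt, new_src_ip, new_src_port):
    # NAT rewrites the live header fields but leaves the history untouched.
    pkt.src_ip, pkt.src_port = new_src_ip, new_src_port


def late_record_key(pkt):
    """Key for a record created at the late hook, built from arrival-time values."""
    h = pkt.history
    return (h.src_ip, h.src_port, h.dst_ip, h.dst_port, h.in_device)


pkt = Packet("10.0.0.5", "192.0.2.9", 40000, 80)
on_arrival(pkt, "eth0")
apply_snat(pkt, "198.51.100.1", 61000)
print(late_record_key(pkt))
```

Even though the source address and port have been rewritten by the time
the late hook runs, the key is built from the values the packet had when
it arrived.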

I look forward to your replies.
