dear netdevels,

I'm doing some TCP benchmarks on a netfilter-enabled box and noticed a
huge and surprising performance decrease when loading the iptable_nat module.

- ip_conntrack of course also loads the system, but with plenty of memory
and a large bucket count that problem can be worked around. The big issue
with ip_conntrack is the state timeouts: with the default values it simply
kills the system and drops all the traffic, because the ip_conntrack table
quickly fills up, and there seems to be no way to recover from that
situation... Keeping unused entries (time_close) in the cache for even
1 minute is really not suitable for configurations handling a (relatively)
large number of connections/s.
o The cumulative effect of these per-state timeouts should be reconsidered.
o Are there ways/plans to tune the timeouts dynamically? And what are
  the valid/invalid ranges for the timeouts?
o Looking at the code, it seems that one timer is started per tuple...
  wouldn't it be more efficient to have a single periodic callback
  scanning the whole table (or part of it) for aged entries? (see the
  sketch after this list)
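
To make that last point concrete, here is a minimal user-space sketch of
what I mean (all names are invented, this is not the ip_conntrack code):
a single periodic callback walks a slice of the hash buckets per tick and
evicts the entries whose timeout has expired, instead of arming one
kernel timer per tuple.

#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define HASH_SIZE 16384

struct entry {
        struct entry *next;
        time_t expires;                 /* absolute expiry time of the tuple */
};

static struct entry *table[HASH_SIZE];

/* Walk a slice of the buckets and drop every entry whose timeout expired.
 * Called from one periodic timer, so the cost is amortised instead of
 * having one timer armed per entry. */
static void gc_scan(unsigned int from, unsigned int to, time_t now)
{
        unsigned int b;

        for (b = from; b < to && b < HASH_SIZE; b++) {
                struct entry **pp = &table[b];

                while (*pp) {
                        if ((*pp)->expires <= now) {
                                struct entry *dead = *pp;

                                *pp = dead->next;       /* unlink */
                                free(dead);
                        } else {
                                pp = &(*pp)->next;
                        }
                }
        }
}

int main(void)
{
        /* scan 1/16 of the table per tick -> a full sweep every 16 ticks */
        unsigned int slice = HASH_SIZE / 16, pos = 0;

        for (;;) {
                gc_scan(pos, pos + slice, time(NULL));
                pos = (pos + slice) % HASH_SIZE;
                sleep(1);               /* stands in for the periodic timer */
        }
}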

- The annoying point is iptable_nat: normally the number of entries in
the nat table is much lower than the number of entries in the conntrack
table. So even if the hash function itself could be less efficient than
the ip_conntrack one (because it takes fewer arguments: src+dst+proto;
see the small illustration after the questions below), the load of nat
should be much lower than the load of conntrack.
o So... why is it the opposite??
o Are there ways to tune NAT performance?
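
To illustrate the "fewer arguments" remark (purely didactic code, this is
NOT the real ip_conntrack/ip_nat hash and the mixing function is
invented): with a single src/dst pair and only the source port varying,
a 3-field hash over (src, dst, proto) maps every connection to the same
bucket, while a full-tuple hash spreads them out.

#include <stdint.h>
#include <stdio.h>

static uint32_t mix(uint32_t h, uint32_t v)
{
        h ^= v;
        h *= 0x9e3779b1u;
        return h ^ (h >> 16);
}

/* 3 fields only: every connection between the same two hosts collides */
static uint32_t hash3(uint32_t src, uint32_t dst, uint8_t proto)
{
        return mix(mix(mix(0, src), dst), proto);
}

/* full tuple: the varying source port spreads the connections out */
static uint32_t hash5(uint32_t src, uint32_t dst, uint8_t proto,
                      uint16_t sport, uint16_t dport)
{
        return mix(mix(hash3(src, dst, proto), sport), dport);
}

int main(void)
{
        uint32_t src = 0x0a000001, dst = 0x0a000002;    /* 10.0.0.1 -> 10.0.0.2 */
        uint16_t sport;

        for (sport = 1024; sport < 1029; sport++)
                printf("sport %u: 3-field %08x   5-field %08x\n",
                       (unsigned)sport,
                       (unsigned)hash3(src, dst, 6),
                       (unsigned)hash5(src, dst, 6, sport, 80));
        return 0;
}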

- Another (old) question: why are conntrack and nat active when there are
no rules configured (whether any rules use them or not)? If this isn't
fixed it should at least be documented... Somebody doing "iptables -t nat -L"
risks killing their system if it's already under load... In the same spirit,
iptables -F should unload all unused modules (the ip_tables module itself
doesn't hurt). Just one quick fix: replace the 'iptables' executable with a
wrapper 'iptables' script calling the real binary (located somewhere else)
and doing an rmmod at the end; a rough sketch follows...
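
A rough sketch of that quick fix (shown as a small C wrapper, though a
two-line shell script would do the same job; the relocated binary path
and the exact module names are just assumptions for a 2.4 setup):

#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define REAL_IPTABLES "/usr/sbin/iptables.real"   /* hypothetical location */

int main(int argc, char **argv)
{
        pid_t pid;
        int status = 1;

        (void)argc;
        pid = fork();
        if (pid == 0) {
                argv[0] = REAL_IPTABLES;        /* run the real binary */
                execv(REAL_IPTABLES, argv);
                _exit(127);                     /* exec failed */
        }
        if (pid > 0)
                waitpid(pid, &status, 0);

        /* best effort: rmmod simply fails if a module is still in use */
        system("/sbin/rmmod iptable_nat ip_conntrack 2>/dev/null");

        return WIFEXITED(status) ? WEXITSTATUS(status) : 1;
}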

comments are welcome;


here is my test bed:

tested target:
 -kernel 2.4.18 + non_local_bind + small conntrack timeouts...
 -PIII~500MHz, RAM=256MB
 -2*100Mb/s NIC

The target acts as a forwarding gateway between a load-generator client
running httperf, and an Apache proxy serving cached pages. The 100Mb/s
NICs and the request/response sizes ensure that bandwidth and packet
collisions are not an issue.

Since each connection in my test is ephemeral (<10ms), I recompiled the
kernel with very short conntrack timeouts (i.e. 1 sec for close_wait and
about 60 sec for established!); a sketch of that change follows below.
This was also the only way to restrict the conntrack hash table size
(given my RAM) and avoid exaggerated hash collisions. Another limitation
comes from my load generator creating traffic from one source to one
destination IP address, with only the source port varying (but given my
configured hash table size and the hash function itself, this shouldn't
have been an issue).
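
For reference, this is the kind of edit I made in
net/ipv4/netfilter/ip_conntrack_proto_tcp.c (2.4.x); the array name, the
SECS/MINS macros and the stock defaults noted in the comments are quoted
from memory, so take it as a sketch rather than a patch:

static unsigned long tcp_timeouts[]
= { 30 MINS,    /*  TCP_CONNTRACK_NONE                                 */
    60 SECS,    /*  TCP_CONNTRACK_ESTABLISHED (stock default: days)    */
    2 MINS,     /*  TCP_CONNTRACK_SYN_SENT                             */
    60 SECS,    /*  TCP_CONNTRACK_SYN_RECV                             */
    2 MINS,     /*  TCP_CONNTRACK_FIN_WAIT                             */
    2 MINS,     /*  TCP_CONNTRACK_TIME_WAIT                            */
    10 SECS,    /*  TCP_CONNTRACK_CLOSE                                */
    1 SECS,     /*  TCP_CONNTRACK_CLOSE_WAIT  (stock default: ~1 min)  */
    30 SECS,    /*  TCP_CONNTRACK_LAST_ACK                             */
    2 MINS,     /*  TCP_CONNTRACK_LISTEN                               */
};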

results are averages from procinfo -n10 [d]

test results:

1) target = forwarding only (no iptables module or rule)
 -  rate          : 100        conn/s (=request-response/s)
 -> CPU load      : 0%         system
 -> context       : 7          context/s
 -> irq(eth0/eth1): 0.9 / 0.9  kpps   (# of packet/sec = #irq/s)

 -  rate          : 500        conn/s
 -> CPU load      : 10%        system
 -> context       : 18->100    context/s (varying!)
 -> irq(eth0/eth1): 4.4 / 4.4  kpps

 -  rate (max)    : 1050       conn/s (max from my load generator)
 -> CPU load      : 25%        system
 -> context       : 1000       context/s
 -> irq(eth0/eth1): 10 / 10    kpps

2) (1) + insmod ip_conntrack 16384 (no rules)

 -  rate          : 100        conn/s
 -> CPU load      : 0.8%       system
 -> context       : 7          context/s
 -> irq(eth0/eth1): 0.9 / 0.9  kpps
 -> conntrack size: 970        concurrent entries

 -  rate          : 250        conn/s
 -> CPU load      : 10%        system
 -> context       : 12         context/s
 -> irq(eth0/eth1): 2.2 / 2.2  kpps
 -> conntrack size: 2390       concurrent entries

 -  rate          : 500        conn/s
 -> CPU load      : 30-70%     system  (varying)
 -> context       : 45-90      context/s
 -> irq(eth0/eth1): 4 / 4      kpps
 -> conntrack size: 4770       concurrent entries

3) (2) + iptables -t nat -L  [=iptable_nat] (no rules)
 -  rate          : 100        conn/s
 -> CPU load      : 1%         system
 -> context       : 8          context/s
 -> irq(eth0/eth1): 0.9 / 0.9  kpps
 -> conntrack size: 970        concurrent entries

 -  rate          : 250        conn/s
 -> CPU load      : 40%        system
 -> context       : 20         context/s
 -> irq(eth0/eth1): 2.2 / 2.2  kpps
 -> conntrack size: 2390       concurrent entries

 -  rate  (max)   : 420        conn/s (all failed)
 -> CPU load      : 97%        system
 -> context       : 28         context/s
 -> irq(eth0/eth1): 3.1 / 4.1  kpps
 -> conntrack size: 4050       concurrent entries

 -  rate (killing): [500]->0   conn/s (all failed)
 -> CPU load      : 100%       system (no response)
 -> context       : ?          context/s
 -> irq(eth0/eth1): ?          kpps
 -> conntrack size: 10500???   concurrent entries

other results with active rules (i.e. REDIRECT) depend on the load
generated by the local process handling the traffic, and are thus not
relevant here (FYI: max conn/s < 200 with one process handling the
REDIRECTed traffic)

kr,
_______________________________________________________________________

-jmhe-

