Julian Elischer wrote:
Luigi Rizzo wrote:
On Fri, Mar 30, 2007 at 01:40:46PM -0700, Julian Elischer wrote:
I have been looking at the IPFW code recently, especially with respect to locking. There are some things that could be done to improve IPFW's behaviour when processing packets, but some of these take a
toll (there is always a toll) on the 'updating' side of things.

Certainly ipfw was not designed with SMP in mind. If you can tell us what your plan is to make the list lock-free
(which one, the static or the dynamic ones?) maybe we can comment more.

E.g. one option could be the usual trick of adding refcounts to
the individual rules, and then using an array of pointers to them.
While processing you grab a refcount to the array, and release it once
done with the packet. If there is an addition or removal, you duplicate
the array (which may be expensive for the large 20k rules mentioned),
manipulate the copy and then atomically swap the pointers to the head.

This is pretty close. I know I've mentioned this to people several times over the last year or so. The trick is to try to do it in a way that the average packet
doesn't need to take any locks to get in, and the updater does more work.
If you are willing to acquire a lock on both starting and ending
the run through the firewall, it is easy.
(I already have code to do that; see http://www.freebsd.org/~julian/atomic_replace.c, untested but
probably close.)
Doing it without requiring that each packet get those locks, however, is a whole new level of problem.
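
The refcounted rule array with copy-and-swap discussed above can be sketched in user-space C as follows. This is only an illustration of the "easy" variant, where every packet takes a lock when entering and leaving the rule walk; it is not the code in atomic_replace.c and not the real ipfw data structures. The names struct rule, struct ruleset, ruleset_acquire, ruleset_release and ruleset_add are hypothetical, and a pthread mutex stands in for whatever kernel lock would be used.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for an ipfw rule; NOT the real struct ip_fw. */
struct rule {
    int refcnt;               /* one reference per array pointing at it */
    int number;
};

/* An immutable snapshot of the rule list. */
struct ruleset {
    int refcnt;               /* 1 for "current" + 1 per packet in flight */
    size_t n;
    struct rule *rules[];     /* array of pointers to shared rules */
};

static pthread_mutex_t rs_lock = PTHREAD_MUTEX_INITIALIZER;
static struct ruleset *current;   /* head pointer, swapped by updaters */

/* Drop one reference; free the array (and orphaned rules) on the last one.
 * Caller must hold rs_lock. */
static void ruleset_unref_locked(struct ruleset *rs)
{
    assert(rs->refcnt > 0);
    if (--rs->refcnt > 0)
        return;
    for (size_t i = 0; i < rs->n; i++)
        if (--rs->rules[i]->refcnt == 0)
            free(rs->rules[i]);
    free(rs);
}

/* Packet path, entry: pin the current array under the lock. */
static struct ruleset *ruleset_acquire(void)
{
    pthread_mutex_lock(&rs_lock);
    struct ruleset *rs = current;
    if (rs != NULL)
        rs->refcnt++;
    pthread_mutex_unlock(&rs_lock);
    return rs;
}

/* Packet path, exit: unpin the array once done with the packet. */
static void ruleset_release(struct ruleset *rs)
{
    pthread_mutex_lock(&rs_lock);
    ruleset_unref_locked(rs);
    pthread_mutex_unlock(&rs_lock);
}

/* Updater: duplicate the array, modify the copy, swap the head pointer. */
static int ruleset_add(int number)
{
    struct rule *r = malloc(sizeof *r);
    if (r == NULL)
        return -1;
    r->refcnt = 1;            /* owned by the new array */
    r->number = number;

    pthread_mutex_lock(&rs_lock);
    struct ruleset *old = current;
    size_t n = old ? old->n : 0;
    struct ruleset *copy = malloc(sizeof *copy + (n + 1) * sizeof copy->rules[0]);
    if (copy == NULL) {
        pthread_mutex_unlock(&rs_lock);
        free(r);
        return -1;
    }
    copy->refcnt = 1;         /* the reference held by "current" */
    copy->n = n + 1;
    for (size_t i = 0; i < n; i++) {
        copy->rules[i] = old->rules[i];   /* O(n) copy: the cost with 20k rules */
        copy->rules[i]->refcnt++;
    }
    copy->rules[n] = r;
    current = copy;           /* new packets see the new array from here on */
    if (old != NULL)
        ruleset_unref_locked(old);        /* freed once in-flight packets leave */
    pthread_mutex_unlock(&rs_lock);
    return 0;
}
```

A packet that called ruleset_acquire() before an update keeps walking the old array until it calls ruleset_release(). The mutex is what makes "load the pointer, then bump the refcount" safe here; removing it from the packet path is the hard part referred to above.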

The locking overhead per packet in ipfw is by no means its limiting
factor.  Actually it's a very small part, and pretty much any work on
it is wasted effort.  Time would be much better spent optimizing the
main rule loop of ipfw to speed things up.  I was profiling ipfw early
last year with an Agilent packet generator and hwpmc.  In the meantime
the packet forwarding path (w/o ipfw) has been improved, but relative
to each other the numbers are still correct.

Numbers pre-taskqueue improvements from early 2006:
 fastfwd                580357 pps
 fastfwd+pfil_pass      565477 pps  (no rules, just pass packet on)
 fastfwd+ipfw_allow     505952 pps  (one rule)
 fastfwd+ipfw_30rules   401768 pps  (30 IP address non-matching rules)
 fastfwd+pf_pass        476190 pps  (one rule)
 fastfwd+pf_30rules     342262 pps  (30 IP address non-matching rules)

The per-rule overhead per packet is big.  Enabling ipfw and the
per-packet pfil/ipfw indirect function calls cause a loss of only
about 15'000 pps (2.6% of the fastfwd rate).  On the other hand the first rule costs 12.9%
and each additional rule 0.6%.  All this is without any complex rules
like table lookups, state tracking, etc.
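
The percentages can be recomputed from the pps table above with a few lines of C. This is a rough sketch: the function names are made up for illustration, all losses are taken relative to the plain fastfwd rate, and the cycles-per-packet figure assumes that a single 2.6 GHz core does all of the forwarding work, which the original measurements do not state.

```c
#include <assert.h>
#include <stdio.h>

/* Throughput figures copied from the table above (packets per second). */
static const double PPS_FASTFWD      = 580357.0;
static const double PPS_PFIL_PASS    = 565477.0;
static const double PPS_IPFW_ALLOW   = 505952.0;
static const double PPS_IPFW_30RULES = 401768.0;

/* Throughput lost going from `base` to `measured`, in percent of `base`. */
static double loss_pct(double base, double measured)
{
    assert(base > 0.0);
    return 100.0 * (base - measured) / base;
}

/* Average cost of each of the 29 extra rules, in percent of fastfwd. */
static double per_rule_loss_pct(void)
{
    return 100.0 * (PPS_IPFW_ALLOW - PPS_IPFW_30RULES) / 29.0 / PPS_FASTFWD;
}

/* CPU cycles available per packet. ASSUMPTION: one core at `clock_hz`
 * handles every packet. */
static double cycles_per_packet(double clock_hz, double pps)
{
    assert(pps > 0.0);
    return clock_hz / pps;
}

static void print_overheads(void)
{
    printf("pfil hook only:   %.1f%%\n", loss_pct(PPS_FASTFWD, PPS_PFIL_PASS));
    printf("first ipfw rule:  %.1f%%\n", loss_pct(PPS_FASTFWD, PPS_IPFW_ALLOW));
    printf("each extra rule:  %.2f%%\n", per_rule_loss_pct());
    printf("cycles per extra rule: %.0f\n",
        (cycles_per_packet(2.6e9, PPS_IPFW_30RULES) -
         cycles_per_packet(2.6e9, PPS_IPFW_ALLOW)) / 29.0);
}
```

Under the single-core assumption, each additional non-matching rule costs on the order of a few dozen cycles per packet, which is why the rule loop rather than the locking looks like the place to optimize.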

                       idle        fastfwd     fastfwd+ipfw_allow  fastfwd+ipfw_30rules
cycles                 2596685731  2598214743  2597973265          2596702381
cpu-clk-unhalted       7824023     2582240847  2518187670          2483904362
instructions           2317535     1324655330  1492363346          2026009148
branches               316786      174329367   191263118           294700024
branch-mispredicts     19757       2235749     10003461            8848407
dc-access              1417532     829159482   998427224           1235192770
dc-refill-from-l2      2124        4767395     4346738             4548311
dc-refill-from-system  89          803102      819658              654661
dtlb-l2-hit            626         10435843    9304448             12352018
dtlb-miss              129         255493      130998              112644
ic-fetch               804423      471138619   583149432           870371492
ic-miss                2358        34831       2505198             1947943
itlb-l2-hit            0           74          12                  12
itlb-miss              42          92          82                  82
lock-cycles            77          803         352                 451
locked-instructions    4           19          2                   4
lock-dc-access         6           20          6                   7
lock-dc-miss           0           0           0                   0
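
To put the counters above in proportion, the sketch below computes the share of unhalted cycles attributed to lock-cycles and the growth in instruction-cache misses and branch mispredictions once ipfw is enabled. The struct and function names are hypothetical, only a few counters are copied from the table, and reading lock-cycles as "cycles spent in locked operations" is an assumption about the hwpmc event.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* A subset of the counters from the table above, one entry per test run. */
struct pmc_sample {
    const char *name;
    uint64_t unhalted;            /* cpu-clk-unhalted */
    uint64_t ic_miss;             /* ic-miss */
    uint64_t branch_mispredicts;  /* branch-mispredicts */
    uint64_t lock_cycles;         /* lock-cycles */
};

static const struct pmc_sample samples[] = {
    { "fastfwd",              UINT64_C(2582240847), 34831,   2235749,  803 },
    { "fastfwd+ipfw_allow",   UINT64_C(2518187670), 2505198, 10003461, 352 },
    { "fastfwd+ipfw_30rules", UINT64_C(2483904362), 1947943, 8848407,  451 },
};

/* `part` as parts per million of `total`. */
static double ppm(uint64_t part, uint64_t total)
{
    assert(total > 0);
    return 1e6 * (double)part / (double)total;
}

/* How many times larger `after` is than `before`. */
static double growth(uint64_t after, uint64_t before)
{
    assert(before > 0);
    return (double)after / (double)before;
}

static void print_summary(void)
{
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        printf("%-22s lock-cycles: %.2f ppm of unhalted cycles\n",
            samples[i].name, ppm(samples[i].lock_cycles, samples[i].unhalted));
    printf("ic-miss growth with one rule: %.0fx\n",
        growth(samples[1].ic_miss, samples[0].ic_miss));
    printf("branch-mispredict growth with one rule: %.1fx\n",
        growth(samples[1].branch_mispredicts, samples[0].branch_mispredicts));
}
```

On these figures the lock-related cycles stay below one part per million in every run, while instruction-cache misses and branch mispredictions grow by large factors as soon as a single rule is evaluated.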

Hardware is a dual Opteron 852 at 2.6GHz on a Tyan 2882 mainboard with
a dual Intel em network card plugged into a PCI64-133 slot.  Packets
are flowing from em0 -> em1.

--
Andre
