On Wed, Dec 06, 2006 at 04:51:51AM +0100, Max Laier wrote:
> On Wednesday 06 December 2006 01:17, Luigi Rizzo wrote:
...
> > First, this proposal, with 36 multiplies and one division, the
> > function seems rather expensive for e.g. a low end cpu (arm or
> > soekris) as you might find on network-appliance boxes.
> > Any chance to get performance numbers on that hardware ?
> 
> I tried the reference machines (see hacked up attachment):
> 78x ia64
> 40x amd64
> 60x p3
> 16x p4

i assume the first number is the slowdown between the current and
the proposed method ?
I have slightly modified/extended the program adding the hsieh hash
that i mentioned below, and made it easy to add more methods. the
code is at

        http://info.iet.unipi.it/~luigi/hc.c

as expected, especially the simplest algorithms depend a lot
on cache effects so if you change the number of packets or the number
of loops things vary a lot. In any case, on a soekris (the default
parameters are too high):

        # ./hc 1000 100
        starting algorithm hash_pkt5 1000 loops 100 packets
        took 87862 usec, 0.878620 per cycle
        starting algorithm hash_hsieh 1000 loops 100 packets
        took 1082883 usec, 10.828830 per cycle
        starting algorithm hash_pkt6 1000 loops 100 packets
        took 2697178 usec, 26.971780 per cycle
        # ./hc 100 10000
        starting algorithm hash_pkt5 100 loops 10000 packets
        took 2023610 usec, 2.023610 per cycle
        starting algorithm hash_hsieh 100 loops 10000 packets
        took 11619238 usec, 11.619238 per cycle
        starting algorithm hash_pkt6 100 loops 10000 packets
        took 27739595 usec, 27.739595 per cycle
(here we probably overflow some cache so the simple algorithm
suffers a lot by increasing the number of different packets)

on my new 3 GHz pentium D

        > ./hc 1000 100
        starting algorithm hash_pkt5 1000 loops 100 packets
        took 1258 usec, 0.012580 per cycle
        starting algorithm hash_hsieh 1000 loops 100 packets
        took 16152 usec, 0.161520 per cycle
        starting algorithm hash_pkt6 1000 loops 100 packets
        took 25485 usec, 0.254850 per cycle
        > ./hc 100 10000
        starting algorithm hash_pkt5 100 loops 10000 packets
        took 12870 usec, 0.012870 per cycle
        starting algorithm hash_hsieh 100 loops 10000 packets
        took 162510 usec, 0.162510 per cycle
        starting algorithm hash_pkt6 100 loops 10000 packets
        took 248003 usec, 0.248003 per cycle

Surely we need to experiment a bit more, but the cost
is significant especially on low end boxes.
Maybe we could restrict the hash to just a part of the address ?

        cheers
        luigi
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-ipfw
To unsubscribe, send any mail to "[EMAIL PROTECTED]"

Reply via email to