On Mon, 18 Mar 2002, Patrick Schaaf wrote: > Hello Martin, > > thanks for your numbers. > > > with 16384 hashbuckets and a maximum of 131072 tracked connections it took > > 7.5 seconds to perform 1 million lookups in the hashtable (using > > __ip_conntrack_find from userspace). > > Was this 1 million lookups to a random hash bucket each, with guaranteed > "no match"? Assuming this in the following...
yes that was a lookup of a connection not present in the conntrack. > That's 8 connections per bucket, resulting in 8 million "pointer chasings" > during table lookup, i.e. about 937ns per chasing. and compare of tuple... > > with the same number of hashbuckets (16384) but with a maximum of 262144 > > tracked connections (131072 * 2) it took just over 12 seconds to perform > > the same test (1 million lookups). > > Now you have an average chain length of 16, for 16 million chasings total, > and 750ns per chasing - probably a bit better (per chasing, not in reality) > because the hash array cache line load amortizes over the list length. > > > and then I doubled the hashsize to 32768 and left the maximum number of > > tracked connections at 262144 and performed the test again. > > 1 million lookups took 7.5 seconds again despite that I was tracking twice > > as many connections as before. > > You are back again at an average chain length of 8, so this is not surprising. I know, it was just to demonstrate. > For nothing but pointer chasing, I'd expect about 250ns (main memory RTT) > for a single chasing, so your numbers indicate a pretty large "testing > overhead". Maybe a different, non-userlevel testing method would be better. This is from userspace and via netlink so I'm sure the overhead in these tests are a lot higher than the lookup for actual packets in the kernel. the codepath is something like this: userspace: program -> libctnetlink -> kernel: netlink -> ctnetlink -> __ip_conntrack_find and then the way back with a response so I assume the overhead is higher then the actual lookups. In my program I have a memset() in the loop and ctnetlink allocated a new skb to send back as a result to userspace. I could probably write a small kernelmodule that could perform the lookups and measure the time for a large number of lookups. I expect the lookups to be quite fast if the chain length isn't too long as all we do is a hash of the tuple and then go do a linear search in the hashbucket. Conntrack uses the standard LIST_FIND macro that's used all over the kernel and that should be quite fast. /Martin Never argue with an idiot. They drag you down to their level, then beat you with experience.