----- Original Message -----
From: "Patrick Schaaf" <[EMAIL PROTECTED]>
To: "Harald Welte" <[EMAIL PROTECTED]>; "Patrick Schaaf" <[EMAIL PROTECTED]>; "Martin Josefsson" <[EMAIL PROTECTED]>; "Aviv Bergman" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Tuesday, 19 March, 2002 12:16
Subject: Re: [Q] connection tracking scaling
> > I'd rather like to have this information to be gathered at runtime within
> > the kernel, where one could read out the current hash occupation via /proc
> > or some ioctl.
>
> OK, that's what I wanted to hear :-)
>
> Actually, the interesting statistics for a hash are not that large, and all
> aggregate:
>
> - bucket occupation: number of used buckets, vs. number of all buckets
> - average chain length over all buckets
> - average chain length over the used buckets
> - counting of a classification of the chain lengths:
>   - number of 0-entry buckets
>   - number of 1-entry buckets
>   - number of 2-entry buckets
>   - number of 4-entry buckets
>   - number of 8-entry buckets
>   - number of 16-entry buckets
>   - number of more-than-16-entry buckets
>
> That's 10 values, and will at most double when I think more about it.
> I propose to gather these stats on the fly, and simply printk() them
> at a chosen interval:

I'm not a conntrack specialist, nor a kernel hacker, but I have some experience
with IP hash caches in access servers (BRAS) that may be useful(?):

some additional stats:

- HDA: cache hit depth average: the number of iterations through the bucket's
  collision list needed to reach the matching entry.
- MDA: cache miss depth average: the number of iterations walked without
  matching any cache entry (new connection).

HDA is meaningful if you have a bad cache distribution or a small CIS/CTS ratio
(Cache Index Size = number of hash buckets / Cache Total Size = total number of
conntrack tuples cacheable). It also provides good information on traffic type
and cache efficiency. For instance, assume you have real-time traffic (RTP) and
bursty traffic (HTTP/1.1 with keep-alive) at the same time, and that the tuples
for both types of traffic land under the same hash key. If your RT tuple sits
at the end of the collision list, or behind the bursty entries, you will need
frequent extra iterations to reach it... The workaround for that is "collision
promotion": keep a hit counter in each tuple and swap the most frequently
accessed tuple one position ahead (a rough sketch is attached further down).

some questions:

- have you an efficient 'freelist' implementation? What I've seen of
  kmem_cache_free and kmem_cache_alloc doesn't look like a simple pointer
  dereference... Am I wrong? (There is a small freelist sketch below showing
  what I mean.)
- wouldn't it be worth having a "cache promotion" mechanism?

regarding [hashsize=conntrack_max/2], I vote for it! An alternative solution
would be a dynamic hash resize each time the average number of collisions
exceeds a threshold (and no downward resize, except maybe asynchronously). But
in my experience, IP hash distribution is not at all predictable (unless you
know where in the net path your box will sit, and what traffic type (VoIP,
HTTP, eDonkey, ...) it will have to handle, and even then your predictions will
not stay valid for more than 6 months!). Therefore, the common way to handle an
unpredictable distribution is to define [max hash index size >= max number of
cache tuples], with a dynamic hash index resize.... (A sketch of such a
grow-only resize is also attached below.)

One last word: the hash function you're using is the best compromise between
unpredictable IPv4 traffic, cache symmetry, uniformity, and computation time.
I wouldn't change it too much, but there are two possibilities:

- if you keep the modulo method (%), use a prime number far from a power of 2
  for 'ip_conntrack_htable_size'.
- if modulo is too slow, use the bitmasking method (&) with hsize being a power
  of 2 and two extra bit shifts, e.g.
  ((key + (key >> 20) + (key >> 12)) & (hsize - 1)), but this method is not as
  efficient as the modulo method, and would have to be reconsidered for IPv6.
  (A sketch of both variants is attached below.)
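To make the "collision promotion" idea a bit more concrete, here is a rough
sketch in plain C. The names (tuple_entry, key, hits) are made up for
illustration and have nothing to do with the real ip_conntrack structures; the
only point is the depth counting and the one-position swap on a hit:

/* Sketch only: one hash bucket as a simplified singly-linked list. */
#include <stddef.h>

struct tuple_entry {
        struct tuple_entry *next;
        unsigned long key;      /* stands in for the real tuple compare */
        unsigned long hits;     /* per-entry hit counter for promotion  */
};

/*
 * Look up 'key' in one bucket, counting the iterations (the "depth").
 * When the matching entry has been hit more often than the entry in
 * front of it, swap it one position towards the head of the chain.
 */
static struct tuple_entry *
bucket_lookup(struct tuple_entry **head, unsigned long key, unsigned int *depth)
{
        struct tuple_entry **pprev = head;      /* link pointing at 'prev' */
        struct tuple_entry *prev = NULL;
        struct tuple_entry *e = *head;

        *depth = 0;
        while (e != NULL) {
                (*depth)++;
                if (e->key == key) {
                        e->hits++;
                        if (prev != NULL && e->hits > prev->hits) {
                                /* promote: move one step towards the head */
                                *pprev = e;
                                prev->next = e->next;
                                e->next = prev;
                        }
                        return e;
                }
                if (prev != NULL)
                        pprev = &prev->next;
                prev = e;
                e = e->next;
        }
        return NULL;    /* miss: *depth entries walked without a match */
}

Averaging *depth over successful lookups gives the HDA figure mentioned above,
and averaging it over failed lookups gives MDA.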
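For the freelist question, this is all I mean by "a simple pointer
dereference": a trivial LIFO freelist like the sketch below. This is
hypothetical user-space C, not what the slab allocator actually does
internally; that gap is exactly what I'm asking about:

/* Hypothetical LIFO freelist; objects must be at least sizeof(struct free_obj). */
#include <stdlib.h>

struct free_obj {
        struct free_obj *next;
};

static struct free_obj *freelist;

static void *obj_alloc(size_t size)
{
        struct free_obj *obj = freelist;

        if (obj != NULL) {
                freelist = obj->next;   /* pop: one pointer dereference */
                return obj;
        }
        return malloc(size);            /* freelist empty: fall back */
}

static void obj_free(void *p)
{
        struct free_obj *obj = p;

        obj->next = freelist;           /* push: two pointer stores */
        freelist = obj;
}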
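And here is the kind of grow-only resize I have in mind, again as a simplified
user-space sketch with made-up names (htable, hsize, nentries) and with all
locking ignored, which is of course the hard part inside the kernel:

/* Grow-only resize: double the bucket count when the average chain length
 * exceeds a threshold, and rehash every entry into the new table. */
#include <stdlib.h>

struct entry {
        struct entry *next;
        unsigned long key;
};

static struct entry **htable;           /* array of hsize bucket heads */
static unsigned int hsize;              /* current number of buckets   */
static unsigned int nentries;           /* current number of entries   */

#define AVG_CHAIN_THRESHOLD 4

static unsigned int hash_bucket(unsigned long key, unsigned int size)
{
        return (unsigned int)(key % size);
}

static int maybe_grow(void)
{
        struct entry **newtable;
        unsigned int newsize = hsize * 2;
        unsigned int i;

        if (nentries / hsize < AVG_CHAIN_THRESHOLD)
                return 0;               /* below threshold, nothing to do */

        newtable = calloc(newsize, sizeof(*newtable));
        if (newtable == NULL)
                return -1;              /* keep the old table on failure */

        for (i = 0; i < hsize; i++) {
                struct entry *e = htable[i];

                while (e != NULL) {
                        struct entry *next = e->next;
                        unsigned int b = hash_bucket(e->key, newsize);

                        e->next = newtable[b];
                        newtable[b] = e;
                        e = next;
                }
        }

        free(htable);
        htable = newtable;
        hsize = newsize;
        return 1;
}

Called after each insertion, something like this keeps the average chain length
bounded without ever shrinking the table.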
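And the two index computations side by side, as a sketch; the table sizes and
shift amounts are placeholders, not a recommendation for concrete values:

/* Variant 1: modulo with a prime table size (a prime far from a power of 2). */
static unsigned int hash_mod(unsigned long key, unsigned int hsize_prime)
{
        return (unsigned int)(key % hsize_prime);
}

/* Variant 2: fold the key with two shifts and mask with a power-of-2 size;
 * hsize_pow2 must be a power of 2 so that (hsize_pow2 - 1) is a valid mask. */
static unsigned int hash_mask(unsigned long key, unsigned int hsize_pow2)
{
        return (unsigned int)((key + (key >> 20) + (key >> 12)) & (hsize_pow2 - 1));
}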
hope this may help...

> echo 300 >/proc/net/ip_conntrack_showstat
>
> would generate one printk() every 300 seconds. Echoing 0 would disable
> the statistics gathering altogether.
>
> I think I can hack this up, today. Having the flu must be good for something...
>
> later
> Patrick