-------- Original Message --------
Subject: [RFC] Idea to speedup tcp lookups
Date: Tue, 02 Aug 2005 11:53:12 +0200
From: Eric Dumazet <[EMAIL PROTECTED]>
To: David S. Miller <[email protected]>
Cc: [EMAIL PROTECTED], [EMAIL PROTECTED]
References: <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> <[EMAIL PROTECTED]>
<[EMAIL PROTECTED]>
Hi David, Hi all
I would like to provide a patch to speed up tcp lookups, but I need your
comments first.
1) First some performance data :
--------------------------------
tcp_v4_rcv() wastes a lot of time in __tcp_v4_lookup_established().
The most critical code is :
sk_for_each(sk, node, &head->chain) {
if (TCP_IPV4_MATCH(sk, acookie, saddr, daddr, ports, dif))
goto hit; /* You sunk my battleship! */
}
The sk_for_each() does use prefetch() hints, but only the beginning of
"struct sock" is prefetched.
So TCP_IPV4_MATCH() has to bring cold cache lines into the CPU cache.
Each iteration has to touch at least 2 cache lines.
2) The goal
-----------
The idea I have is to change things so that TCP_IPV4_MATCH() may return FALSE
in 95% of cases only using the data already in the CPU cache,
using one cache line per iteration.
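To make the goal concrete, here is a minimal user-space sketch, not kernel
code. The names mock_sock, make_key and mock_lookup are hypothetical, and the
128-byte pad only stands in for the distance between the chain link and the
address fields. The point is that most non-matching entries are rejected by
reading the key that sits next to the chain pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a socket on a hash chain (not the kernel layout). */
struct mock_sock {
    struct mock_sock *next;  /* chain link, read on every iteration */
    uint32_t key;            /* packed (dport, num), next to the link */
    char pad[128];           /* assumed gap before the colder fields */
    uint32_t daddr;          /* foreign address */
    uint32_t rcv_saddr;      /* bound local address */
};

/* Pack two 16-bit ports into one 32-bit key; the order is an assumption. */
static uint32_t make_key(uint16_t dport, uint16_t num)
{
    return ((uint32_t)dport << 16) | num;
}

/* Walk the chain; compare the key first, the addresses only on a key hit. */
static struct mock_sock *mock_lookup(struct mock_sock *head, uint32_t key,
                                     uint32_t saddr, uint32_t daddr)
{
    struct mock_sock *sk;

    for (sk = head; sk != NULL; sk = sk->next) {
        assert(offsetof(struct mock_sock, key) < 64);
        if (sk->key != key)
            continue;        /* rejected without touching the cold fields */
        if (sk->daddr == saddr && sk->rcv_saddr == daddr)
            return sk;
    }
    return NULL;
}
```

A caller would build the key once per incoming packet with make_key() and
pass it to mock_lookup() together with the packet addresses.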
3) Description of what is planned
----------------------------------
The layout change is to move "__u16 dport ; __u16 num" from "struct
inet_sock" to the end of "struct sock_common",
where there is some padding (at least on 64-bit platforms).
File include/net/sock.h
struct sock_common {
unsigned short skc_family;
volatile unsigned char skc_state;
unsigned char skc_reuse;
int skc_bound_dev_if;
struct hlist_node skc_node;
struct hlist_node skc_bind_node;
atomic_t skc_refcnt;
+ union {
+ unsigned int key; /* hash key for fast lookups, or protocol private data */
+ unsigned short us[2];
+ } skc_u;
};
File include/linux/ip.h
struct inet_sock {
...
__u32 rcv_saddr; /* Bound local IPv4 addr */
- __u16 dport; /* Destination port */
- __u16 num; /* Local port */
...
+#define inetsk_dport sk.skc_u.us[0]
+#define inetsk_num sk.skc_u.us[1]
Then change every sk->dport to sk->inetsk_dport, and every sk->num to
sk->inetsk_num.
Doing so even saves 8 bytes in sizeof(inet_sock) on 64-bit platforms :)
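The padding argument can be checked with a small user-space sketch. The
structs below are hypothetical mocks of "struct sock_common" before and after
the change, with atomic_t assumed to be the size of an int and hlist_node
assumed to be two pointers. It only shows that the new union fits into the
tail padding on a 64-bit ABI; it does not measure the real sizeof(inet_sock).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed shape of hlist_node: two pointers. */
struct mock_hlist_node {
    struct mock_hlist_node *next, **pprev;
};

/* Mock of sock_common as quoted above, without the new union. */
struct mock_common_old {
    unsigned short family;
    volatile unsigned char state;
    unsigned char reuse;
    int bound_dev_if;
    struct mock_hlist_node node;
    struct mock_hlist_node bind_node;
    int refcnt;              /* stands in for atomic_t (assumption) */
};

/* Same mock with the proposed union appended. */
struct mock_common_new {
    unsigned short family;
    volatile unsigned char state;
    unsigned char reuse;
    int bound_dev_if;
    struct mock_hlist_node node;
    struct mock_hlist_node bind_node;
    int refcnt;
    union {
        unsigned int key;
        unsigned short us[2];
    } u;
};

/* Bytes added by the union; expected to be 0 where tail padding exists. */
static size_t common_growth(void)
{
    assert(sizeof(struct mock_common_new) >= sizeof(struct mock_common_old));
    return sizeof(struct mock_common_new) - sizeof(struct mock_common_old);
}
```

With 8-byte pointers and 4-byte ints the old mock has 4 bytes of tail
padding, so common_growth() should be 0 there; on a 32-bit ABI the struct
would be expected to grow instead.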
Then change the TCP_IPV4_MATCH macro to
File include/net/tcp.h
64-bit platforms:
#define TCP_IPV4_MATCH(__sk, __cookie, __saddr, __daddr, __ports, __dif)\
(((__sk)->skc_u.key == (__ports)) && \
((*((__u64 *)&(inet_sk(__sk)->daddr)))== (__cookie)) && \
(!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))
32-bit platforms:
#define TCP_IPV4_MATCH(__sk, __cookie, __saddr, __daddr, __ports, __dif)\
(((__sk)->skc_u.key == (__ports)) && \
(inet_sk(__sk)->daddr == (__saddr)) && \
(inet_sk(__sk)->rcv_saddr == (__daddr)) && \
(!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))
This way, the comparison with (__sk->skc_u.key) should reference data already
fetched into the CPU cache, or in the same cache line as
__sk->skc_node (the next element in the hash chain).
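For readers unfamiliar with the 64-bit variant, the address test reads the
two adjacent 32-bit addresses as a single 64-bit value and compares it with a
cookie precomputed from the packet. A hedged user-space sketch of that idea
follows; mock_addrs, load_pair, make_cookie and addrs_match are hypothetical
names, and memcpy() is used here instead of the pointer cast so the example
stays valid standard C.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Two adjacent 32-bit addresses, in the order assumed for inet_sock. */
struct mock_addrs {
    uint32_t daddr;      /* foreign address */
    uint32_t rcv_saddr;  /* bound local address */
};

/* Read both addresses as one 64-bit value, in memory order. */
static uint64_t load_pair(const struct mock_addrs *a)
{
    uint64_t v;

    assert(sizeof(struct mock_addrs) == sizeof(uint64_t));
    memcpy(&v, a, sizeof(v));
    return v;
}

/* Build the cookie from a packet: its source is the socket's daddr. */
static uint64_t make_cookie(uint32_t saddr, uint32_t daddr)
{
    struct mock_addrs tmp = { saddr, daddr };

    return load_pair(&tmp);
}

/* One 64-bit compare replaces two 32-bit compares. */
static int addrs_match(const struct mock_addrs *a, uint64_t cookie)
{
    return load_pair(a) == cookie;
}
```

Because the cookie is built with the same memory layout as the socket
fields, the comparison does not depend on byte order.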
Discussion :
Instead of using (dport,num) as a key, we could use the tcp_hashfn() value to
have a better fast path, but we would use more memory.
The patch would also be nicer, since it would not change "struct inet_sock".
Thank you for your comments and ideas.
Eric Dumazet