I found a networking issue in the real-time patch
(patch-2.6.21.5-rt17).
The issue occurs only on a kernel to which the real-time patch has been applied.
The latest stable mainline kernel (2.6.21.5) does not have this issue.
Therefore, please don't redirect this report to netdev.
The details of the issue are below.
I ran my test program, which is a very simple IPv6 client-server program.
The two programs establish a TCP/IPv6 connection between two hosts and then
sleep, as shown in the following diagram.
The problem then appears with "cat /proc/net/tcp6".
 serverA          serverB
    |       SYN      |
    +--------------->+
    |     SYN/ACK    |
    +<---------------+
    |       ACK      |
    +--------------->+
    |                |
 sleep...         sleep...
    |                |
When I run "cat /proc/net/tcp6" on serverA while the connection between
serverA and serverB is established, the established connection is not shown.
If you need my test program, please let me know. I can provide it.
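The symptom can also be checked without reading the output by eye. The
following is only a minimal sketch, not part of my test program: it counts
entries whose "st" column is 01 (ESTABLISHED). The function names and the
TCP6_STATE_ESTABLISHED macro are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Value of the "st" column for an ESTABLISHED socket (name is hypothetical). */
#define TCP6_STATE_ESTABLISHED 0x01

/* Returns 1 if 'line' is a /proc/net/tcp6 entry in ESTABLISHED state,
 * 0 otherwise (including the header line and malformed input). */
static int tcp6_line_is_established(const char *line)
{
	char local[33], remote[33];	/* 32 hex digits plus the terminator */
	unsigned int lport, rport, state;

	assert(line != NULL);
	if (sscanf(line,
		   " %*d: %32[0123456789ABCDEFabcdef]:%x"
		   " %32[0123456789ABCDEFabcdef]:%x %x",
		   local, &lport, remote, &rport, &state) != 5)
		return 0;
	return state == TCP6_STATE_ESTABLISHED;
}

/* Counts ESTABLISHED entries in an open stream such as /proc/net/tcp6. */
static unsigned int tcp6_count_established(FILE *fp)
{
	char line[512];
	unsigned int n = 0;

	while (fgets(line, sizeof(line), fp) != NULL)
		n += tcp6_line_is_established(line);
	return n;
}
```

With the connection from the diagram in place, calling
tcp6_count_established() on fopen("/proc/net/tcp6", "r") should return at
least 1 on the mainline kernel; on the affected kernel the entry is missing.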
On the mainline kernel, however, the established connections are shown
correctly by "cat /proc/net/tcp6". The behavior differs because the
real-time patch implements a new socket lookup mechanism to address high
latency, so it works differently from the mainline kernel.
This new socket lookup mechanism uses a bitmap (ebitmask). When a TCP
connection is established, the kernel sets a flag bit in the bitmap, as
follows.
[ebitmask in struct inet_hashinfo]
Before connecting
0000000000000000000000000000000000000000000000000000000000000000
After connecting
0000001000000000000000000000000000000000000000000000000000000000
      ^
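The bit operation illustrated above can be sketched in user space as
follows. This is only a model under my own assumptions: the names
model_hash_setbit and model_hash_testbit, and the table size of 64, are made
up for the example and are not the rt patch's definitions. The NULL check
reflects that the quoted __inet_hash() passes a NULL bitmask on the listening
path, so the real helper presumably tolerates it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical table size for this sketch, one bit per hash bucket. */
#define EHASH_SIZE 64
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((EHASH_SIZE + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Mark bucket 'index' as occupied; a NULL bitmap is silently ignored. */
static void model_hash_setbit(unsigned long *bitmap, unsigned int index)
{
	if (bitmap == NULL)
		return;
	assert(index < EHASH_SIZE);
	bitmap[index / BITS_PER_WORD] |= 1UL << (index % BITS_PER_WORD);
}

/* Return 1 if bucket 'index' is marked as occupied, 0 otherwise. */
static int model_hash_testbit(const unsigned long *bitmap, unsigned int index)
{
	assert(index < EHASH_SIZE);
	return (int)((bitmap[index / BITS_PER_WORD] >>
		      (index % BITS_PER_WORD)) & 1UL);
}
```

A reader of the table can then skip every bucket whose bit is clear, which is
why a bucket that is filled without its bit being set becomes invisible.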
When "/proc/net/tcp" or "/proc/net/tcp6" is read, the kernel uses the bitmap
to find the currently active TCP connections.
However, the kernel cannot find active TCP/IPv6 connections in the
established state, because it does not set the flag bit when a TCP/IPv6
connection is established.
For TCP/IPv4, __inet_hash() sets the flag bit properly with
__inet_hash_setbit().
For TCP/IPv6, however, the call that sets the flag bit is missing from
__inet6_hash().
[include/net/inet_hashtables.h]
static inline void __inet_hash(struct inet_hashinfo *hashinfo,
			       struct sock *sk, const int listen_possible)
{
	struct hlist_head *list;
	rwlock_t *lock;
	unsigned long *bitmask = NULL;
	unsigned int index = 0;

	BUG_TRAP(sk_unhashed(sk));
	if (listen_possible && sk->sk_state == TCP_LISTEN) {
		list = &hashinfo->listening_hash[inet_sk_listen_hashfn(sk)];
		lock = &hashinfo->lhash_lock;
		inet_listen_wlock(hashinfo);
	} else {
		struct inet_ehash_bucket *head;
		sk->sk_hash = inet_sk_ehashfn(sk);
		index = inet_ehash_index(hashinfo, sk->sk_hash);
		head = inet_ehash_bucket(hashinfo, sk->sk_hash);
		list = &head->chain;
		lock = &head->lock;
		bitmask = hashinfo->ebitmask;
		write_lock(lock);
	}
	__sk_add_node(sk, list);
	__inet_hash_setbit(bitmask, index);
	sock_prot_inc_use(sk->sk_prot);
	write_unlock(lock);
	if (listen_possible && sk->sk_state == TCP_LISTEN)
		wake_up(&hashinfo->lhash_wait);
}
[net/ipv6/inet6_hashtables.c]
void __inet6_hash(struct inet_hashinfo *hashinfo,
		  struct sock *sk)
{
	struct hlist_head *list;
	rwlock_t *lock;

	printk("__inet6_hash hit\n");
	BUG_TRAP(sk_unhashed(sk));
	if (sk->sk_state == TCP_LISTEN) {
		list = &hashinfo->listening_hash[inet_sk_listen_hashfn(sk)];
		lock = &hashinfo->lhash_lock;
		inet_listen_wlock(hashinfo);
	} else {
		unsigned int hash;
		sk->sk_hash = hash = inet6_sk_ehashfn(sk);
		hash &= (hashinfo->ehash_size - 1);
		list = &hashinfo->ehash[hash].chain;
		lock = &hashinfo->ehash[hash].lock;
		write_lock(lock);
	}
	__sk_add_node(sk, list);
	sock_prot_inc_use(sk->sk_prot);
	write_unlock(lock);
}
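The effect of the difference between the two functions above can be
reproduced with a small user-space model. This is a sketch under my own
assumptions, not kernel code: struct model_sock, struct model_table,
model_hash() and model_count_visible() are hypothetical names, and the walk
only imitates the idea of a bitmap-guided scan of the established hash.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 64	/* hypothetical size; must be a power of two */

struct model_sock {
	unsigned int hash;
	struct model_sock *next;
};

struct model_table {
	struct model_sock *chain[NBUCKETS];
	unsigned long long bitmap;	/* one bit per bucket */
};

/* Link 'sk' into its bucket. With mark_bucket != 0 the bucket's bit is also
 * set (like the IPv4 path); with 0 it is left clear (like the unpatched
 * IPv6 path). */
static void model_hash(struct model_table *t, struct model_sock *sk,
		       int mark_bucket)
{
	unsigned int index;

	assert(t != NULL && sk != NULL);
	index = sk->hash & (NBUCKETS - 1);
	sk->next = t->chain[index];
	t->chain[index] = sk;
	if (mark_bucket)
		t->bitmap |= 1ULL << index;
}

/* Count sockets the way a bitmap-guided walk would: buckets whose bit is
 * clear are skipped without looking at their chain. */
static unsigned int model_count_visible(const struct model_table *t)
{
	unsigned int i, n = 0;

	for (i = 0; i < NBUCKETS; i++) {
		const struct model_sock *sk;

		if (!((t->bitmap >> i) & 1ULL))
			continue;
		for (sk = t->chain[i]; sk != NULL; sk = sk->next)
			n++;
	}
	return n;
}
```

A socket added with mark_bucket set to 0 is linked into its chain, so lookups
by hash still find it, but model_count_visible() never reports it. This
matches the symptom seen with "cat /proc/net/tcp6".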
So I suggest the following change, which sets the flag bit appropriately in
__inet6_hash().
Signed-off-by: Masayuki Nakagawa <[EMAIL PROTECTED]>
Index: linus-kernel.git/net/ipv6/inet6_hashtables.c
===================================================================
--- linus-kernel.git.orig/net/ipv6/inet6_hashtables.c
+++ linus-kernel.git/net/ipv6/inet6_hashtables.c
@@ -27,6 +27,8 @@ void __inet6_hash(struct inet_hashinfo *
 {
 	struct hlist_head *list;
 	rwlock_t *lock;
+	unsigned long *bitmask = NULL;
+	unsigned int index = 0;
 
 	BUG_TRAP(sk_unhashed(sk));
 
@@ -35,15 +37,16 @@ void __inet6_hash(struct inet_hashinfo *
 		lock = &hashinfo->lhash_lock;
 		inet_listen_wlock(hashinfo);
 	} else {
-		unsigned int hash;
-		sk->sk_hash = hash = inet6_sk_ehashfn(sk);
-		hash &= (hashinfo->ehash_size - 1);
-		list = &hashinfo->ehash[hash].chain;
-		lock = &hashinfo->ehash[hash].lock;
+		sk->sk_hash = inet6_sk_ehashfn(sk);
+		index = inet_ehash_index(hashinfo, sk->sk_hash);
+		list = &hashinfo->ehash[index].chain;
+		lock = &hashinfo->ehash[index].lock;
+		bitmask = hashinfo->ebitmask;
 		write_lock(lock);
 	}
 
 	__sk_add_node(sk, list);
+	__inet_hash_setbit(bitmask, index);
 	sock_prot_inc_use(sk->sk_prot);
 	write_unlock(lock);
 }