Re: [PATCH] Can not send icmp netunreach packet
On Tue, Feb 26, 2008 at 06:59:08PM +0900, Wei Yongjun wrote: Jarek Poplawski wrote: Maybe ip_error() does not handle the ESRCH error. In this place ESRCH eq to ENETUNREACH? It doesn't handle ESRCH for sure... Current solution seems to expect it is changed earlier to ENETUNREACH. It looks reasonable because otherwise all other places checking for this should be updated too. But, IMHO, it could be tested if such a change here helps in current problem, and then maybe found where it was skipped? On the other hand, probably checking with grep for all such ENETUNREACH cases, and adding ESRCH where needed could be much simpler and safer... Jarek P. static int ip_error(struct sk_buff *skb) { struct rtable *rt = (struct rtable*)skb-dst; unsigned long now; int code; switch (rt-u.dst.error) { case EINVAL: default: goto out; case EHOSTUNREACH: code = ICMP_HOST_UNREACH; break; case ENETUNREACH: code = ICMP_NET_UNREACH; break; case EACCES: code = ICMP_PKT_FILTERED; break; } ...snip } On 26-02-2008 07:34, Li Yewang wrote: Hi All There is a bug about icmp netunreach. If the kernel does not find a route for a packet,it must send a icmp netunreach packet to the source host,and discard the packet. But the kernel does not senda icmp netunreach packet because of the fib_lookup return value of -ESRCH when a route is not found. ...or because some function doesn't handle -ESRCH return from fib_lookup? It seems changing this to -ESRCH was needed in some cases. And you don't explain enough why it can't be handled later (like in ipv4/route.c: ip_route_input_slow)? Regards, Jarek P. Signed-off-by: Li Yewang [EMAIL PROTECTED] diff -Nurp net/core_back/fib_rules.c net/core/fib_rules.c --- net/core_back/fib_rules.c 2008-02-25 13:15:37.0 +0800 +++ net/core/fib_rules.c2008-02-25 13:16:01.0 +0800 @@ -188,7 +188,7 @@ jumped: } } - err = -ESRCH; + err = -ENETUNREACH; out: rcu_read_unlock(); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Tue, Feb 26, 2008 at 12:14:26PM +, James Chapman wrote: Jarek Poplawski wrote: Jarek Poplawski wrote, On 02/25/2008 02:39 PM: ... Hmm... Wait a minute! But on the other hand David has written about his cons here, and it looks reasonable: this place would be fixed, but some others can start reports like this. Maybe, it's better to analyze yet if it's really so hard to eliminate taking this lock on the xmit path? James, I wonder if you could try to test this patch below? ip_queue_xmit() seems to do proper things with __sk_dst_check(), and if some other functions don't behave similarly lockdep should tell. I think, you could test it with your locks to _bh patch (without pppol2tp_xmit() part), and maybe with my sock.c lockdep patch (it should help lockdep to see these locks a bit more distinctly). I found the same thing and was running a variant of your patch last night; rather than set skb-dst to NULL though, I use __sk_dst_get() and let ip_queue_xmit() do the route lookup if it returns NULL. But this has the same symptoms as the code I tried a few days ago - no lockdep errors but a system lockup after up to several hours. Nothing is logged in the syslog. I guess you are going to try this together with this sk_dst_lock with bh patch too. If it's possible I'd suggest to try this skb-dst = NULL as well (__sk_dst_get instead of __sk_dst_check seems to be too racy). Luckily, I'm in the lab where my two borrowed servers are today so I have access to their consoles. Hopefully I'll be able to find out why there are hanging. Btw, they don't hang if I disable irqs around the ppp_input() call. ...and disabling bh instead isn't enough, BTW? Will update you later. Thanks, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Tue, Feb 26, 2008 at 01:03:34PM +, Jarek Poplawski wrote: On Tue, Feb 26, 2008 at 12:14:26PM +, James Chapman wrote: ... there are hanging. Btw, they don't hang if I disable irqs around the ppp_input() call. ...and disabling bh instead isn't enough, BTW? I guess not: they are mostly disabled by ppp_input() itself... So, it looks like a network card could mess here? Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
James Chapman wrote, On 02/26/2008 01:14 PM: ... Luckily, I'm in the lab where my two borrowed servers are today so I have access to their consoles. Hopefully I'll be able to find out why there are hanging. Btw, they don't hang if I disable irqs around the ppp_input() call. Maybe you've found the same, or there is some other reason yet, but IMHO this locking break around ppp_input() is wrong. Probably there is needed more advanced solution, but this should fix the problem if it really exists (isn't there possible a race e.g. between receive from socket and from network card?). Jarek P. --- drivers/net/pppol2tp.c | 16 1 files changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/net/pppol2tp.c b/drivers/net/pppol2tp.c index e0b072d..7c6fcb9 100644 --- a/drivers/net/pppol2tp.c +++ b/drivers/net/pppol2tp.c @@ -363,18 +363,17 @@ out: spin_unlock(session-reorder_q.lock); } -/* Dequeue a single skb. +/* Requeue a single skb. */ -static void pppol2tp_recv_dequeue_skb(struct pppol2tp_session *session, struct sk_buff *skb) +static void pppol2tp_recv_requeue_skb(struct pppol2tp_session *session, struct sk_buff *skb) { struct pppol2tp_tunnel *tunnel = session-tunnel; int length = PPPOL2TP_SKB_CB(skb)-length; struct sock *session_sock = NULL; - /* We're about to requeue the skb, so unlink it and return resources + /* We're about to requeue the skb, so return resources * to its current owner (a socket receive buffer). */ - skb_unlink(skb, session-reorder_q); skb_orphan(skb); tunnel-stats.rx_packets++; @@ -436,14 +435,14 @@ static void pppol2tp_recv_dequeue_skb(struct pppol2tp_session *session, struct s static void pppol2tp_recv_dequeue(struct pppol2tp_session *session) { struct sk_buff *skb; - struct sk_buff *tmp; /* If the pkt at the head of the queue has the nr that we * expect to send up next, dequeue it and any other * in-sequence packets behind it. */ +again: spin_lock(session-reorder_q.lock); - skb_queue_walk_safe(session-reorder_q, skb, tmp) { + skb_queue_walk(session-reorder_q, skb) { if (time_after(jiffies, PPPOL2TP_SKB_CB(skb)-expires)) { session-stats.rx_seq_discards++; session-stats.rx_errors++; @@ -469,9 +468,10 @@ static void pppol2tp_recv_dequeue(struct pppol2tp_session *session) goto out; } } + __skb_unlink(skb, session-reorder_q); spin_unlock(session-reorder_q.lock); - pppol2tp_recv_dequeue_skb(session, skb); - spin_lock(session-reorder_q.lock); + pppol2tp_recv_requeue_skb(session, skb); + goto again; } out: -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: circular locking, mirred, 2.6.24.2
On 24-02-2008 23:20, Denys Fedoryshchenko wrote: 2.6.24.2 with applied patches for printk,softlockup, and patch for htb (as i understand, they are in 2.6.25 git and it is fixes). I will send also to private mails QoS rules i am using. [ 118.840072] === [ 118.840158] [ INFO: possible circular locking dependency detected ] [ 118.840203] 2.6.24.2-build-0022 #7 [ 118.840243] --- [ 118.840288] swapper/0 is trying to acquire lock: [ 118.840329] (dev-queue_lock){-+..}, at: [c0289c11] dev_queue_xmit +0x177/0x302 [ 118.840490] [ 118.840490] but task is already holding lock: [ 118.840567] (p-tcfc_lock){-+..}, at: [f89f0066] tcf_mirred+0x20/0x180 [act_mirred] [ 118.840727] [ 118.840727] which lock already depends on the new lock. [ 118.840728] [ 118.840842] [ 118.840842] the existing dependency chain (in reverse order) is: [ 118.840921] [ 118.840921] - #2 (p-tcfc_lock){-+..}: [ 118.841075][c0143624] __lock_acquire+0xa30/0xc19 [ 118.841324][c0143887] lock_acquire+0x7a/0x94 [ 118.841572][c02d93f5] _spin_lock+0x2e/0x58 [ 118.841820][f89f0066] tcf_mirred+0x20/0x180 [act_mirred] [ 118.842068][c0297ec4] tcf_action_exec+0x44/0x77 [ 118.842344][f89ba5d2] u32_classify+0x119/0x24a [cls_u32] [ 118.842595][c0295ce2] tc_classify_compat+0x2f/0x5e [ 118.842845][c0295ec3] tc_classify+0x1a/0x80 [ 118.843092][f899c118] ingress_enqueue+0x1a/0x53 [sch_ingress] [ 118.843343][c0287139] netif_receive_skb+0x296/0x44c [ 118.843592][f88d1a4e] e100_poll+0x14b/0x26a [e100] [ 118.843843][c02894bc] net_rx_action+0xbf/0x201 [ 118.844091][c012ac15] __do_softirq+0x6f/0xe9 [ 118.844343][c01078af] do_softirq+0x61/0xc8 [ 118.844591][] 0x [ 118.844840] [ 118.844840] - #1 (dev-ingress_lock){-+..}: [ 118.844993][c0143624] __lock_acquire+0xa30/0xc19 [ 118.845242][c0143887] lock_acquire+0x7a/0x94 [ 118.845489][c02d93f5] _spin_lock+0x2e/0x58 [ 118.845737][c0295387] qdisc_lock_tree+0x1e/0x21 [ 118.845984][c02953b6] dev_init_scheduler+0xb/0x53 [ 118.846235][c0288483] register_netdevice+0x2a3/0x2fd [ 118.846483][c028850f] register_netdev+0x32/0x3f [ 118.846730][c03ea8ee] loopback_net_init+0x39/0x6c [ 118.846980][c02858ef] register_pernet_operations+0x13/0x15 [ 118.847230][c0285958] register_pernet_device+0x1f/0x4c [ 118.847478][c03ea8b3] loopback_init+0xd/0xf [ 118.847725][c03d64df] kernel_init+0x155/0x2c6 This looks strange: are you sure your tc scripts aren't started too early? (Or maybe there are some problems during booting?) Regards, Jarek P. [ 118.847973][c0105bab] kernel_thread_helper+0x7/0x10 [ 118.848225][] 0x [ 118.848472] [ 118.848472] - #0 (dev-queue_lock){-+..}: [ 118.848626][c0143514] __lock_acquire+0x920/0xc19 [ 118.848874][c0143887] lock_acquire+0x7a/0x94 [ 118.849122][c02d93f5] _spin_lock+0x2e/0x58 [ 118.849370][c0289c11] dev_queue_xmit+0x177/0x302 [ 118.849617][f89f01a5] tcf_mirred+0x15f/0x180 [act_mirred] [ 118.849866][c0297ec4] tcf_action_exec+0x44/0x77 [ 118.850114][f89ba5d2] u32_classify+0x119/0x24a [cls_u32] [ 118.850366][c0295ce2] tc_classify_compat+0x2f/0x5e [ 118.850614][c0295ec3] tc_classify+0x1a/0x80 [ 118.850861][f899c118] ingress_enqueue+0x1a/0x53 [sch_ingress] [ 118.85][c0287139] netif_receive_skb+0x296/0x44c [ 118.851360][f88d1a4e] e100_poll+0x14b/0x26a [e100] [ 118.851612][c02894bc] net_rx_action+0xbf/0x201 [ 118.851859][c012ac15] __do_softirq+0x6f/0xe9 [ 118.852106][c01078af] do_softirq+0x61/0xc8 [ 118.852355][] 0x ... -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: circular locking, mirred, 2.6.24.2
On Mon, Feb 25, 2008 at 12:48:38PM +0200, Denys Fedoryshchenko wrote: What does it mean early? I have custom boot scripts, it is also custom system based on busybox. There is a chance that i forgot to bring ifb0 up, but thats it. I think such warning must not appear on any actions in userspace. It's not about ifb0: this report shows loopback_init after some action on eth, so eth was probably up before lo. And of course you are right: this warning shouldn't be there. But, since this report looks very strange, I wonder if there could be something else that mislead lockdep. Could you try to reproduce this with 2.6.24.2 without these additional patches? Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Mon, Feb 25, 2008 at 12:19:50PM +, James Chapman wrote: Jarek Poplawski wrote: Jarek Poplawski wrote, On 02/21/2008 01:08 PM: ... Another, probably simpler way would be to move almost all pppol2tp_xmit ... Actually, the simplest off all seems to be now this old idea to maybe make sk_dst_lock globally softirq immune. At least I think it's worth of testing, to check for these other possible lockdep warnings. It should only need to change all write_ and read_lock(sk-sk_dst_lock) in very few places: include/net/sock.h, include/net/ip6_route.h, and net/ipv6/ipv6_sockglue.c. This could be tested together with you full _bh locking patch (maybe except these other changes in pppol2tp_xmit). I did this and all lockdep errors have now gone. Tests ran all weekend. See attached patch. Is this an acceptable solution? If so, I'll prepare and send official patches. IMHO this should be acceptable because I can't see any reason for changing properly working code if there is so simple and not costly solution. But maybe David or somebody else finds some cons? Since this patch isn't very big I think you could try to send this officially and we will simply see... Regards, Jarek P. -- James Chapman Katalix Systems Ltd http://www.katalix.com Catalysts for your Embedded Linux software development Index: linux-2.6.24.2/include/net/ip6_route.h === --- linux-2.6.24.2.orig/include/net/ip6_route.h +++ linux-2.6.24.2/include/net/ip6_route.h @@ -150,9 +150,9 @@ static inline void __ip6_dst_store(struc static inline void ip6_dst_store(struct sock *sk, struct dst_entry *dst, struct in6_addr *daddr, struct in6_addr *saddr) { - write_lock(sk-sk_dst_lock); + write_lock_bh(sk-sk_dst_lock); __ip6_dst_store(sk, dst, daddr, saddr); - write_unlock(sk-sk_dst_lock); + write_unlock_bh(sk-sk_dst_lock); } static inline int ipv6_unicast_destination(struct sk_buff *skb) Index: linux-2.6.24.2/include/net/sock.h === --- linux-2.6.24.2.orig/include/net/sock.h +++ linux-2.6.24.2/include/net/sock.h @@ -1058,11 +1058,11 @@ sk_dst_get(struct sock *sk) { struct dst_entry *dst; - read_lock(sk-sk_dst_lock); + read_lock_bh(sk-sk_dst_lock); dst = sk-sk_dst_cache; if (dst) dst_hold(dst); - read_unlock(sk-sk_dst_lock); + read_unlock_bh(sk-sk_dst_lock); return dst; } @@ -1079,9 +1079,9 @@ __sk_dst_set(struct sock *sk, struct dst static inline void sk_dst_set(struct sock *sk, struct dst_entry *dst) { - write_lock(sk-sk_dst_lock); + write_lock_bh(sk-sk_dst_lock); __sk_dst_set(sk, dst); - write_unlock(sk-sk_dst_lock); + write_unlock_bh(sk-sk_dst_lock); } static inline void @@ -1097,9 +1097,9 @@ __sk_dst_reset(struct sock *sk) static inline void sk_dst_reset(struct sock *sk) { - write_lock(sk-sk_dst_lock); + write_lock_bh(sk-sk_dst_lock); __sk_dst_reset(sk); - write_unlock(sk-sk_dst_lock); + write_unlock_bh(sk-sk_dst_lock); } extern struct dst_entry *__sk_dst_check(struct sock *sk, u32 cookie); Index: linux-2.6.24.2/net/ipv6/ipv6_sockglue.c === --- linux-2.6.24.2.orig/net/ipv6/ipv6_sockglue.c +++ linux-2.6.24.2/net/ipv6/ipv6_sockglue.c @@ -440,9 +440,9 @@ static int do_ipv6_setsockopt(struct soc opt = xchg(np-opt, opt); sk_dst_reset(sk); } else { - write_lock(sk-sk_dst_lock); + write_lock_bh(sk-sk_dst_lock); opt = xchg(np-opt, opt); - write_unlock(sk-sk_dst_lock); + write_unlock_bh(sk-sk_dst_lock); sk_dst_reset(sk); } sticky_done: @@ -504,9 +504,9 @@ update: opt = xchg(np-opt, opt); sk_dst_reset(sk); } else { - write_lock(sk-sk_dst_lock); + write_lock_bh(sk-sk_dst_lock); opt = xchg(np-opt, opt); - write_unlock(sk-sk_dst_lock); + write_unlock_bh(sk-sk_dst_lock); sk_dst_reset(sk); } -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Mon, Feb 25, 2008 at 01:05:08PM +, Jarek Poplawski wrote: ... On Mon, Feb 25, 2008 at 12:19:50PM +, James Chapman wrote: Is this an acceptable solution? If so, I'll prepare and send official patches. IMHO this should be acceptable because I can't see any reason for changing properly working code if there is so simple and not costly solution. But maybe David or somebody else finds some cons? [...] Hmm... Wait a minute! But on the other hand David has written about his cons here, and it looks reasonable: this place would be fixed, but some others can start reports like this. Maybe, it's better to analyze yet if it's really so hard to eliminate taking this lock on the xmit path? Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Mon, Feb 25, 2008 at 01:39:48PM +, Jarek Poplawski wrote: ... Maybe, it's better to analyze yet if it's really so hard to eliminate taking this lock on the xmit path? BTW, I'm not sure if it helps, but this matters only for the sockets which could be used (and locked) outside of pppol2tp code (so before pppol2tp code is called). Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
Jarek Poplawski wrote, On 02/25/2008 02:39 PM: ... Hmm... Wait a minute! But on the other hand David has written about his cons here, and it looks reasonable: this place would be fixed, but some others can start reports like this. Maybe, it's better to analyze yet if it's really so hard to eliminate taking this lock on the xmit path? James, I wonder if you could try to test this patch below? ip_queue_xmit() seems to do proper things with __sk_dst_check(), and if some other functions don't behave similarly lockdep should tell. I think, you could test it with your locks to _bh patch (without pppol2tp_xmit() part), and maybe with my sock.c lockdep patch (it should help lockdep to see these locks a bit more distinctly). Jarek P. PS: Since ppp_generic isn't endangered for now I removed Paul from CC. --- diff --git a/drivers/net/pppol2tp.c b/drivers/net/pppol2tp.c index e0b072d..b94659a 100644 --- a/drivers/net/pppol2tp.c +++ b/drivers/net/pppol2tp.c @@ -1058,7 +1058,7 @@ static int pppol2tp_xmit(struct ppp_channel *chan, struct sk_buff *skb) /* Get routing info from the tunnel socket */ dst_release(skb-dst); - skb-dst = sk_dst_get(sk_tun); + skb-dst = NULL; skb_orphan(skb); skb-sk = sk_tun; -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Can not send icmp netunreach packet
On 26-02-2008 07:34, Li Yewang wrote: Hi All There is a bug about icmp netunreach. If the kernel does not find a route for a packet, it must send a icmp netunreach packet to the source host, and discard the packet. But the kernel does not send a icmp netunreach packet because of the fib_lookup return value of -ESRCH when a route is not found. ...or because some function doesn't handle -ESRCH return from fib_lookup? It seems changing this to -ESRCH was needed in some cases. And you don't explain enough why it can't be handled later (like in ipv4/route.c: ip_route_input_slow)? Regards, Jarek P. Signed-off-by: Li Yewang [EMAIL PROTECTED] diff -Nurp net/core_back/fib_rules.c net/core/fib_rules.c --- net/core_back/fib_rules.c 2008-02-25 13:15:37.0 +0800 +++ net/core/fib_rules.c2008-02-25 13:16:01.0 +0800 @@ -188,7 +188,7 @@ jumped: } } - err = -ESRCH; + err = -ENETUNREACH; out: rcu_read_unlock(); -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG][AX25] spinlock lockup
On Sun, Feb 24, 2008 at 04:10:29AM +0100, Jann Traschewski wrote: Hello, Hi! I got a spinlock lockup using the latest Kernel 2.6.24.2 with recent patches from Jarek Poplawski applied. ... ppp_deflate nf_nat zlib_deflateBUG: unable to handle kernel NULL pointer dereference zlib_inflate nf_conntrack_ipv4 bsd_comp slhc ppp_async xt_state ... EIP is at skb_append+0x1b/0x30 ... 0010 BUG: spinlock lockup on CPU#1, bcm/12213, f40846b8 Looks like 2 in 1: NULL pointer dereference and (later?) lockup. There is only one function in AX25 calling skb_append(), and it really looks suspicious: appends skb after previously enqueued one, but in the meantime this previous skb could be removed from the queue. Here is a patch for testing: it fixes this simple way, so this is not fully compatible with the current method, but let's check if this could be a problem? Regards, Jarek P. (testing patch #1) --- net/ax25/ax25_subr.c | 11 +++ 1 files changed, 3 insertions(+), 8 deletions(-) diff --git a/net/ax25/ax25_subr.c b/net/ax25/ax25_subr.c index d8f2157..034aa10 100644 --- a/net/ax25/ax25_subr.c +++ b/net/ax25/ax25_subr.c @@ -64,20 +64,15 @@ void ax25_frames_acked(ax25_cb *ax25, unsigned short nr) void ax25_requeue_frames(ax25_cb *ax25) { - struct sk_buff *skb, *skb_prev = NULL; + struct sk_buff *skb; /* * Requeue all the un-ack-ed frames on the output queue to be picked * up by ax25_kick called from the timer. This arrangement handles the * possibility of an empty output queue. */ - while ((skb = skb_dequeue(ax25-ack_queue)) != NULL) { - if (skb_prev == NULL) - skb_queue_head(ax25-write_queue, skb); - else - skb_append(skb_prev, skb, ax25-write_queue); - skb_prev = skb; - } + while ((skb = skb_dequeue_tail(ax25-ack_queue)) != NULL) + skb_queue_head(ax25-write_queue, skb); } /* -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][NET] sock.c: sk_dst_lock lockdep keys and names per af_family
On Thu, Feb 21, 2008 at 09:53:56AM +, James Chapman wrote: ... The lockups still happen, but I think they are now due to a different problem, as you say. ... I hope, this patch should help to remove some possibly false lockdep warnings during this current testing. Regards, Jarek P. --- Subject: [NET] sock.c: sk_dst_lock lockdep keys and names per af_family Initialize sk_dst_lock lockdep keys and names per af_family. Additionally some reorder is done in lockdep related code to keep it in one place and to use static key tables only when needed. Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] [not tested] --- net/core/sock.c | 79 +++ 1 files changed, 56 insertions(+), 23 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index 433715f..18c33d2 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -131,14 +131,17 @@ #include net/tcp.h #endif +#ifdef CONFIG_DEBUG_LOCK_ALLOC + /* - * Each address family might have different locking rules, so we have - * one slock key per address family: + * Each address family might have different locking rules, so we have one: + * sk_lock, slock, sk_callback_lock and sk_dst_lock key per address family: */ static struct lock_class_key af_family_keys[AF_MAX]; static struct lock_class_key af_family_slock_keys[AF_MAX]; +static struct lock_class_key af_family_callback_keys[AF_MAX]; +static struct lock_class_key af_family_dst_keys[AF_MAX]; -#ifdef CONFIG_DEBUG_LOCK_ALLOC /* * Make lock validator output more readable. (we pre-construct these * strings build-time, so that runtime initialization of socket @@ -186,13 +189,52 @@ static const char *af_family_clock_key_strings[AF_MAX+1] = { clock-AF_TIPC , clock-AF_BLUETOOTH, clock-AF_IUCV , clock-AF_RXRPC , clock-AF_MAX }; -#endif +static const char *af_family_dst_key_strings[AF_MAX+1] = { + sk_dst-AF_UNSPEC, sk_dst-AF_UNIX , sk_dst-AF_INET , + sk_dst-AF_AX25 , sk_dst-AF_IPX , sk_dst-AF_APPLETALK, + sk_dst-AF_NETROM, sk_dst-AF_BRIDGE , sk_dst-AF_ATMPVC , + sk_dst-AF_X25 , sk_dst-AF_INET6, sk_dst-AF_ROSE , + sk_dst-AF_DECnet, sk_dst-AF_NETBEUI , sk_dst-AF_SECURITY , + sk_dst-AF_KEY , sk_dst-AF_NETLINK , sk_dst-AF_PACKET , + sk_dst-AF_ASH , sk_dst-AF_ECONET , sk_dst-AF_ATMSVC , + sk_dst-21 , sk_dst-AF_SNA , sk_dst-AF_IRDA , + sk_dst-AF_PPPOX , sk_dst-AF_WANPIPE , sk_dst-AF_LLC , + sk_dst-27 , sk_dst-28 , sk_dst-29 , + sk_dst-AF_TIPC , sk_dst-AF_BLUETOOTH, sk_dst-AF_IUCV , + sk_dst-AF_RXRPC , sk_dst-AF_MAX +}; -/* - * sk_callback_lock locking rules are per-address-family, - * so split the lock classes by using a per-AF key: - */ -static struct lock_class_key af_callback_keys[AF_MAX]; + +static inline void sock_lock_init(struct sock *sk) +{ + sock_lock_init_class_and_name(sk, + af_family_slock_key_strings[sk-sk_family], + af_family_slock_keys + sk-sk_family, + af_family_key_strings[sk-sk_family], + af_family_keys + sk-sk_family); +} + +#define lockdep_set_sk_callback_lock(lock, family) \ + lockdep_set_class_and_name(lock, \ + af_family_callback_keys + (family), \ + af_family_clock_key_strings[(family)]) + +#define lockdep_set_sk_dst_lock(lock, family) \ + lockdep_set_class_and_name(lock, \ + af_family_dst_keys + (family), \ + af_family_dst_key_strings[(family)]) + +#else + +static inline void sock_lock_init(struct sock *sk) +{ + sock_lock_init_class_and_name(sk, 0, 0, 0, 0); +} + +#define lockdep_set_sk_callback_lock(lock, family) do {} while (0) +#define lockdep_set_sk_dst_lock(lock, family) do {} while (0) + +#endif /* CONFIG_DEBUG_LOCK_ALLOC */ /* Take into consideration the size of the struct sk_buff overhead in the * determination of these values, since that is non-constant across @@ -866,14 +908,6 @@ lenout: * * (We also register the sk_lock with the lock validator.) */ -static inline void sock_lock_init(struct sock *sk) -{ - sock_lock_init_class_and_name(sk, - af_family_slock_key_strings[sk-sk_family], - af_family_slock_keys + sk-sk_family, - af_family_key_strings[sk-sk_family], - af_family_keys + sk-sk_family); -} static void sock_copy(struct sock *nsk, const struct sock *osk) { @@ -1014,10 +1048,10 @@ struct sock *sk_clone(const struct sock *sk, const gfp_t priority) #endif rwlock_init(newsk-sk_dst_lock); + lockdep_set_sk_dst_lock(newsk-sk_dst_lock, newsk-sk_family); rwlock_init(newsk-sk_callback_lock); - lockdep_set_class_and_name(newsk
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Wed, Feb 20, 2008 at 10:37:57PM +, James Chapman wrote: Jarek Poplawski wrote: (testing patch #1) But I hope you tested with the fixed (take 2) version of this patch... Yes I did. :) But I just got another lockdep error (attached). Since it's quite experimental (testing) this patch could be wrong as it is, but I hope it should show the proper way to solve this problem. Probably you did some of these, but here are a few of my suggestions for testing this: 1) try my patch with your full bh locking changing patch; 2) add while loops to these trylocks on failure, with e.g. __delay(1); this should work like full locks again, but there should be no (this kind of) lockdep reports; Hmm, isn't this just bypassing the lockdep checks? Yes! But it's only for debugging: to find if this change in locking is to be blamed for these new lockups. It should effectively work just like without this patch, but without this lockdep warning. So, if after such change lockups still happen, then it would seem you didn't test this enough before. Otherwise the new patch is to blame and needs reworking. 3) I send here another testing patch with this second way to do this: on the write side, but it's even more experimental and only a proof of concept (should be applied on vanilla ppp_generic). I'll look over it. I think I need to take a step back and look at what's happening in more detail though. This is something completely new and changes all the picture: the xmit path wasn't expected (at least by me) to be called in softirq context at all, and there were no traces of this on previous reports. But, since lockdep always stops after the first warning, there could be even more surprises like this in the future. I'll check this report. Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Thu, Feb 21, 2008 at 09:53:56AM +, James Chapman wrote: Jarek Poplawski wrote: ... The _bh locking fixes in pppol2tp combined with your ppp_generic change solved that problem. So I then added data traffic into the mix (since this will happen in a real network) and found that lockups still happen. But the lockdep trace in this case is different, as you noted. I'm not sure what do you mean by solved that problem: lack of lockups or lack of this kind of lockdep reports. This lockdep report shows a real danger in this case, probably very little probable, unless a lot of tries. So if you think just this kind of lockup happend there, this is would be nice. But there could be something less nice too: we are fighting with lockdep in one place but the lockup happens somewhere else for some totally different reason... Does PPPoE stress the PPP setup code as much as this scenario? I guess in theory it could if lots of PPPoE clients connected at the same time, but there is no aggregate tunnel like there is with L2TP to cause all sessions to connect simultaneously. Perhaps PPTP also suffers from these issues? Perhaps not because it tends to be used only in VPN setups where there is only 1 session per tunnel. I don't know this code enough, but it seems it should be easier to maintain or debug if there are similar solutions where possible. 3) I send here another testing patch with this second way to do this: on the write side, but it's even more experimental and only a proof of concept (should be applied on vanilla ppp_generic). I'll look over it. I think I need to take a step back and look at what's happening in more detail though. This is something completely new and changes all the picture: the xmit path wasn't expected (at least by me) to be called in softirq context at all, and there were no traces of this on previous reports. But, since lockdep always stops after the first warning, there could be even more surprises like this in the future. I'll check this report. Doesn't the TX softirq do transmits if they've been queued up? I've probably too much looked at these reports, and should've expected this could happen. Probably queueing could be separated, but since there could be no queue at all and it's done like this, then this current proof of concept seems to be dead end, and we have to go back to fixing sk_dst_lock handling and my patches could be dumped... So if I don't miss something again (and I need more time for this new report) you should try to fix the problem not reported originally by lockdep, but forseen by David(!): we need to avoid any path like: ppp_generic - pppol2tp - something, which could take sk_dst_lock while holding any ppp_generic writing lock: they all are softirq safe (i.e. endangered). David gave some example, but I'm not sure you did your patch like this (sk_dst_set()). Probably ip_queue_xmit() can't work with this too. Another, probably simpler way would be to move almost all pppol2tp_xmit code to a workqueue: this should let to break most of dependencies with ppp_generic locks, but I don't know how much it would affect other things (e.g. performance). So you should really rethink these things. Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
Jarek Poplawski wrote, On 02/21/2008 01:08 PM: ... Another, probably simpler way would be to move almost all pppol2tp_xmit ... Actually, the simplest off all seems to be now this old idea to maybe make sk_dst_lock globally softirq immune. At least I think it's worth of testing, to check for these other possible lockdep warnings. It should only need to change all write_ and read_lock(sk-sk_dst_lock) in very few places: include/net/sock.h, include/net/ip6_route.h, and net/ipv6/ipv6_sockglue.c. This could be tested together with you full _bh locking patch (maybe except these other changes in pppol2tp_xmit). Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e1000: Question about polling
On Wed, Feb 20, 2008 at 12:25:32PM +0300, Badalian Vyacheslav wrote: ... Please not think that it disrespect for you. Its simple language barrier =( OK! Don't disrespect for me - I'll try fix my English next time!) Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e1000: Question about polling
On Wed, Feb 20, 2008 at 02:54:27PM +0300, Badalian Vyacheslav wrote: Khrm i try to say that i have language barrier and some time may wrong compose clauses. Example below =) No, only a bit joking... I'll try fix my English next time! Don't worry Vyacheslav: I think your message was understandable enough if you got good answer from Jesse. (And I've learned something BTW too; Thanks Jesse!) And after all it's not a language group: we care here for serious problems! Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Wed, Feb 20, 2008 at 04:02:52PM +, James Chapman wrote: ... I tried your ppp_generic patch with only the hlist_lock bh patch in pppol2tp and it seems to fix the ppp create/delete issue. However, when I added much more traffic into the test (flood pings over ppp interfaces while repeatedly creating/deleting the L2TP (PPP) sessions) I get a soft lockup detected in pppol2tp_xmit() after anything between 1 minute and an hour. :( I'm investigating that now. Thanks for your help! Not at all! (testing patch #1) But I hope you tested with the fixed (take 2) version of this patch... Since it's quite experimental (testing) this patch could be wrong as it is, but I hope it should show the proper way to solve this problem. Probably you did some of these, but here are a few of my suggestions for testing this: 1) try my patch with your full bh locking changing patch; 2) add while loops to these trylocks on failure, with e.g. __delay(1); this should work like full locks again, but there should be no (this kind of) lockdep reports; 3) I send here another testing patch with this second way to do this: on the write side, but it's even more experimental and only a proof of concept (should be applied on vanilla ppp_generic). Regards, Jarek P. (testing patch #2) --- drivers/net/ppp_generic.c | 20 +--- 1 files changed, 13 insertions(+), 7 deletions(-) diff --git a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c index 4dc5b4b..70bd255 100644 --- a/drivers/net/ppp_generic.c +++ b/drivers/net/ppp_generic.c @@ -2606,11 +2606,16 @@ ppp_connect_channel(struct channel *pch, int unit) ppp = ppp_find_unit(unit); if (!ppp) goto out; - write_lock_bh(pch-upl); + ret = -EINVAL; - if (pch-ppp) - goto outl; + read_lock_bh(pch-upl); + if (pch-ppp) { + read_unlock_bh(pch-upl); + goto out; + } + read_unlock_bh(pch-upl); + atomic_inc(ppp-file.refcnt); ppp_lock(ppp); if (pch-file.hdrlen ppp-file.hdrlen) ppp-file.hdrlen = pch-file.hdrlen; @@ -2619,13 +2624,14 @@ ppp_connect_channel(struct channel *pch, int unit) ppp-dev-hard_header_len = hdrlen; list_add_tail(pch-clist, ppp-channels); ++ppp-n_channels; - pch-ppp = ppp; - atomic_inc(ppp-file.refcnt); ppp_unlock(ppp); - ret = 0; - outl: + /* avoid lock dependency with ppp_locks */ + write_lock_bh(pch-upl); + BUG_ON(pch-ppp); + pch-ppp = ppp; write_unlock_bh(pch-upl); + ret = 0; out: mutex_unlock(all_ppp_mutex); return ret; -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Tue, Feb 19, 2008 at 10:30:47AM +, Jarek Poplawski wrote: ... IMHO, just like I wrote earlier, the main problem is in ppp_generic(), ...or maybe ppp_generic.c? Whatever... Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Tue, Feb 19, 2008 at 09:03:12AM +, James Chapman wrote: David Miller wrote: From: James Chapman [EMAIL PROTECTED] Date: Mon, 18 Feb 2008 22:09:24 + Here's a new version of the patch. The patch avoids disabling irqs and fixes the sk_dst_get() usage that DaveM mentioned. But even with this patch, lockdep still complains if hundreds of ppp sessions are inserted into a tunnel as rapidly as possible (lockdep trace is below). I can stop these errors by wrapping the call to ppp_input() in pppol2tp_recv_dequeue_skb() with local_irq_save/restore. What is a better fix? Firstly, let's fix one thing at a time. Leave the sk_dst_get() thing alone until we can prove that it's part of the lockdep traces. In reproducing the problem, I obtained several lockdep traces that implicated sk_dst_get(). As a matter of fact I missed just that kind information on previous lockdep report, so if you could send them too this should be still helpful. ... I agree. I'm seeking advice on what the underlying cause is of this new trace. IMHO, just like I wrote earlier, the main problem is in ppp_generic(), especially ppp_connect_channel(), where main tx rx locks are used. I didn't know enough about this sk_dst_lock traces yet. I hope I could help with this, but after these changes I need some time to figure this out again. Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Mon, Feb 18, 2008 at 10:09:24PM +, James Chapman wrote: ... Unfortunately the ISP's syslog stops. But I've been able to borrow two Quad Xeon boxes and have reproduced the problem. Here's a new version of the patch. The patch avoids disabling irqs and fixes the sk_dst_get() usage that DaveM mentioned. But even with this patch, lockdep still complains if hundreds of ppp sessions are inserted into a tunnel as rapidly as possible (lockdep trace is below). I can stop these errors by wrapping the call to ppp_input() in pppol2tp_recv_dequeue_skb() with local_irq_save/restore. What is a better fix? I send here my proposal: it's intended for testing and to check one of possible solutions here. IMHO your lockdep reports show there is no use to change anything around sk_dst_lock: it would need the global change of this lock to fix this problem. So the fix should be done around pch-upl lock and this means changing ppp_generic. In the patch below I've used trylock in places which seem to allow for skipping some things (while config is changed only) or simply don't need this lock because there is no ppp struct. This could be modified to add some waiting loop if necessary. Another option is to change the write side of this lock: it looks like more vulnerable if something missed because there are more locks involved, but probably should be enough to solve this problem too. I think pppol2tp need to be first checked only with hlist_lock bh patch, unless there were some lockdep reports on these other locks too. (BTW, I added ppp maintainer to CC - I hope we get Paul's opinion on this.) Regards, Jarek P. (testing patch #1) --- drivers/net/ppp_generic.c | 33 +++-- 1 files changed, 23 insertions(+), 10 deletions(-) diff --git a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c index 4dc5b4b..5cbc534 100644 --- a/drivers/net/ppp_generic.c +++ b/drivers/net/ppp_generic.c @@ -1473,7 +1473,7 @@ void ppp_input(struct ppp_channel *chan, struct sk_buff *skb) { struct channel *pch = chan-ppp; - int proto; + int proto, locked; if (!pch || skb-len == 0) { kfree_skb(skb); @@ -1481,8 +1481,13 @@ ppp_input(struct ppp_channel *chan, struct sk_buff *skb) } proto = PPP_PROTO(skb); - read_lock_bh(pch-upl); - if (!pch-ppp || proto = 0xc000 || proto == PPP_CCPFRAG) { + /* +* We use trylock to avoid dependency between soft-irq-safe upl lock +* and soft-irq-unsafe sk_dst_lock. +*/ + local_bh_disable(); + locked = read_trylock(pch-upl); + if (!locked || !pch-ppp || proto = 0xc000 || proto == PPP_CCPFRAG) { /* put it on the channel queue */ skb_queue_tail(pch-file.rq, skb); /* drop old frames if queue too long */ @@ -1493,7 +1498,10 @@ ppp_input(struct ppp_channel *chan, struct sk_buff *skb) } else { ppp_do_recv(pch-ppp, skb, pch); } - read_unlock_bh(pch-upl); + + if (locked) + read_unlock(pch-upl); + local_bh_enable(); } /* Put a 0-length skb in the receive queue as an error indication */ @@ -1506,16 +1514,18 @@ ppp_input_error(struct ppp_channel *chan, int code) if (!pch) return; - read_lock_bh(pch-upl); - if (pch-ppp) { + /* a trylock comment in ppp_input() */ + local_bh_disable(); + if (read_trylock(pch-upl) pch-ppp) { skb = alloc_skb(0, GFP_ATOMIC); if (skb) { skb-len = 0; /* probably unnecessary */ skb-cb[0] = code; ppp_do_recv(pch-ppp, skb, pch); } + read_unlock(pch-upl); } - read_unlock_bh(pch-upl); + local_bh_enable(); } /* @@ -2044,10 +2054,13 @@ int ppp_unit_number(struct ppp_channel *chan) int unit = -1; if (pch) { - read_lock_bh(pch-upl); - if (pch-ppp) + /* a trylock comment in ppp_input() */ + local_bh_disable(); + if (read_trylock(pch-upl) pch-ppp) { unit = pch-ppp-file.index; - read_unlock_bh(pch-upl); + read_unlock(pch-upl); + } + local_bh_enable(); } return unit; } -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Wed, Feb 20, 2008 at 12:06:40AM +0100, Jarek Poplawski wrote: ... (testing patch #1) SORRY!!! take 2 (unlocking fixed) --- drivers/net/ppp_generic.c | 39 +-- 1 files changed, 29 insertions(+), 10 deletions(-) diff --git a/drivers/net/ppp_generic.c b/drivers/net/ppp_generic.c index 4dc5b4b..c4e3808 100644 --- a/drivers/net/ppp_generic.c +++ b/drivers/net/ppp_generic.c @@ -1473,7 +1473,7 @@ void ppp_input(struct ppp_channel *chan, struct sk_buff *skb) { struct channel *pch = chan-ppp; - int proto; + int proto, locked; if (!pch || skb-len == 0) { kfree_skb(skb); @@ -1481,8 +1481,13 @@ ppp_input(struct ppp_channel *chan, struct sk_buff *skb) } proto = PPP_PROTO(skb); - read_lock_bh(pch-upl); - if (!pch-ppp || proto = 0xc000 || proto == PPP_CCPFRAG) { + /* +* We use trylock to avoid dependency between soft-irq-safe upl lock +* and soft-irq-unsafe sk_dst_lock. +*/ + local_bh_disable(); + locked = read_trylock(pch-upl); + if (!locked || !pch-ppp || proto = 0xc000 || proto == PPP_CCPFRAG) { /* put it on the channel queue */ skb_queue_tail(pch-file.rq, skb); /* drop old frames if queue too long */ @@ -1493,7 +1498,10 @@ ppp_input(struct ppp_channel *chan, struct sk_buff *skb) } else { ppp_do_recv(pch-ppp, skb, pch); } - read_unlock_bh(pch-upl); + + if (locked) + read_unlock(pch-upl); + local_bh_enable(); } /* Put a 0-length skb in the receive queue as an error indication */ @@ -1502,12 +1510,14 @@ ppp_input_error(struct ppp_channel *chan, int code) { struct channel *pch = chan-ppp; struct sk_buff *skb; + int locked; if (!pch) return; - read_lock_bh(pch-upl); - if (pch-ppp) { + /* a trylock comment in ppp_input() */ + local_bh_disable(); + if ((locked = read_trylock(pch-upl)) pch-ppp) { skb = alloc_skb(0, GFP_ATOMIC); if (skb) { skb-len = 0; /* probably unnecessary */ @@ -1515,7 +1525,10 @@ ppp_input_error(struct ppp_channel *chan, int code) ppp_do_recv(pch-ppp, skb, pch); } } - read_unlock_bh(pch-upl); + + if (locked) + read_unlock(pch-upl); + local_bh_enable(); } /* @@ -2044,10 +2057,16 @@ int ppp_unit_number(struct ppp_channel *chan) int unit = -1; if (pch) { - read_lock_bh(pch-upl); - if (pch-ppp) + int locked; + + /* a trylock comment in ppp_input() */ + local_bh_disable(); + if ((locked = read_trylock(pch-upl)) pch-ppp) unit = pch-ppp-file.index; - read_unlock_bh(pch-upl); + + if (locked) + read_unlock(pch-upl); + local_bh_enable(); } return unit; } -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [2.6.25 patch] fix broken error handling in ieee80211_sta_process_addba_request()
On 19-02-2008 23:58, Adrian Bunk wrote: ... --- a/net/mac80211/ieee80211_sta.c +++ b/net/mac80211/ieee80211_sta.c @@ -1116,9 +1116,10 @@ static void ieee80211_sta_process_addba_request(struct net_device *dev, ... + printk(KERN_ERR can not allocate reordering buffer + printk(KERN_ERR cannot allocate reordering buffer Probably this can be fixed during the commit. Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: e1000: Question about polling
On 18-02-2008 10:18, Badalian Vyacheslav wrote: Hello all. Hi, Interesting think: Have PC that do NAT. Bandwidth about 600 mbs. Have 4 CPU (2xCoRe 2 DUO HT OFF 3.2 HZ). irqbalance in kernel is off. nat2 ~ # cat /proc/irq/217/smp_affinity 0001 nat2 ~ # cat /proc/irq/218/smp_affinity 0003 Load SI on CPU0 and CPU1 is about 90% Good... try do echo /proc/irq/217/smp_affinity echo /proc/irq/218/smp_affinity Get 100% SI at CPU0 Question Why? I think you should show here /proc/interrupts in all these cases. I listen that if use IRQ from 1 netdevice to 1 CPU i can get 30% perfomance... but i have 4 CPU... i must get more perfomance if i cat to smp_affinity. picture looks liks this: 0-3 CPU get over 50% SI bandwith up 55% SI... bandwith up... 100% SI on CPU0 I remember patch to fix problem like it... patched function e1000_clean... kernel on pc have this patch (2.6.24-rc7-git2)... e1000 driver work much better (i up to 1.5-2x bandwidth before i get 100% SI), but i think that it not get 100% that it can =) If some patch works for you, and you can show here its advantages, you should probably add here some link and request for merging. BTW, I wonder if you tried to check if changing CONFIG_HZ makes any difference here? Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][IPROUTE] tc filters usage fixes
A few usage description fixes of tc filters for some minimal consistency (FILTER_KIND because of QDISC_KIND). Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] --- tc/f_basic.c |4 ++-- tc/f_rsvp.c|2 +- tc/f_u32.c |2 +- tc/tc_filter.c |6 +++--- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/tc/f_basic.c b/tc/f_basic.c index ad41633..cf2650b 100644 --- a/tc/f_basic.c +++ b/tc/f_basic.c @@ -30,8 +30,8 @@ static void explain(void) fprintf(stderr, Usage: ... basic [ match EMATCH_TREE ] [ police POLICE_SPEC ]\n); fprintf(stderr, [ action ACTION_SPEC ] [ classid CLASSID ]\n); fprintf(stderr, \n); - fprintf(stderr, Where: SELECTOR := SAMPLE SAMPLE ...\n); - fprintf(stderr,FILTERID := X:Y:Z\n); + fprintf(stderr, Where:\n); + fprintf(stderr,CLASSID := X:Y:Z\n); fprintf(stderr, \nNOTE: CLASSID is parsed as hexadecimal input.\n); } diff --git a/tc/f_rsvp.c b/tc/f_rsvp.c index 7e1e6d9..8f92e8f 100644 --- a/tc/f_rsvp.c +++ b/tc/f_rsvp.c @@ -33,7 +33,7 @@ static void explain(void) fprintf(stderr, Where: GPI := { flowlabel NUMBER | spi/ah SPI | spi/esp SPI |\n); fprintf(stderr, u{8|16|32} NUMBER mask MASK at OFFSET}\n); fprintf(stderr,POLICE_SPEC := ... look at TBF\n); - fprintf(stderr,FILTERID := X:Y\n); + fprintf(stderr,CLASSID := X:Y\n); fprintf(stderr, \nNOTE: CLASSID is parsed as hexadecimal input.\n); } diff --git a/tc/f_u32.c b/tc/f_u32.c index 9bc4bb5..957b1b1 100644 --- a/tc/f_u32.c +++ b/tc/f_u32.c @@ -38,7 +38,7 @@ static void explain(void) fprintf(stderr, Where: SELECTOR := SAMPLE SAMPLE ...\n); fprintf(stderr,SAMPLE := { ip | ip6 | udp | tcp | icmp | u{32|16|8} | mark } SAMPLE_ARGS [divisor DIVISOR]\n); - fprintf(stderr,FILTERID := X:Y:Z\n); + fprintf(stderr,CLASSID := X:Y:Z\n); fprintf(stderr, \nNOTE: CLASSID is parsed at hexadecimal input.\n); } diff --git a/tc/tc_filter.c b/tc/tc_filter.c index d70c656..eb74f89 100644 --- a/tc/tc_filter.c +++ b/tc/tc_filter.c @@ -33,12 +33,12 @@ static void usage(void) fprintf(stderr, Usage: tc filter [ add | del | change | replace | show ] dev STRING\n); fprintf(stderr,[ pref PRIO ] protocol PROTO\n); fprintf(stderr,[ estimator INTERVAL TIME_CONSTANT ]\n); - fprintf(stderr,[ root | classid CLASSID ] [ handle FILTERID ]\n); - fprintf(stderr,[ [ FILTER_TYPE ] [ help | OPTIONS ] ]\n); + fprintf(stderr,[ root | parent CLASSID ] [ handle FILTERID ]\n); + fprintf(stderr,[ [ FILTER_KIND ] [ help | OPTIONS ] ]\n); fprintf(stderr, \n); fprintf(stderr,tc filter show [ dev STRING ] [ root | parent CLASSID ]\n); fprintf(stderr, Where:\n); - fprintf(stderr, FILTER_TYPE := { rsvp | u32 | fw | route | etc. }\n); + fprintf(stderr, FILTER_KIND := { rsvp | u32 | fw | route | etc. }\n); fprintf(stderr, FILTERID := ... format depends on classifier, see there\n); fprintf(stderr, OPTIONS := ... try tc filter add desired FILTER_KIND help\n); return; -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Mon, Feb 18, 2008 at 10:09:24PM +, James Chapman wrote: Jarek Poplawski wrote: Hi, It seems, this nice report is still uncomplete: could you check if there could have been something more yet? Unfortunately the ISP's syslog stops. But I've been able to borrow two Quad Xeon boxes and have reproduced the problem. Here's a new version of the patch. The patch avoids disabling irqs and fixes the sk_dst_get() usage that DaveM mentioned. But even with this patch, lockdep still complains if hundreds of ppp sessions are inserted into a tunnel as rapidly as possible (lockdep trace is below). I can stop these errors by wrapping the call to ppp_input() in pppol2tp_recv_dequeue_skb() with local_irq_save/restore. What is a better fix? Hmm... This is a really long report and quite a bit different from the previous one. I need some time for this. BTW: you sent before a lockdep report with hlist_lock problem. I think this could be fixed in some independent patch to make this all more readable. Are all the other changes in this current patch only because of this or previous lockdep report or for some other reasons (or reports) yet? Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][IPROUTE] tc filters usage fixes
Jarek Poplawski wrote, On 02/18/2008 11:10 PM: A few usage description fixes of tc filters for some minimal consistency (FILTER_KIND because of QDISC_KIND). Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] Don't apply: I've sent 2nd version of this patch. Sorry, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2][IPROUTE] tc filters usage fixes
CLASSID := X:Y:Z == X:Y in f_basic is changed here and no change for f_u32 (it has both CLASSID and FILTERID mentioned). - (take 2) A few usage description fixes of tc filters for some minimal consistency (FILTER_KIND because of QDISC_KIND). Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] --- tc/f_basic.c |4 ++-- tc/f_rsvp.c|2 +- tc/tc_filter.c |6 +++--- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/tc/f_basic.c b/tc/f_basic.c index ad41633..d8a42d9 100644 --- a/tc/f_basic.c +++ b/tc/f_basic.c @@ -30,8 +30,8 @@ static void explain(void) fprintf(stderr, Usage: ... basic [ match EMATCH_TREE ] [ police POLICE_SPEC ]\n); fprintf(stderr, [ action ACTION_SPEC ] [ classid CLASSID ]\n); fprintf(stderr, \n); - fprintf(stderr, Where: SELECTOR := SAMPLE SAMPLE ...\n); - fprintf(stderr,FILTERID := X:Y:Z\n); + fprintf(stderr, Where:\n); + fprintf(stderr,CLASSID := X:Y\n); fprintf(stderr, \nNOTE: CLASSID is parsed as hexadecimal input.\n); } diff --git a/tc/f_rsvp.c b/tc/f_rsvp.c index 7e1e6d9..8f92e8f 100644 --- a/tc/f_rsvp.c +++ b/tc/f_rsvp.c @@ -33,7 +33,7 @@ static void explain(void) fprintf(stderr, Where: GPI := { flowlabel NUMBER | spi/ah SPI | spi/esp SPI |\n); fprintf(stderr, u{8|16|32} NUMBER mask MASK at OFFSET}\n); fprintf(stderr,POLICE_SPEC := ... look at TBF\n); - fprintf(stderr,FILTERID := X:Y\n); + fprintf(stderr,CLASSID := X:Y\n); fprintf(stderr, \nNOTE: CLASSID is parsed as hexadecimal input.\n); } diff --git a/tc/tc_filter.c b/tc/tc_filter.c index d70c656..eb74f89 100644 --- a/tc/tc_filter.c +++ b/tc/tc_filter.c @@ -33,12 +33,12 @@ static void usage(void) fprintf(stderr, Usage: tc filter [ add | del | change | replace | show ] dev STRING\n); fprintf(stderr,[ pref PRIO ] protocol PROTO\n); fprintf(stderr,[ estimator INTERVAL TIME_CONSTANT ]\n); - fprintf(stderr,[ root | classid CLASSID ] [ handle FILTERID ]\n); - fprintf(stderr,[ [ FILTER_TYPE ] [ help | OPTIONS ] ]\n); + fprintf(stderr,[ root | parent CLASSID ] [ handle FILTERID ]\n); + fprintf(stderr,[ [ FILTER_KIND ] [ help | OPTIONS ] ]\n); fprintf(stderr, \n); fprintf(stderr,tc filter show [ dev STRING ] [ root | parent CLASSID ]\n); fprintf(stderr, Where:\n); - fprintf(stderr, FILTER_TYPE := { rsvp | u32 | fw | route | etc. }\n); + fprintf(stderr, FILTER_KIND := { rsvp | u32 | fw | route | etc. }\n); fprintf(stderr, FILTERID := ... format depends on classifier, see there\n); fprintf(stderr, OPTIONS := ... try tc filter add desired FILTER_KIND help\n); return; -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RESEND, HTB(?) softlockup, vanilla 2.6.24
On Sun, Feb 17, 2008 at 02:03:33AM +0200, Denys Fedoryshchenko wrote: Server is fully redundant now, so i apply patches (but i apply both, probably it will make system more reliable somehow) and i enable required debug options in kernel. So i will try to catch this bug few more times, probably if it will generate more detailed info over netconsole it will be useful. I guess you mean the patches mentioned in the BUG/ spinlock lockup; they could be useful, but we are not sure this is the same problem. Anyway, if there are really stack overflows then we don't need any bug report after this: with stack data corrupted they would show some false problems. We need to find which code overflows and why. If you want to debug this, then try to make this more reproducible e.g. with CONFIG_4KSTACKS; anyway you should always turn on these options with such problems: CONFIG_DEBUG_STACKOVERFLOW CONFIG_DEBUG_STACK_USAGE. Is there any project to dump console messages/kernel dump to disk? For ... I don't know, but there is probably something better: a project by Intel to save this in some cpu memory (or something...). But again: we don't need corrupted messages after stack overflow, and, if we don't let for this, maybe these netconsole messages would be properly printed and quite enough... I notice some code in MTD(CONFIG_MTD_OOPS), but i am not sure it is correct and will work if i will setup MTD emulation for block device. I'm not sure what do you mean by MTD emulation: it should be used with MTD devices only, I presume? Regards, Jarek P. PS: BTW, for HTB with actions I recommend my sch_htb: htb_requeue fix, available in 2.6.25-rc. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RESEND, HTB(?) softlockup, vanilla 2.6.24
On Sat, Feb 16, 2008 at 12:25:31PM +0200, Denys Fedoryshchenko wrote: Thanks, i will try it. You think lockdep can be buggy? Just like every code... But the main reason is it has quite meaningful overhead, so could be right in production only after lockups happen. But if it doesn't report anything anyway... Your report shows there are quite long paths of calls during softirqs with some actions (ipt + mirred here?) and qdiscs, so if I'm not wrong with this stack problem, this would need some optimization. And, of course, there could be some additional bugs involved around too: otherwise it seems this should happen more often. But I don't expect you would try to debug this on your servers, so I hope, it simply will be found BTW some day... Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG/ spinlock lockup, 2.6.24
Jarek Poplawski wrote, On 02/15/2008 09:21 PM: Denys Fedoryshchenko wrote, On 02/15/2008 08:42 PM: ... I have similar crashes on completely different hardware with same job (QOS), so i think it is actually some nasty bug in networking. Maybe you could try with some other debugging options? E.g. since lockdep doesn't help - turn this off. Instead try some others, like these: ...On the other hand this: Feb 15 15:50:17 217.151.X.X [1521315.068984] BUG: spinlock lockup on CPU#1, ksoftirqd/1/7, f0551180 seems to point just at spinlock lockup, so it's more about the full report. I wonder if this patch to prink could help here: author Ingo Molnar mingo at elte.hu Fri, 25 Jan 2008 20:07:58 + (21:07 +0100) printk: make printk more robust by not allowing recursion http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=32a76006683f7b28ae3cc491da37716e002f198e Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG/ spinlock lockup, 2.6.24
Jarek Poplawski wrote, On 02/15/2008 10:03 PM: ... ...On the other hand this: Feb 15 15:50:17 217.151.X.X [1521315.068984] BUG: spinlock lockup on CPU#1, ksoftirqd/1/7, f0551180 seems to point just at spinlock lockup, so it's more about the full report. I wonder if this patch to prink could help here: authorIngo Molnar mingo at elte.hu Fri, 25 Jan 2008 20:07:58 + (21:07 +0100) printk: make printk more robust by not allowing recursion http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=32a76006683f7b28ae3cc491da37716e002f198e ...or maybe a patch like this attached here? Jarek P. diff --git a/lib/spinlock_debug.c b/lib/spinlock_debug.c index 9c4b025..21c8aaa 100644 --- a/lib/spinlock_debug.c +++ b/lib/spinlock_debug.c @@ -111,8 +111,7 @@ static void __spin_lock_debug(spinlock_t *lock) __delay(1); } /* lockup suspected: */ - if (print_once) { - print_once = 0; + if (print_once == 1) { printk(KERN_EMERG BUG: spinlock lockup on CPU#%d, %s/%d, %p\n, raw_smp_processor_id(), current-comm, @@ -122,7 +121,14 @@ static void __spin_lock_debug(spinlock_t *lock) trigger_all_cpu_backtrace(); #endif } + if (print_once++ 1000) + goto out; } + return; +out: + panic(spinlock lockup panic #%llu\n, i); + // or: + // BUG(); } void _raw_spin_lock(spinlock_t *lock)
Re: BUG/ spinlock lockup, 2.6.24
Denys Fedoryshchenko wrote, On 02/15/2008 08:42 PM: ... I have similar crashes on completely different hardware with same job (QOS), so i think it is actually some nasty bug in networking. Maybe you could try with some other debugging options? E.g. since lockdep doesn't help - turn this off. Instead try some others, like these: # CONFIG_DEBUG_SPINLOCK_SLEEP is not set # CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set # CONFIG_DEBUG_KOBJECT is not set # CONFIG_DEBUG_HIGHMEM is not set # CONFIG_DEBUG_VM is not set # CONFIG_DEBUG_LIST is not set # CONFIG_DEBUG_SG is not set # CONFIG_BOOT_PRINTK_DELAY is not set # CONFIG_DEBUG_STACKOVERFLOW is not set # CONFIG_DEBUG_STACK_USAGE is not set # CONFIG_DEBUG_RODATA is not set Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RESEND, HTB(?) softlockup, vanilla 2.6.24
Denys Fedoryshchenko wrote, On 02/13/2008 09:13 AM: It is very difficult to reproduce, happened after running about 1month. No changes done in classes at time of crash. Kernel 2.6.24 vanilla Hi, I could be wrong, but IMHO this looks like stack was overridden here, so my proposal is to try this: CONFIG_DEBUG_STACKOVERFLOW=y But, if you're not very interested in reproducing this, you could also try to turn off some other debugging, especially lockdep. Regards, Jarek P. ... Feb 10 15:53:22 SHAPER [ 8271.778915] BUG: NMI Watchdog detected LOCKUP Feb 10 15:53:22 SHAPER on CPU1, eip c01f0e5d, registers: ... Feb 10 15:53:22 SHAPER [ 8271.779307] Pid: 0, comm: swapper Not tainted (2.6.24-build-0021 #26) Feb 10 15:53:22 SHAPER [ 8271.779327] EIP: 0060:[c01f0e5d] EFLAGS: 0082 CPU: 1 Feb 10 15:53:22 SHAPER [ 8271.779349] EIP is at __rb_rotate_right+0x5/0x50 Feb 10 15:53:22 SHAPER [ 8271.779366] EAX: f76494a4 EBX: f76494a4 ECX: f76494a4 EDX: c1ff5f80 Feb 10 15:53:22 SHAPER [ 8271.779386] ESI: f76494a4 EDI: c1ff5f80 EBP: ESP: f7c29c70 Feb 10 15:53:22 SHAPER [ 8271.779406] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 Feb 10 15:53:22 SHAPER [ 8271.779425] Process swapper (pid: 0, ti=f7c28000 task=f7c20a60 task.ti=f7c28000) Feb 10 15:53:22 SHAPER Feb 10 15:53:22 SHAPER [ 8271.779446] Stack: Feb 10 15:53:22 SHAPER f76494a4 Feb 10 15:53:22 SHAPER f76494a4 Feb 10 15:53:22 SHAPER f76494a4 Feb 10 15:53:22 SHAPER c01f0ef4 Feb 10 15:53:22 SHAPER c1ff5f80 Feb 10 15:53:22 SHAPER f76494a4 Feb 10 15:53:22 SHAPER f76494a8 Feb 10 15:53:22 SHAPER c1ff5f78 Feb 10 15:53:22 SHAPER Feb 10 15:53:22 SHAPER [ 8271.779493] Feb 10 15:53:22 SHAPER [ 8271.779307] Pid: 0, comm: swapper Not tainted (2.6.24-build-0021 #26) Feb 10 15:53:22 SHAPER [ 8271.779327] EIP: 0060:[c01f0e5d] EFLAGS: 0082 CPU: 1 Feb 10 15:53:22 SHAPER [ 8271.779349] EIP is at __rb_rotate_right+0x5/0x50 Feb 10 15:53:22 SHAPER [ 8271.779366] EAX: f76494a4 EBX: f76494a4 ECX: f76494a4 EDX: c1ff5f80 Feb 10 15:53:22 SHAPER [ 8271.779386] ESI: f76494a4 EDI: c1ff5f80 EBP: ESP: f7c29c70 Feb 10 15:53:22 SHAPER [ 8271.779406] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 Feb 10 15:53:22 SHAPER [ 8271.779425] Process swapper (pid: 0, ti=f7c28000 task=f7c20a60 task.ti=f7c28000) Feb 10 15:53:22 SHAPER Feb 10 15:53:22 SHAPER [ 8271.779446] Stack: Feb 10 15:53:22 SHAPER f76494a4 Feb 10 15:53:22 SHAPER f76494a4 Feb 10 15:53:22 SHAPER f76494a4 ... -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
Hi, It seems, this nice report is still uncomplete: could you check if there could have been something more yet? Thanks, Jarek P. On Tue, Feb 12, 2008 at 10:58:21AM +, James Chapman wrote: ... Here is a trace from when we had _bh locks. Feb 5 16:26:32 == Feb 5 16:26:32 [ INFO: soft-safe - soft-unsafe lock order detected ] Feb 5 16:26:32 2.6.24-core2 #1 Feb 5 16:26:32 -- Feb 5 16:26:32 pppd/3224 [HC0[0]:SC0[2]:HE1:SE0] is trying to acquire: ... Feb 5 16:26:32 [f8d7b84d] e1000_clean+0x5d/0x290 [e1000] Feb 5 16:26:32 [c039d580] net_rx_action+0x1a0/0x2a0 Feb 5 16:26:32 [c039d43f] net_rx_action+0x5f/0x2a0 Feb 5 16:26:32 [c0131e72] __do_softirq+0x92/0x120 Feb 5 16:26:32 [c0131f78] do_softirq+0x78/0x80 Feb 5 16:26:32 [c010b15a] do_IRQ+0x4a/0xa0 Feb 5 16:26:32 [c0127af0] finish_task_switch+0x0/0xc0 Feb 5 16:26:32 [c0108dcc] common_interrupt+0x24/0x34 Feb 5 16:26:32 [c0108dd6] common_interrupt+0x2e/0x34 Feb 5 16:26:32 [c01062d6] mwait_idle_with_hints+0x46/0x60 Feb 5 16:26:32 [c0106550] mwait_idle+0x0/0x20 Feb 5 16:26:32 [c0106694] cpu_idle+0x74/0xe0 Feb 5 16:26:32 [c0536a9a] start_kernel+0x30a/0x3a0 -- James Chapman Katalix Systems Ltd http://www.katalix.com Catalysts for your Embedded Linux software development -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/4] rcu_assign_pointer: null check fix
On 13-02-2008 01:50, Stephen Hemminger wrote: ... --- a/include/linux/rcupdate.h2008-02-12 14:46:49.0 -0800 +++ b/include/linux/rcupdate.h2008-02-12 14:56:17.0 -0800 @@ -178,7 +178,7 @@ struct rcu_head { #define rcu_assign_pointer(p, v) \ ({ \ - if (!(__builtin_constant_p(v) v))\ + if (!__builtin_constant_p(v) || v) \ Isn't it a bit safer with this Paul's version yet?: + if (!__builtin_constant_p(v) || (v))\ Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] [IPROUTE2] Update various classifiers' help output for expected CLASSID syntax
On 12-02-2008 22:51, PJ Waskiewicz wrote: This updates the help output to specify that CLASSID should be hexidecimal. ... + fprintf(stderr, \nNOTE: CLASSID is parsed as hexidecimal input.\n); s/hexidecimal/hexadecimal/g (?) Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: RESEND, HTB(?) softlockup, vanilla 2.6.24
On 13-02-2008 09:13, Denys Fedoryshchenko wrote: It is very difficult to reproduce, happened after running about 1month. No changes done in classes at time of crash. Kernel 2.6.24 vanilla I will try to attach also .config Hi Denys, This report looks very interesting. I don't know how others, but I plan to study it more soon (on the weekend?), then maybe more questions. Of course some exemplary tc rules should be helpful. Thanks, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][AX25] mkiss: ax_bump() locking fix
Hi, This is unchanged patch previously sent here for testing. I think it should be applied. Thanks, Jarek P. Subject: [PATCH][AX25] mkiss: ax_bump() locking fix According to one of OOPSes reported by Jann softirq can break while skb is prepared for netif_rx. The report isn't complete, so the real reason of the later bug could be different, but IMHO this locking break in ax_bump is unsafe and unnecessary. Reported-and-tested-by: Jann Traschewski [EMAIL PROTECTED] Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] --- drivers/net/hamradio/mkiss.c |5 ++--- 1 files changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/net/hamradio/mkiss.c b/drivers/net/hamradio/mkiss.c index cfcd15a..30c9b3b 100644 --- a/drivers/net/hamradio/mkiss.c +++ b/drivers/net/hamradio/mkiss.c @@ -289,7 +289,6 @@ static void ax_bump(struct mkiss *ax) *ax-rbuff = ~0x20; } } - spin_unlock_bh(ax-buflock); count = ax-rcount; @@ -297,17 +296,17 @@ static void ax_bump(struct mkiss *ax) printk(KERN_ERR mkiss: %s: memory squeeze, dropping packet.\n, ax-dev-name); ax-stats.rx_dropped++; + spin_unlock_bh(ax-buflock); return; } - spin_lock_bh(ax-buflock); memcpy(skb_put(skb,count), ax-rbuff, count); - spin_unlock_bh(ax-buflock); skb-protocol = ax25_type_trans(skb, ax-dev); netif_rx(skb); ax-dev-last_rx = jiffies; ax-stats.rx_packets++; ax-stats.rx_bytes += count; + spin_unlock_bh(ax-buflock); } static void kiss_unesc(struct mkiss *ax, unsigned char s) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][AX25] ax25_out: check skb for NULL in ax25_kick()
Hi, Here is an official version of testing patch #2 from this thread. The only difference: ax25-vs is changed only after checking skb is not NULL (plus a comment). IMHO it could be applied. Thanks, Jarek P. Subject: [AX25] ax25_out: check skb for NULL in ax25_kick() According to some OOPS reports ax25_kick tries to clone NULL skbs sometimes. It looks like a race with ax25_clear_queues(). Probably there is no need to add more than a simple check for this yet. Another report suggested there are probably also cases where ax25 -paclen == 0 can happen in ax25_output(); this wasn't confirmed during testing but let's leave this debugging check for some time. Reported-and-tested-by: Jann Traschewski [EMAIL PROTECTED] Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] --- diff -Nurp 2.6.24-mm1-/net/ax25/ax25_out.c 2.6.24-mm1+/net/ax25/ax25_out.c --- 2.6.24-mm1-/net/ax25/ax25_out.c 2008-01-24 22:58:37.0 + +++ 2.6.24-mm1+/net/ax25/ax25_out.c 2008-02-13 10:43:50.0 + @@ -117,6 +117,12 @@ void ax25_output(ax25_cb *ax25, int pacl unsigned char *p; int frontlen, len, fragno, ka9qfrag, first = 1; + if (paclen 16) { + WARN_ON_ONCE(1); + kfree_skb(skb); + return; + } + if ((skb-len - 1) paclen) { if (*skb-data == AX25_P_TEXT) { skb_pull(skb, 1); /* skip PID */ @@ -251,8 +257,6 @@ void ax25_kick(ax25_cb *ax25) if (start == end) return; - ax25-vs = start; - /* * Transmit data until either we're out of data to send or * the window is full. Send a poll on the final I frame if @@ -261,8 +265,13 @@ void ax25_kick(ax25_cb *ax25) /* * Dequeue the frame and copy it. +* Check for race with ax25_clear_queues(). */ skb = skb_dequeue(ax25-write_queue); + if (!skb) + return; + + ax25-vs = start; do { if ((skbn = skb_clone(skb, GFP_ATOMIC)) == NULL) { -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fib_trie: rcu_assign_pointer warning fix
On 12-02-2008 02:16, David Miller wrote: From: Stephen Hemminger [EMAIL PROTECTED] Date: Mon, 11 Feb 2008 16:59:54 -0800 linux-kernel added to CC:, any change to generic kernel infrastructure should be posted there Eliminate warnings when rcu_assign_pointer is used with unsigned long. It is reasonable to use RCU with non-pointer values so allow it for general use. Add a comment to explain the if test. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] --- include/linux/rcupdate.h | 13 +++-- 1 files changed, 7 insertions(+), 6 deletions(-) diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 37a642c..c44ac87 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -172,14 +172,15 @@ struct rcu_head { * structure after the pointer assignment. More importantly, this * call documents which pointers will be dereferenced by RCU read-side * code. + * + * If value is the NULL (constant 0), then no barrier is needed. */ -#define rcu_assign_pointer(p, v) \ -({ \ -if (!__builtin_constant_p(v) || \ -((v) != NULL)) \ -smp_wmb(); \ -(p) = (v); \ +#define rcu_assign_pointer(p, v)\ +({ \ +if (!(__builtin_constant_p(v) v))\ ...But, If value is the NULL (constant 0) we have: if (!(1 NULL != 0)) == if (!(0)) and the barrier is used?! +smp_wmb(); \ +(p) = (v); \ }) /** Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][AX25] ax25_route: make ax25_route_lock BH safe
On Tue, Feb 12, 2008 at 09:43:26AM +0100, Jann Traschewski wrote: Applied on 2.6.24.2 and up without any problems/warnings since 12 hours. Thanks, Jann Thanks Jann, too! BTW, I hope maybe until tomorrow I'll figure out something about those earlier two AX25 testing patches. Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fib_trie: rcu_assign_pointer warning fix
On Tue, Feb 12, 2008 at 08:07:29AM -0800, Paul E. McKenney wrote: ... All programmers are blind, especially me. Hmm... I got it my way: you - superheroes - sometimes seem to be just like us - common people... (Probably early in the morning, before dressing your funny costumes?) You are right, Jarek. I ran this through gcc, and the following comes close: #define rcu_assign_pointer(p, v) \ ({ \ if (!__builtin_constant_p(v) || (v)) \ smp_wmb(); \ (p) = (v); \ }) But I am concerned about the following case: rcu_assign_pointer(global_index, 0); . . . x = global_array[rcu_dereference(global_index)]; Since arrays have a zero-th element, we would really want a memory barrier in this case. It seems the above version of this macro uses the barrier for 0, but if I miss something, or for these other: documenting reasons, then of course you are right. So how about leaving the index-unfriendly version of rcu_assign_pointer() and adding an rcu_assign_index() as follows? Thanx, Paul Signed-off-by: Paul E. McKenney [EMAIL PROTECTED] --- rcupdate.h | 18 ++ 1 file changed, 18 insertions(+) diff -urpNa -X dontdiff linux-2.6.24/include/linux/rcupdate.h linux-2.6.24-rai/include/linux/rcupdate.h --- linux-2.6.24/include/linux/rcupdate.h 2008-01-24 14:58:37.0 -0800 +++ linux-2.6.24-rai/include/linux/rcupdate.h 2008-02-12 08:04:59.0 -0800 @@ -278,6 +278,24 @@ extern struct lockdep_map rcu_lock_map; }) /** + * rcu_assign_index - assign (publicize) a index of a newly + * initialized array elementg that will be dereferenced by RCU ---^ + * initialized array element that will be dereferenced by RCU Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fib_trie: rcu_assign_pointer warning fix
On Tue, Feb 12, 2008 at 08:32:18PM +0100, Jarek Poplawski wrote: ... It seems the above version of this macro uses the barrier for 0, but if I miss something, or for these other: documenting reasons, ...or __builtin_constants could be used for indexing (?!), then of course you are right. Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Tue, Feb 12, 2008 at 10:58:21AM +, James Chapman wrote: ... Here is a trace from when we had _bh locks. Very nice... ...But since it's quite long, and if you don't know all these paths this could take some time, maybe one question: so if lockdep got these locks right (sometimes it can be wrong when the same structures are nested), then it seems some problem is with this place below. This lock is taken for writing with softirqs enabled here, and IMHO it would be interesting to test if changing this is enough for lockdep. It seems this is in ip4_datagram_connect() during sk_dst_reset() or sk_dst_set(). So maybe you could try with local_bh_disable/enable() around them (or maybe some better idea)? Anyway, I'll try to learn this more in the meantime. Jarek P. Feb 5 16:26:32 to a soft-irq-unsafe lock: Feb 5 16:26:32 (sk-sk_dst_lock){} Feb 5 16:26:32 ... which became soft-irq-unsafe at: Feb 5 16:26:32 ... [c014e02e] mark_held_locks+0x5e/0x80 Feb 5 16:26:32 [c014ed92] __lock_acquire+0x6a2/0x10a0 Feb 5 16:26:32 [c010f5b0] save_stack_trace+0x20/0x40 Feb 5 16:26:32 [c014c524] add_lock_to_list+0x44/0xb0 Feb 5 16:26:32 [c03dea29] __udp_lib_get_port+0x19/0x200 Feb 5 16:26:32 [c014f735] __lock_acquire+0x1045/0x10a0 Feb 5 16:26:32 [c014f804] lock_acquire+0x74/0xa0 Feb 5 16:26:32 [c03db0a3] ip4_datagram_connect+0x53/0x380 Feb 5 16:26:32 [c040418a] _write_lock+0x2a/0x40 Feb 5 16:26:32 [c03db0a3] ip4_datagram_connect+0x53/0x380 Feb 5 16:26:32 [c03db0a3] ip4_datagram_connect+0x53/0x380 Feb 5 16:26:32 [c014e1c5] trace_hardirqs_on+0xc5/0x170 Feb 5 16:26:32 [c0132317] local_bh_enable_ip+0xa7/0x120 Feb 5 16:26:32 [c014e1c5] trace_hardirqs_on+0xc5/0x170 Feb 5 16:26:32 [c040414f] _spin_lock_bh+0x2f/0x40 Feb 5 16:26:32 [c03e4d55] inet_dgram_connect+0x35/0x80 Feb 5 16:26:32 [c038ec52] sys_connect+0x82/0xd0 Feb 5 16:26:32 [c01455df] down_read_trylock+0x4f/0x60 Feb 5 16:26:32 [c011fe9c] do_page_fault+0xfc/0x940 Feb 5 16:26:32 [c0404024] _spin_unlock+0x14/0x20 Feb 5 16:26:32 [c03905f8] sys_socketcall+0x98/0x280 Feb 5 16:26:32 [c014e1c5] trace_hardirqs_on+0xc5/0x170 Feb 5 16:26:32 [c02a86ba] copy_to_user+0x3a/0x70 Feb 5 16:26:32 [c0108417] restore_nocheck+0x12/0x15 Feb 5 16:26:32 [c01083aa] syscall_call+0x7/0xb Feb 5 16:26:32 [] 0x -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Mon, Feb 11, 2008 at 11:42:12PM +, James Chapman wrote: Jarek Poplawski wrote: On Mon, Feb 11, 2008 at 11:49:24PM +0100, Jarek Poplawski wrote: On Mon, Feb 11, 2008 at 10:19:35PM +, James Chapman wrote: ... Below is example output from lockdep. The oops is reproducible when creating/deleting lots of sessions while passing data. The lock is being acquired for read and write in softirq contexts. ...Hmmm... And according to this, changing read_locks should be necessary too. The patch changes both read and write locks. Right! This was only errata to my earlier comment where I considered only lockdep info and forgot about yours... Sorry, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Tue, Feb 12, 2008 at 10:00:03PM -0800, David Miller wrote: From: James Chapman [EMAIL PROTECTED] Date: Tue, 12 Feb 2008 10:58:21 + Here is a trace from when we had _bh locks. The problem is that the pppol2tp code calls sk_dst_get() in software interrupt context and that is not allowed. Actually, this lockdep report probably warns of something else: sk_dst_lock which is seen in process context with softirqs enabled implicitly endangers some other (ppp_generic) locks, which depend on ppp_generic pch-upl read_lock, which is taken in softirq context. I can't see on this report any hardirqs dependancies, so it seems even blocking hard interrupts, just like in this patch, shouldn't solve this problem: if BHs were really disabled in exactly the same places, then it seems this warning should trigger in both variants. I studied long ago some similar problem with pppoe, and it looked like ppp_generic needed some fix, but now I've to recall or re-learn almost all and it needs more time... Anyway, there is a possibility, this warning isn't so dangerous. sk_dst_get() grabs sk-sk_dst_lock without any BH enabling or disabling. It can do that because the usage is to make all the lock taking calls in user context, and in the packet processing paths use __sk_dst_get(). BTW, does it mean that ip4_datagram_connect() can be used only in user context? Probably what the pppol2tp code should do is use __sk_dst_check() instead of sk_dst_get(). You then have to be able to handle NULL returns, just like UDP sendmsg() does, which means you'll need to cook up a routing lookup if __sk_dst_check() gives you NULL because the route became obsolete. I think that without changing ppp_generic or changing ip_queue_xmit() to something else in pppol2tp this would be really hard to avoid such a warning (and maybe not very necessary). Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Mon, Feb 11, 2008 at 11:49:24PM +0100, Jarek Poplawski wrote: On Mon, Feb 11, 2008 at 10:19:35PM +, James Chapman wrote: ... Below is example output from lockdep. The oops is reproducible when creating/deleting lots of sessions while passing data. The lock is being acquired for read and write in softirq contexts. ...Hmmm... And according to this, changing read_locks should be necessary too. Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Mon, Feb 11, 2008 at 10:19:35PM +, James Chapman wrote: ... Below is example output from lockdep. The oops is reproducible when creating/deleting lots of sessions while passing data. The lock is being acquired for read and write in softirq contexts. Is there a better way to fix this? = [ INFO: inconsistent lock state ] 2.6.24-core2 #1 - inconsistent {in-softirq-R} - {softirq-on-W} usage. openl2tpd/3215 [HC0[0]:SC0[0]:HE1:SE1] takes: (tunnel-hlist_lock){---?}, at: [f8eea157] pppol2tp_connect+0x517/0x6d0 [pppol2tp] {in-softirq-R} state was registered at: IMHO, according to this, disabling bh should be enough. And if it's like in this report: only read_lock is taken from softirqs, then this should be necessary to change only all write_locks to write_lock_bh (of course unless somewhere bhs are disabled already). Unless I miss something?! Cheers, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
James Chapman wrote, On 02/11/2008 10:22 AM: Fix locking issues in the pppol2tp driver which can cause a kernel crash on SMP boxes when hundreds of L2TP sessions are created/deleted simultaneously (ISP environment). The driver was violating read_lock() and write_lock() scheduling rules so we now consistently use the _irq variants of the lock functions. ... Hi, Could you explain what exactly scheduling rules do you mean here, and why disabling interrupts is the best solution for this? Thanks, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Netfilter fixes to 2.6.24-git
On 11-02-2008 08:00, Patrick McHardy wrote: ... Indeed, I'm currently ill and not really up for much working, but this will hopefully get better soon. Wish you well, Patrick! I hope it's about some WARN_ON not BUG_ON? Anyway, drop these last patches and get back to stable! Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][PPPOL2TP]: Fix SMP oops in pppol2tp driver
On Mon, Feb 11, 2008 at 11:41:18PM +, James Chapman wrote: Jarek Poplawski wrote: On Mon, Feb 11, 2008 at 10:19:35PM +, James Chapman wrote: ... Below is example output from lockdep. The oops is reproducible when creating/deleting lots of sessions while passing data. The lock is being acquired for read and write in softirq contexts. Is there a better way to fix this? = [ INFO: inconsistent lock state ] 2.6.24-core2 #1 - inconsistent {in-softirq-R} - {softirq-on-W} usage. openl2tpd/3215 [HC0[0]:SC0[0]:HE1:SE1] takes: (tunnel-hlist_lock){---?}, at: [f8eea157] pppol2tp_connect+0x517/0x6d0 [pppol2tp] {in-softirq-R} state was registered at: IMHO, according to this, disabling bh should be enough. And if it's like in this report: only read_lock is taken from softirqs, then this should be necessary to change only all write_locks to write_lock_bh (of course unless somewhere bhs are disabled already). Unless I miss something?! I thought so too. I tried _bh locks first and the problem still occurred. Maybe I'll try it again in case I messed something up. If there are any new lockdep warnings I'd be interested to have a look. (BTW, with irqs disabling such ISP would probably get considerable drop in performance?) Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][AX25] ax25_route: make ax25_route_lock BH safe
Subject: [AX25] ax25_route: make ax25_route_lock BH safe = [ INFO: inconsistent lock state ] 2.6.24-dg8ngn-p02 #1 - inconsistent {softirq-on-W} - {in-softirq-R} usage. linuxnet/3046 [HC0[0]:SC1[2]:HE1:SE0] takes: (ax25_route_lock){--.+}, at: [f8a0cfb7] ax25_get_route+0x18/0xb7 [ax25] {softirq-on-W} state was registered at: ... This lockdep report shows that ax25_route_lock is taken for reading in softirq context, and for writing in process context with BHs enabled. So, to make this safe, all write_locks in ax25_route.c are changed to _bh versions. Reported-by: Jann Traschewski [EMAIL PROTECTED], Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] --- diff -Nurp 2.6.24-mm1-/net/ax25/ax25_route.c 2.6.24-mm1+/net/ax25/ax25_route.c --- 2.6.24-mm1-/net/ax25/ax25_route.c 2008-02-05 07:45:38.0 + +++ 2.6.24-mm1+/net/ax25/ax25_route.c 2008-02-11 11:58:47.0 + @@ -45,7 +45,7 @@ void ax25_rt_device_down(struct net_devi { ax25_route *s, *t, *ax25_rt; - write_lock(ax25_route_lock); + write_lock_bh(ax25_route_lock); ax25_rt = ax25_route_list; while (ax25_rt != NULL) { s = ax25_rt; @@ -68,7 +68,7 @@ void ax25_rt_device_down(struct net_devi } } } - write_unlock(ax25_route_lock); + write_unlock_bh(ax25_route_lock); } static int __must_check ax25_rt_add(struct ax25_routes_struct *route) @@ -82,7 +82,7 @@ static int __must_check ax25_rt_add(stru if (route-digi_count AX25_MAX_DIGIS) return -EINVAL; - write_lock(ax25_route_lock); + write_lock_bh(ax25_route_lock); ax25_rt = ax25_route_list; while (ax25_rt != NULL) { @@ -92,7 +92,7 @@ static int __must_check ax25_rt_add(stru ax25_rt-digipeat = NULL; if (route-digi_count != 0) { if ((ax25_rt-digipeat = kmalloc(sizeof(ax25_digi), GFP_ATOMIC)) == NULL) { - write_unlock(ax25_route_lock); + write_unlock_bh(ax25_route_lock); return -ENOMEM; } ax25_rt-digipeat-lastrepeat = -1; @@ -102,14 +102,14 @@ static int __must_check ax25_rt_add(stru ax25_rt-digipeat-calls[i]= route-digi_addr[i]; } } - write_unlock(ax25_route_lock); + write_unlock_bh(ax25_route_lock); return 0; } ax25_rt = ax25_rt-next; } if ((ax25_rt = kmalloc(sizeof(ax25_route), GFP_ATOMIC)) == NULL) { - write_unlock(ax25_route_lock); + write_unlock_bh(ax25_route_lock); return -ENOMEM; } @@ -120,7 +120,7 @@ static int __must_check ax25_rt_add(stru ax25_rt-ip_mode = ' '; if (route-digi_count != 0) { if ((ax25_rt-digipeat = kmalloc(sizeof(ax25_digi), GFP_ATOMIC)) == NULL) { - write_unlock(ax25_route_lock); + write_unlock_bh(ax25_route_lock); kfree(ax25_rt); return -ENOMEM; } @@ -133,7 +133,7 @@ static int __must_check ax25_rt_add(stru } ax25_rt-next = ax25_route_list; ax25_route_list = ax25_rt; - write_unlock(ax25_route_lock); + write_unlock_bh(ax25_route_lock); return 0; } @@ -152,7 +152,7 @@ static int ax25_rt_del(struct ax25_route if ((ax25_dev = ax25_addr_ax25dev(route-port_addr)) == NULL) return -EINVAL; - write_lock(ax25_route_lock); + write_lock_bh(ax25_route_lock); ax25_rt = ax25_route_list; while (ax25_rt != NULL) { @@ -174,7 +174,7 @@ static int ax25_rt_del(struct ax25_route } } } - write_unlock(ax25_route_lock); + write_unlock_bh(ax25_route_lock); return 0; } @@ -188,7 +188,7 @@ static int ax25_rt_opt(struct ax25_route if ((ax25_dev = ax25_addr_ax25dev(rt_option-port_addr)) == NULL) return -EINVAL; - write_lock(ax25_route_lock); + write_lock_bh(ax25_route_lock); ax25_rt = ax25_route_list; while (ax25_rt != NULL) { @@ -216,7 +216,7 @@ static int ax25_rt_opt(struct ax25_route } out: - write_unlock(ax25_route_lock); + write_unlock_bh(ax25_route_lock); return err; } @@ -492,7 +492,7 @@ void __exit ax25_rt_free(void) { ax25_route *s, *ax25_rt = ax25_route_list; - write_lock(ax25_route_lock); + write_lock_bh(ax25_route_lock); while (ax25_rt != NULL) { s = ax25_rt
[PATCH v2][AX25] af_ax25: remove sock lock in ax25_info_show()
Hi, Here is a little bit better version, I hope. Regards, Jarek P. -- (take 2) Subject: [AX25] af_ax25: remove sock lock in ax25_info_show() This lockdep warning: === [ INFO: possible circular locking dependency detected ] 2.6.24 #3 --- swapper/0 is trying to acquire lock: (ax25_list_lock){-+..}, at: [f91dd3b1] ax25_destroy_socket+0x171/0x1f0 [ax25] but task is already holding lock: (slock-AF_AX25){-+..}, at: [f91dbabc] ax25_std_heartbeat_expiry+0x1c/0xe0 [ax25] which lock already depends on the new lock. ... shows that ax25_list_lock and slock-AF_AX25 are taken in different order: ax25_info_show() takes slock (bh_lock_sock(ax25-sk)) while ax25_list_lock is held, so reversely to other functions. To fix this the sock lock should be moved to ax25_info_start(), and there would be still problem with breaking ax25_list_lock (it seems this proper order isn't optimal yet). But, since it's only for reading proc info it seems this is not necessary (e.g. ax25_send_to_raw() does similar reading without this lock too). So, this patch removes sock lock to avoid deadlock possibility; there is also used sock_i_ino() function, which reads sk_socket under proper read lock. Additionally printf format of this i_ino is changed to %lu. Reported-by: Bernard Pidoux F6BVP [EMAIL PROTECTED] Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] --- net/ax25/af_ax25.c |6 ++ 1 files changed, 2 insertions(+), 4 deletions(-) diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c index 94b2b1b..48bfcc7 100644 --- a/net/ax25/af_ax25.c +++ b/net/ax25/af_ax25.c @@ -1924,12 +1924,10 @@ static int ax25_info_show(struct seq_file *seq, void *v) ax25-paclen); if (ax25-sk != NULL) { - bh_lock_sock(ax25-sk); - seq_printf(seq, %d %d %ld\n, + seq_printf(seq, %d %d %lu\n, atomic_read(ax25-sk-sk_wmem_alloc), atomic_read(ax25-sk-sk_rmem_alloc), - ax25-sk-sk_socket != NULL ? SOCK_INODE(ax25-sk-sk_socket)-i_ino : 0L); - bh_unlock_sock(ax25-sk); + sock_i_ino(ax25-sk)); } else { seq_puts(seq, * * *\n); } -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][AX25] ax25_ds_timer: use mod_timer instead of add_timer
On Sat, Feb 09, 2008 at 07:44:50PM +0100, Bernard Pidoux F6BVP wrote: Hi, With AX25 patches applied I still get this possible circular locking message. Hi Bernard, Could you confirm which exactly patches did you try? Is this vanilla 2.6.24 plus these two: ax25_timer and ax25_ds_timer or something more? And I'm not sure what do you mean by still: the warning came back just after these last patches? At least these timer patches don't seem to change anything around locking? Thanks for testing this, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][AX25] af_ax25: remove sock lock in ax25_info_show()
On Sat, Feb 09, 2008 at 07:44:50PM +0100, Bernard Pidoux F6BVP wrote: Hi, With AX25 patches applied I still get this possible circular locking message. IMHO this warning could happen earlier too... Thanks, Jarek P. -- Subject: [AX25] af_ax25: remove sock lock in ax25_info_show() This lockdep warning: === [ INFO: possible circular locking dependency detected ] 2.6.24 #3 --- swapper/0 is trying to acquire lock: (ax25_list_lock){-+..}, at: [f91dd3b1] ax25_destroy_socket+0x171/0x1f0 [ax25] but task is already holding lock: (slock-AF_AX25){-+..}, at: [f91dbabc] ax25_std_heartbeat_expiry+0x1c/0xe0 [ax25] which lock already depends on the new lock. ... shows that ax25_list_lock and slock-AF_AX25 are taken in different order: ax25_info_show() takes slock (bh_lock_sock(ax25-sk)) while ax25_list_lock is held, so reversely to other functions. To fix this the sock lock should be moved to ax25_info_start(), but since it's only for reading proc info it seems this is not necessary (e.g. ax25_send_to_raw() does similar reading without this lock too). So, this patch removes this lock to avoid deadlock possibility. Reported-by: Bernard Pidoux F6BVP [EMAIL PROTECTED] Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] --- net/ax25/af_ax25.c |2 -- 1 files changed, 0 insertions(+), 2 deletions(-) diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c index 94b2b1b..68b5171 100644 --- a/net/ax25/af_ax25.c +++ b/net/ax25/af_ax25.c @@ -1924,12 +1924,10 @@ static int ax25_info_show(struct seq_file *seq, void *v) ax25-paclen); if (ax25-sk != NULL) { - bh_lock_sock(ax25-sk); seq_printf(seq, %d %d %ld\n, atomic_read(ax25-sk-sk_wmem_alloc), atomic_read(ax25-sk-sk_rmem_alloc), ax25-sk-sk_socket != NULL ? SOCK_INODE(ax25-sk-sk_socket)-i_ino : 0L); - bh_unlock_sock(ax25-sk); } else { seq_puts(seq, * * *\n); } -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][NET_SCHED] sch_htb: htb_requeue fix
htb_requeue() enqueues skbs for which htb_classify() returns NULL. This is wrong because such skbs could be handled by NET_CLS_ACT code, and the decision could be different than earlier in htb_enqueue(). So htb_requeue() is changed to work and look more like htb_enqueue(). Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] --- diff -Nurp 2.6.24-mm1-/net/sched/sch_htb.c 2.6.24-mm1+/net/sched/sch_htb.c --- 2.6.24-mm1-/net/sched/sch_htb.c 2008-02-05 07:45:48.0 + +++ 2.6.24-mm1+/net/sched/sch_htb.c 2008-02-08 08:19:25.0 + @@ -609,14 +609,14 @@ static int htb_enqueue(struct sk_buff *s /* TODO: requeuing packet charges it to policers again !! */ static int htb_requeue(struct sk_buff *skb, struct Qdisc *sch) { + int ret; struct htb_sched *q = qdisc_priv(sch); - int ret = NET_XMIT_SUCCESS; struct htb_class *cl = htb_classify(skb, sch, ret); struct sk_buff *tskb; - if (cl == HTB_DIRECT || !cl) { + if (cl == HTB_DIRECT) { /* enqueue to helper queue */ - if (q-direct_queue.qlen q-direct_qlen cl) { + if (q-direct_queue.qlen q-direct_qlen) { __skb_queue_head(q-direct_queue, skb); } else { __skb_queue_head(q-direct_queue, skb); @@ -625,6 +625,13 @@ static int htb_requeue(struct sk_buff *s sch-qstats.drops++; return NET_XMIT_CN; } +#ifdef CONFIG_NET_CLS_ACT + } else if (!cl) { + if (ret == NET_XMIT_BYPASS) + sch-qstats.drops++; + kfree_skb(skb); + return ret; +#endif } else if (cl-un.leaf.q-ops-requeue(skb, cl-un.leaf.q) != NET_XMIT_SUCCESS) { sch-qstats.drops++; -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[BUG][AX25] Fwd: SMP with AX.25
On Wed, Feb 06, 2008 at 07:45:29AM +, Jarek Poplawski wrote: ... From: Jann Traschewski [EMAIL PROTECTED] Subject: SMP with AX.25 (testing patch #2) According to some OOPSes reported by Jann it seems ax25_kick tries to clone NULL skbs sometimes. Probably there is needed a sock lock, but until it's checked more this patch should help to avoid this. There are probably also cases where ax25 -paclen == 0 can happen: let's check for this. I attach this patch for testing purpose only. Jarek P. --- diff -Nurp 2.6.24-/net/ax25/ax25_out.c 2.6.24+/net/ax25/ax25_out.c --- 2.6.24-/net/ax25/ax25_out.c 2008-01-24 22:58:37.0 + +++ 2.6.24+/net/ax25/ax25_out.c 2008-02-07 11:48:39.0 + @@ -117,6 +117,12 @@ void ax25_output(ax25_cb *ax25, int pacl unsigned char *p; int frontlen, len, fragno, ka9qfrag, first = 1; + if (paclen 16) { + WARN_ON_ONCE(1); + kfree_skb(skb); + return; + } + if ((skb-len - 1) paclen) { if (*skb-data == AX25_P_TEXT) { skb_pull(skb, 1); /* skip PID */ @@ -263,6 +269,8 @@ void ax25_kick(ax25_cb *ax25) * Dequeue the frame and copy it. */ skb = skb_dequeue(ax25-write_queue); + if (!skb) + return; do { if ((skbn = skb_clone(skb, GFP_ATOMIC)) == NULL) { -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG][AX25] Fwd: SMP with AX.25
On Thu, Feb 07, 2008 at 08:35:11PM +0100, Jarek Poplawski wrote: Here I resend another OOPS I got from Jann: On Thu, Feb 07, 2008 at 05:42:42PM +0100, Jann Traschewski wrote: ... BUG: unable to handle kernel NULL pointer dereference at virtual address 0065 printing eip: c02266c7 *pde = Oops: [#1] SMP Modules linked in: netconsole ppp_deflate zlib_deflate zlib_inflate bsd_comp ppp_async ppp_generic slhc tun bitrev crc32 mkiss ax25 crc16 iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf_conntrack ipt_REJECT iptable_filter iptable_mangle xt_MARK ipv6 ipip tunnel4 ide_cd cdrom aic7xxx scsi_transport_spi parport_serial parport_pc parport i2c_piix4 genrtc Pid: 3035, comm: linuxnet Not tainted (2.6.24-dg8ngn #1) EIP: 0060:[c02266c7] EFLAGS: 00010202 CPU: 0 EIP is at skb_clone+0x3/0x4d EAX: EBX: ECX: 0001 EDX: 0020 ESI: 0008 EDI: EBP: f700a4ac ESP: f720b9ac DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process linuxnet (pid: 3035, ti=f720a000 task=f7c63570 task.ti=f720a000) Stack: f700a400 f8a0a8f9 c01254db f700a400 f700a4cc f700a400 __wake_up_common+0x32/0x5c __wake_up+0x32/0x42 Jann, this report is a bit damaged here, so I could be wrong, but it looks similar to your first reports, and I guess this skb_clone is called from ax25_kick too. Then my today testing patch #2 should help for this, I hope. Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG][AX25] Fwd: SMP with AX.25
Here I resend another OOPS I got from Jann: On Thu, Feb 07, 2008 at 05:42:42PM +0100, Jann Traschewski wrote: ... BUG: unable to handle kernel NULL pointer dereference at virtual address 0065 printing eip: c02266c7 *pde = Oops: [#1] SMP Modules linked in: netconsole ppp_deflate zlib_deflate zlib_inflate bsd_comp ppp_async ppp_generic slhc tun bitrev crc32 mkiss ax25 crc16 iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf_conntrack ipt_REJECT iptable_filter iptable_mangle xt_MARK ipv6 ipip tunnel4 ide_cd cdrom aic7xxx scsi_transport_spi parport_serial parport_pc parport i2c_piix4 genrtc Pid: 3035, comm: linuxnet Not tainted (2.6.24-dg8ngn #1) EIP: 0060:[c02266c7] EFLAGS: 00010202 CPU: 0 EIP is at skb_clone+0x3/0x4d EAX: EBX: ECX: 0001 EDX: 0020 ESI: 0008 EDI: EBP: f700a4ac ESP: f720b9ac DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process linuxnet (pid: 3035, ti=f720a000 task=f7c63570 task.ti=f720a000) Stack: f700a400 f8a0a8f9 c01254db f700a400 f700a4cc f700a400 __wake_up_common+0x32/0x5c __wake_up+0x32/0x42 [c0224e67] sock_def_readable+0x39/0x63 [c02251e1] sock_queue_rcv_skb+0xb6/0xd1 [c022b060] netif_receive_skb+0x309/0x372 [c0117f35] dequeue_entity+0xb/0x2a [c022d42f] process_backlog+0x5c/0xaa [c022cf21] net_rx_action+0x8d/0x173 [c012213a] __do_softirq+0x5d/0xc1 [c01221d0] do_softirq+0x32/0x36 [c01223df] local_bh_enable_ip+0x35/0x40 [c0261ebe] udp_poll+0xc1/0xd5 [c0220f2c] sock_poll+0xc/0xe [c016495a] do_select+0x229/0x3c9 [c01650b6] __pollwait+0x0/0xac [c0118e73] default_wake_function+0x0/0x8 [c0118e73] default_wake_function+0x0/0x8 [c0118e73] default_wake_function+0x0/0x8 [c0118e73] default_wake_function+0x0/0x8 [c0118e73] default_wake_function+0x0/0x8 [c0118e73] default_wake_function+0x0/0x8 [c0118e73] default_wake_function+0x0/0x8 [c0118e73] default_wake_function+0x0/0x8 [c0118e73] default_wake_function+0x0/0x8 [c0118e73] default_wake_function+0x0/0x8 [c0118e73] default_wake_function+0x0/0x8 [c0118e73] default_wake_function+0x0/0x8 [c0118e73] default_wake_function+0x0/0x8 [c0118e73] default_wake_function+0x0/0x8 [c0118e73] default_wake_function+0x0/0x8 [c0118e73] default_wake_function+0x0/0x8 [c0118e73] default_wake_function+0x0/0x8 [c011756c] __wake_up_common+0x32/0x5c [c011911c] __wake_up+0x32/0x42 [c0164da0] core_sys_select+0x2a6/0x2c7 [c011756c] __wake_up_common+0x32/0x5c [c011911c] __wake_up+0x32/0x42 [c01d13c2] tty_wakeup+0x4c/0x50 [c01d72c6] pty_unthrottle+0x12/0x1d [f89fc60e] mkiss_receive_buf+0x381/0x3af [mkiss] sys_select+0xa4/0x187 syscall_call+0x7/0xb Kernel panic - not syncing: Fatal exception in interrupt -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG][AX25] Fwd: SMP with AX.25
-- Forwarded message -- From: Jann Traschewski [EMAIL PROTECTED] Date: Thu, 7 Feb 2008 06:27:38 +0100 Subject: FW: smp kernel oops ax25 2008-02-07 To: Jarek Poplawski [EMAIL PROTECTED] ... BUG: unable to handle kernel NULL pointer dereference at virtual address 0065 printing eip: c02266c7 *pde = Oops: [#1] SMP Modules linked in: netconsole ppp_deflate zlib_deflate zlib_inflate bsd_comp ppp_async ppp_generic slhc tun bitrev crc32 mkiss ax25 crc16 iptable_nat nf_nat nf_conntrack_ipv4 xt_state nf_conntrack ipt_REJECT iptable_filter iptable_mangle xt_MARK ipv6 ipip tunnel4 ide_cd cdrom aic7xxx scsi_transport_spi parport_serial parport_pc parport i2c_piix4 genrtc Pid: 19761, comm: frmaster Not tainted (2.6.24-dg8ngn #1) EIP: 0060:[c02266c7] EFLAGS: 00010297 CPU: 1 EIP is at skb_clone+0x3/0x4d EAX: EBX: ECX: EDX: 0020 ESI: 0008 EDI: EBP: c0cfc8ac ESP: d3cfddc4 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 Process frmaster (pid: 19761, ti=d3cfc000 task=f7fbf570 task.ti=d3cfc000) Stack: c0cfc800 f8a0a8f9 c0cfc800 d3cfdf6c 0007 0015 c0cfc8cc e84492e0 c0cfc800 d3cfdf64 f6e6fbc0 f8a0e9d8 d3cfde60 c1d8c340 d3cfdea0 c0cfc800 f5d12200 d3cfdf24 c014143d 01e7a9be d3cfdf24 0022 Call Trace: [f8a0a8f9] ax25_kick+0xaf/0x184 [ax25] [f8a0e9d8] ax25_sendmsg+0x35f/0x49a [ax25] [c014143d] __generic_file_aio_write_nolock+0x474/0x4d3 [c0221214] sock_aio_write+0xbc/0xc8 [c01414f7] generic_file_aio_write+0x5b/0xb0 [c015a1d3] do_sync_write+0xc7/0x10a [c022b643] dev_hard_start_xmit+0x20a/0x26a [c012db6d] autoremove_wake_function+0x0/0x35 [c01253d3] lock_timer_base+0x19/0x35 [c015a960] vfs_write+0x9e/0x10c [c015aeb7] sys_write+0x41/0x67 [c0103e4e] syscall_call+0x7/0xb === Code: 24 20 89 7c 24 08 89 44 24 04 03 aa 9c 00 00 00 89 2c 24 e8 cc 67 f9 ff 89 44 24 54 8b 44 24 54 83 c4 3c 5b 5e 5f 5d c3 53 89 c3 8a 40 65 24 18 3c 08 75 1f 8d 8b a8 00 00 00 f6 41 65 18 75 13 EIP: [c02266c7] skb_clone+0x3/0x4d SS:ESP 0068:d3cfddc4 ---[ end trace 504af05b5c529aa1 ]--- -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][AX25] Fwd: SMP with AX.25
On Wed, Feb 06, 2008 at 07:45:29AM +, Jarek Poplawski wrote: ... From: Jann Traschewski [EMAIL PROTECTED] Subject: SMP with AX.25 [AX25] ax25_timer: use mod_timer instead of add_timer According to one of Jann's OOPS reports it looks like BUG_ON(timer_pending(timer)) triggers during add_timer() in ax25_start_t1timer(). This patch changes current use of: init_timer(), add_timer() and del_timer() to setup_timer() with mod_timer(), which should be safer anyway. Reported-by: Jann Traschewski [EMAIL PROTECTED] Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] --- include/net/ax25.h|1 + net/ax25/af_ax25.c|6 + net/ax25/ax25_timer.c | 60 +--- 3 files changed, 23 insertions(+), 44 deletions(-) diff --git a/include/net/ax25.h b/include/net/ax25.h index 32a57e1..3f0236f 100644 --- a/include/net/ax25.h +++ b/include/net/ax25.h @@ -416,6 +416,7 @@ extern void ax25_calculate_rtt(ax25_cb *); extern void ax25_disconnect(ax25_cb *, int); /* ax25_timer.c */ +extern void ax25_setup_timers(ax25_cb *); extern void ax25_start_heartbeat(ax25_cb *); extern void ax25_start_t1timer(ax25_cb *); extern void ax25_start_t2timer(ax25_cb *); diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c index 8fc64e3..94b2b1b 100644 --- a/net/ax25/af_ax25.c +++ b/net/ax25/af_ax25.c @@ -510,11 +510,7 @@ ax25_cb *ax25_create_cb(void) skb_queue_head_init(ax25-ack_queue); skb_queue_head_init(ax25-reseq_queue); - init_timer(ax25-timer); - init_timer(ax25-t1timer); - init_timer(ax25-t2timer); - init_timer(ax25-t3timer); - init_timer(ax25-idletimer); + ax25_setup_timers(ax25); ax25_fillin_cb(ax25, NULL); diff --git a/net/ax25/ax25_timer.c b/net/ax25/ax25_timer.c index 7259486..db29ea7 100644 --- a/net/ax25/ax25_timer.c +++ b/net/ax25/ax25_timer.c @@ -40,63 +40,45 @@ static void ax25_t2timer_expiry(unsigned long); static void ax25_t3timer_expiry(unsigned long); static void ax25_idletimer_expiry(unsigned long); -void ax25_start_heartbeat(ax25_cb *ax25) +void ax25_setup_timers(ax25_cb *ax25) { - del_timer(ax25-timer); - - ax25-timer.data = (unsigned long)ax25; - ax25-timer.function = ax25_heartbeat_expiry; - ax25-timer.expires = jiffies + 5 * HZ; + setup_timer(ax25-timer, ax25_heartbeat_expiry, (unsigned long)ax25); + setup_timer(ax25-t1timer, ax25_t1timer_expiry, (unsigned long)ax25); + setup_timer(ax25-t2timer, ax25_t2timer_expiry, (unsigned long)ax25); + setup_timer(ax25-t3timer, ax25_t3timer_expiry, (unsigned long)ax25); + setup_timer(ax25-idletimer, ax25_idletimer_expiry, + (unsigned long)ax25); +} - add_timer(ax25-timer); +void ax25_start_heartbeat(ax25_cb *ax25) +{ + mod_timer(ax25-timer, jiffies + 5 * HZ); } void ax25_start_t1timer(ax25_cb *ax25) { - del_timer(ax25-t1timer); - - ax25-t1timer.data = (unsigned long)ax25; - ax25-t1timer.function = ax25_t1timer_expiry; - ax25-t1timer.expires = jiffies + ax25-t1; - - add_timer(ax25-t1timer); + mod_timer(ax25-t1timer, jiffies + ax25-t1); } void ax25_start_t2timer(ax25_cb *ax25) { - del_timer(ax25-t2timer); - - ax25-t2timer.data = (unsigned long)ax25; - ax25-t2timer.function = ax25_t2timer_expiry; - ax25-t2timer.expires = jiffies + ax25-t2; - - add_timer(ax25-t2timer); + mod_timer(ax25-t2timer, jiffies + ax25-t2); } void ax25_start_t3timer(ax25_cb *ax25) { - del_timer(ax25-t3timer); - - if (ax25-t3 0) { - ax25-t3timer.data = (unsigned long)ax25; - ax25-t3timer.function = ax25_t3timer_expiry; - ax25-t3timer.expires = jiffies + ax25-t3; - - add_timer(ax25-t3timer); - } + if (ax25-t3 0) + mod_timer(ax25-t3timer, jiffies + ax25-t3); + else + del_timer(ax25-t3timer); } void ax25_start_idletimer(ax25_cb *ax25) { - del_timer(ax25-idletimer); - - if (ax25-idle 0) { - ax25-idletimer.data = (unsigned long)ax25; - ax25-idletimer.function = ax25_idletimer_expiry; - ax25-idletimer.expires = jiffies + ax25-idle; - - add_timer(ax25-idletimer); - } + if (ax25-idle 0) + mod_timer(ax25-idletimer, jiffies + ax25-idle); + else + del_timer(ax25-idletimer); } void ax25_stop_heartbeat(ax25_cb *ax25) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][AX25] ax25_ds_timer: use mod_timer instead of add_timer
On Wed, Feb 06, 2008 at 08:15:09AM +, Jarek Poplawski wrote: On Wed, Feb 06, 2008 at 07:45:29AM +, Jarek Poplawski wrote: ... From: Jann Traschewski [EMAIL PROTECTED] Subject: SMP with AX.25 ... [AX25] ax25_ds_timer: use mod_timer instead of add_timer This patch changes current use of: init_timer(), add_timer() and del_timer() to setup_timer() with mod_timer(), which should be safer anyway. Reported-by: Jann Traschewski [EMAIL PROTECTED] Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] --- include/net/ax25.h |1 + net/ax25/ax25_dev.c |2 +- net/ax25/ax25_ds_timer.c | 12 3 files changed, 6 insertions(+), 9 deletions(-) diff --git a/include/net/ax25.h b/include/net/ax25.h index 3f0236f..717e219 100644 --- a/include/net/ax25.h +++ b/include/net/ax25.h @@ -324,6 +324,7 @@ extern void ax25_dama_on(ax25_cb *); extern void ax25_dama_off(ax25_cb *); /* ax25_ds_timer.c */ +extern void ax25_ds_setup_timer(ax25_dev *); extern void ax25_ds_set_timer(ax25_dev *); extern void ax25_ds_del_timer(ax25_dev *); extern void ax25_ds_timer(ax25_cb *); diff --git a/net/ax25/ax25_dev.c b/net/ax25/ax25_dev.c index 528c874..a7a0e0c 100644 --- a/net/ax25/ax25_dev.c +++ b/net/ax25/ax25_dev.c @@ -82,7 +82,7 @@ void ax25_dev_device_up(struct net_device *dev) ax25_dev-values[AX25_VALUES_DS_TIMEOUT]= AX25_DEF_DS_TIMEOUT; #if defined(CONFIG_AX25_DAMA_SLAVE) || defined(CONFIG_AX25_DAMA_MASTER) - init_timer(ax25_dev-dama.slave_timer); + ax25_ds_setup_timer(ax25_dev); #endif spin_lock_bh(ax25_dev_lock); diff --git a/net/ax25/ax25_ds_timer.c b/net/ax25/ax25_ds_timer.c index c4e3b02..2ce79df 100644 --- a/net/ax25/ax25_ds_timer.c +++ b/net/ax25/ax25_ds_timer.c @@ -40,13 +40,10 @@ static void ax25_ds_timeout(unsigned long); * 1/10th of a second. */ -static void ax25_ds_add_timer(ax25_dev *ax25_dev) +void ax25_ds_setup_timer(ax25_dev *ax25_dev) { - struct timer_list *t = ax25_dev-dama.slave_timer; - t-data = (unsigned long) ax25_dev; - t-function = ax25_ds_timeout; - t-expires = jiffies + HZ; - add_timer(t); + setup_timer(ax25_dev-dama.slave_timer, ax25_ds_timeout, + (unsigned long)ax25_dev); } void ax25_ds_del_timer(ax25_dev *ax25_dev) @@ -60,10 +57,9 @@ void ax25_ds_set_timer(ax25_dev *ax25_dev) if (ax25_dev == NULL) /* paranoia */ return; - del_timer(ax25_dev-dama.slave_timer); ax25_dev-dama.slave_timeout = msecs_to_jiffies(ax25_dev-values[AX25_VALUES_DS_TIMEOUT]) / 10; - ax25_ds_add_timer(ax25_dev); + mod_timer(ax25_dev-dama.slave_timer, jiffies + HZ); } /* -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[BUG][AX25] Fwd: SMP with AX.25
On Wed, Feb 06, 2008 at 07:45:29AM +, Jarek Poplawski wrote: ... From: Jann Traschewski [EMAIL PROTECTED] Subject: SMP with AX.25 To: [EMAIL PROTECTED] According to one of OOPSes reported by Jann softirq can break while skb is prepared for netif_rx. The report isn't complete, so the real reason of the later bug could be different, but IMHO this locking in ax_bump is wrong. I attach this patch for testing purpose only. Jarek P. --- drivers/net/hamradio/mkiss.c |5 ++--- 1 files changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/net/hamradio/mkiss.c b/drivers/net/hamradio/mkiss.c index cfcd15a..30c9b3b 100644 --- a/drivers/net/hamradio/mkiss.c +++ b/drivers/net/hamradio/mkiss.c @@ -289,7 +289,6 @@ static void ax_bump(struct mkiss *ax) *ax-rbuff = ~0x20; } } - spin_unlock_bh(ax-buflock); count = ax-rcount; @@ -297,17 +296,17 @@ static void ax_bump(struct mkiss *ax) printk(KERN_ERR mkiss: %s: memory squeeze, dropping packet.\n, ax-dev-name); ax-stats.rx_dropped++; + spin_unlock_bh(ax-buflock); return; } - spin_lock_bh(ax-buflock); memcpy(skb_put(skb,count), ax-rbuff, count); - spin_unlock_bh(ax-buflock); skb-protocol = ax25_type_trans(skb, ax-dev); netif_rx(skb); ax-dev-last_rx = jiffies; ax-stats.rx_packets++; ax-stats.rx_bytes += count; + spin_unlock_bh(ax-buflock); } static void kiss_unesc(struct mkiss *ax, unsigned char s) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][NET_SCHED] sch_tree_lock in cbq_put, hfsc_put_class and htb_put
This patch is wrong - don't apply, please! Sorry, Jarek P. On 03-02-2008 15:52, Jarek Poplawski wrote: [NET_SCHED] sch_tree_lock in cbq_put, hfsc_put_class and htb_put Qdisc_class_ops -put() methods call xxx_destroy_class() functions without sch_tree_lock(), which is needed at least for qdisc_destroy() of a subqueue, but also for deactivating and list/rb_tree updates in case of HTB. (Since errors caused by such a bug could be very hard to reproduce the effectiveness of this patch wasn't confirmed by tests, but IMHO the need of this lock is quite obvious here.) Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] ... -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH][NET_SCHED] sch_tree_lock in cbq_put, hfsc_put_class and htb_put
[NET_SCHED] sch_tree_lock in cbq_put, hfsc_put_class and htb_put Qdisc_class_ops -put() methods call xxx_destroy_class() functions without sch_tree_lock(), which is needed at least for qdisc_destroy() of a subqueue, but also for deactivating and list/rb_tree updates in case of HTB. (Since errors caused by such a bug could be very hard to reproduce the effectiveness of this patch wasn't confirmed by tests, but IMHO the need of this lock is quite obvious here.) Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] --- net/sched/sch_cbq.c | 14 -- net/sched/sch_hfsc.c |5 - net/sched/sch_htb.c |5 - 3 files changed, 16 insertions(+), 8 deletions(-) diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c index 09969c1..bbf47fe 100644 --- a/net/sched/sch_cbq.c +++ b/net/sched/sch_cbq.c @@ -1748,16 +1748,18 @@ static void cbq_put(struct Qdisc *sch, unsigned long arg) struct cbq_class *cl = (struct cbq_class*)arg; if (--cl-refcnt == 0) { + sch_tree_lock(sch); #ifdef CONFIG_NET_CLS_ACT - struct cbq_sched_data *q = qdisc_priv(sch); + { + struct cbq_sched_data *q = qdisc_priv(sch); - spin_lock_bh(sch-dev-queue_lock); - if (q-rx_class == cl) - q-rx_class = NULL; - spin_unlock_bh(sch-dev-queue_lock); + if (q-rx_class == cl) + q-rx_class = NULL; + } #endif cbq_destroy_class(sch, cl); + sch_tree_unlock(sch); } } @@ -1965,10 +1967,10 @@ static int cbq_delete(struct Qdisc *sch, unsigned long arg) cbq_sync_defmap(cl); cbq_rmprio(q, cl); - sch_tree_unlock(sch); if (--cl-refcnt == 0) cbq_destroy_class(sch, cl); + sch_tree_unlock(sch); return 0; } diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c index 87293d0..494d042 100644 --- a/net/sched/sch_hfsc.c +++ b/net/sched/sch_hfsc.c @@ -1262,8 +1262,11 @@ hfsc_put_class(struct Qdisc *sch, unsigned long arg) { struct hfsc_class *cl = (struct hfsc_class *)arg; - if (--cl-refcnt == 0) + if (--cl-refcnt == 0) { + sch_tree_lock(sch); hfsc_destroy_class(sch, cl); + sch_tree_unlock(sch); + } } static unsigned long diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c index e1a579e..268a2d1 100644 --- a/net/sched/sch_htb.c +++ b/net/sched/sch_htb.c @@ -1305,8 +1305,11 @@ static void htb_put(struct Qdisc *sch, unsigned long arg) { struct htb_class *cl = (struct htb_class *)arg; - if (--cl-refcnt == 0) + if (--cl-refcnt == 0) { + sch_tree_lock(sch); htb_destroy_class(sch, cl); + sch_tree_unlock(sch); + } } static int htb_change_class(struct Qdisc *sch, u32 classid, -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kernel panic on 2.6.24 with esfq patch applied
Denys Fedoryshchenko wrote, On 02/01/2008 02:25 PM: Hi Probably bug related to ESFQ, now i will unload module and will test more. But probably not related, so if not difficult, please take a look. Hi, The main break seems to take place in HTB. Do you use HTB on IFB, BTW? If not, maybe some exemplary rules from your script? (There are quite a few modules loaded...) Thanks, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] IPROUTE2: Add addrlabel subsystem.
YOSHIFUJI Hideaki / 吉藤英明 wrote, On 01/31/2008 08:57 PM: Signed-off-by: YOSHIFUJI Hideaki [EMAIL PROTECTED] --- include/linux/if_addrlabel.h | 32 + ip/Makefile |2 +- ip/ip.c |5 +- ip/ip_common.h |4 + ip/ipaddrlabel.c | 260 ++ ip/ipmonitor.c |4 + 6 files changed, 304 insertions(+), 3 deletions(-) diff --git a/include/linux/if_addrlabel.h b/include/linux/if_addrlabel.h new file mode 100644 index 000..9fe79c9 --- /dev/null +++ b/include/linux/if_addrlabel.h @@ -0,0 +1,32 @@ +/* + * if_addrlabel.h - netlink interface for address labels + * + * Copyright (C)2007 USAGI/WIDE Project, All Rights Reserved. + * Hi, Is this statement GPL compatible? And, maybe you could add a few words about this to man ip too? Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 2.6.24] fib: fix route replacement, fib_info is shared
On Sat, Feb 02, 2008 at 12:56:08PM +0200, Julian Anastasov wrote: Hello, Hi! On Tue, 29 Jan 2008, Jarek Poplawski wrote: ...On the other hand, I wonder how bad would be switching these two to avoid this error? After all replace with this add or change meaning looks quite permissive, and after all it was used before with no such errors, so, even if correct, it could still break some scripts... Agreed, what could break are scripts that add 2 equal alternative routes and checking for errors. BTW, I would be glad if you could use this in some patch (unless you have no time - then let me know)... BTW, I tried to add more information and that is what I have finally: http://www.ssi.bg/~ja/fib.txt Great! But, I hope more people will know about this if you send it as a patch with a new thread. BTW#2: I've thought it maybe needs a bit of cosmetcs like more uniform TOS or tos and . or no . after paragraphs, or more consistent wrapping, but then I've exagerated with this for sure. Anyway, below are some suggestions, but feel free to skip them all! Thanks, Jarek P. --- fib.txt.orig2008-02-02 15:44:09.0 +0100 +++ fib.txt 2008-02-02 18:57:56.0 +0100 @@ -1,42 +1,55 @@ FIB - Forwarding Information Base -- Routes are organized in routing tables +- Routes are organized in routing tables. + - For fib_hash algorithm routing tables have 33 zones (for prefix -lengths 0..32), routing lookup walks them from 32 to 0 to find a -node containing all routing information -- Zones are implemented as hash tables where nodes are hashed by -key (prefix=network) because there can be lots of prefixes in a zone. +lengths 0..32), routing lookup walks them from 32 to 0 to find a node +containing all routing information. + +- Zones are implemented as hash tables where nodes are hashed by key +(prefix = network) because there can be lots of prefixes in a zone. + - Nodes can be stored with other methods, eg. trie, where nodes are -searched (we hope faster) by prefix and length, no zones are used -in this case -- Nodes have a list of aliases (tos+type+scope+fib_info ptr) sorted by -decreasing TOS because TOS=0 must be a last hit when looking for route, -TOS 0 matches packet with any TOS. type is unicast, local, prohibit, etc. -scope is host, link, etc. Additionally, aliases with same TOS are -sorted by fib_info priority (ascending). -- fib_info is a structure containing protocol (kernel, boot, zebra, etc), -prefsrc, priority (metric), metrics, nexthop(s). Fallback routes have -higher value for priority, they are used if more priority routes -disappear or their nexthops are dead. -- fib_info structures are organized in 2 global hash tables, one -keyed by prefsrc and another by nexthop_count+protocol+prefsrc+priority -- fib_info is a shared structure, different aliases can point to same -fib_info, even aliases from different prefixes, from different routing -tables. By this way if fib_info contains multipath route then many -aliases share same route path scheduling context. -- Nexthop contains gateway, output device, scope and weight. Weight +searched (we hope faster) by prefix and length; no zones are used in +this case. + +- Nodes have a list of aliases (tos + type + scope + fib_info ptr) +sorted by decreasing tos because tos = 0 must be the last hit when +looking for a route (tos = 0 matches packet with any tos); type is: +unicast, local, prohibit, etc.; scope is: host, link, etc. +Additionally, aliases with the same tos are sorted by fib_info +priority (ascending). + +- fib_info is a structure containing: protocol (kernel, boot, zebra, +etc.), prefsrc, priority (metric), metrics, nexthop(s). Fallback +routes have higher value for priority; they are used if routes with +more priority disappear or their nexthops are dead. + +- fib_info structures are organized in 2 global hash tables, one keyed +by prefsrc, and another by: nexthop_count + protocol + prefsrc + +priority. + +- fib_info structure is shared: different aliases can point at the +same fib_info, even aliases with different prefixes or from different +routing tables. This way if fib_info contains multipath route, then +many aliases share the same route path scheduling context. + +- Nexthop contains: gateway, output device, scope and weight; weight is used for path scheduling where nexthops have relative priority -compared to other nexthops in multipath route. -- There can be many aliases with same tos, there can be alternative -routes (aliases) with same tos and priority (metric) but only one alias -with particular tos, type, scope and fib_info can exist to avoid duplicate -alternative routes. -- The operation to replace route includes replacing of alias. The alias -in node (table - prefix/len) is matched by tos and fib_info priority and -they can not be changed. The parameters that are changed are type, scope -and fib_info (except priority). -- The 'ip' tool maps route operations to NLM_F_
Re: kernel panic on 2.6.24 with esfq patch applied
Corey Hickey wrote, On 02/02/2008 11:23 AM: ... I'd rather you were using my recent patches to SFQ instead of ESFQ. I was able to crash a 2.6.24 user-mode Linux with ESFQ as well; I don't know if that's what you encountered, but the SFQ patches should be better anyway. http://fatooh.org/esfq-2.6/sfq-2.6.24.tar.bz2 http://fatooh.org/esfq-2.6/ Hi Corey! I've just had a look at your site, and see you're a bit disappointed about the reception of your patches here. I don't use nor know about your work (maybe some time...), and wish you better public here, but maybe you'll find this story interesting: a few years ago, when I didn't even think of reading netdev, I quite often visited pages of such projects as HTB (not in every distro yet), IMQ or Julian Anastasov's. They were quite popular subjects on admins lists, and I really admired people around these projects, while the main kernel was something big and anonymous (no idea e.g. about D. Miller). For the same reason I knew more about people from iptables project - only because there was a need to visit this for some non standard modules, and about A. Kuznetsov - only because of patching iproute for HTB!!! (Only later I found that Alexey did more than needed for several such HTBs.) So, it seems there could be good side of such unofficial status too - if you only have something really useful for others... Cheers, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Disable TSO for non standard qdiscs
On Fri, Feb 01, 2008 at 01:28:15AM -0800, Waskiewicz Jr, Peter P wrote: ... The TCP layer will generate TSO packets based on the kernel socket features associated with the flow. So if you have two devices, one supporting TSO, the other not, then the flows associated with the non-TSO device will not have their packets built for TSO. This has no bearing on the device supporting TSO, which its feature flags will propogate into the kernel socket for that flow, and cause any TCP flows to that device to be TSO packets. So in a nutshell, disabling TSO is on a per-device level, not a global switch. Fine, but I was rather wondering if there could be something more in the idea of this patch that can't be done with ethtool. And I don't think qdisc code currently treats or should treat TCP special. Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Disable TSO for non standard qdiscs
jamal wrote, On 02/01/2008 01:06 PM: On Fri, 2008-01-02 at 10:56 +0100, Patrick McHardy wrote: We don't want to disable TSO for cases where it makes sense, but who is using TBF on 10GbE? The point is that most users of qdiscs which are incapable of dealing with TSO without hacks or special configuration probably don't care, and 10GbE users know about ethtool *and* don't use TBF or HTB (which are probably the only qdiscs which actually have problems, maybe also CBQ). Right - Essentially it is a usability issue: People who know how to use TSO (Peter for example) will be clueful enough to turn it on. Which means the default should be to protect the clueless and turn it off. On Andis approach: Turning TSO off at netdev registration time with a warning will be a cleaner IMO. Or alternatively introducing a kernel-config I know what TSO is option which is then used at netdev registration. From a usability perspective it would make more sense to just keep ethtool as the only way to configure TSO. [I recently spent a few days helping someone debug a problem with IFB because he was redirecting packets from an TSO netdevice and occasionaly some multi-packet will be missed in the calculation; my answer was turn off TSO; so there are more use cases for this TSO issue]. I totally disagree with these POVs: - 10G cards should be treated by default as 10G cards - not DSL modems, and common users shouldn't have to read any warnings or configs to see this. - tc with TBF or HTB are professional tools; there should be added some warnings to manuals. But trying to change the way they work because we think we know better what users want, and changing BTW some other things (making debugging this later a hell), is simply disrespectful for target users of these tools. There are some wrappers or creators invented for this. And, BTW, I think I've seen somewhere a system which does this this other way - with creators for professionals. So, you could be right with this too... Cheers, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Disable TSO for non standard qdiscs
Andi Kleen wrote, On 01/31/2008 08:34 PM: TSO by nature is bursty. But disabling TSO without the option of having it on or off to me seems to aggressive. If someone is using a qdisc that TSO is interfering with the effectiveness of the traffic shaping, then they should turn off TSO via ethtool on the target device. Some The philosophical problem I have with this suggestion is that I expect that the large majority of users will be more happy with disabled TSO if they use non standard qdiscs and defaults that do not fit the majority use case are bad. If you mean the large majority of the large minority of users, who use non standard qdiscs - I agree - this is really the philosophical problem! Basically you're suggesting that nearly everyone using tc should learn about another obscure command. ...So, it sounds like tc is used by nearly everyone now... It seems my distro really isn't up to date: Package: iproute ... Description: Professional tools to control the networking in Linux kernels This is `iproute', the professional set of tools to control the networking behavior in kernels 2.2.x and later. And ethtool doesn't have to be learnt at all: most friendly distros could use this in config or add some graphical wrapper. Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Disable TSO for non standard qdiscs
Jarek Poplawski wrote, On 01/31/2008 09:33 PM: Andi Kleen wrote, On 01/31/2008 08:34 PM ... Basically you're suggesting that nearly everyone using tc should learn about another obscure command ...On the other hand, with this DSL argument from the sub-thread you could be quite right: if this everyone wants to use one NIC for both high speed local network and such a DSL, then learning ethtool could be not enough... Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Disable TSO for non standard qdiscs
On 01-02-2008 00:04, Jarek Poplawski wrote: ... ...On the other hand, with this DSL argument from the sub-thread you could be quite right: if this everyone wants to use one NIC for both high speed local network and such a DSL, then learning ethtool could be not enough... ...But, on the other hand, in this case the realization seems to be wrong: probably still all locally created packets will be treated the same - or I miss something? Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 2.6.24] fib: fix route replacement, fib_info is shared
On Tue, Jan 29, 2008 at 02:30:47AM +0200, Julian Anastasov wrote: Hello, On Mon, 28 Jan 2008, Jarek Poplawski wrote: BTW, the way add works wasn't questioned now, but it seems could be, or man ip should call it e.g. ip route add - add new destination, and append ip route append (unless I have old man). add is similar to prepend, only that checks NLM_F_EXCL to avoid many alternative routes, only one route for tos+priority is allowed for add. As for the sorting by tos/priority and the insertion position I remember for thread on this issue: Yes but I've meant this man (8) ip description could be confusing: ip route add - add new route ip route change - change route ip route replace - change or add new one One can wonder why there is an error while adding two new routes with add, but OK to do the same with replace. But, it's man problem of course. http://marc.info/?t=10961429062r=1w=2 + fa = list_entry(fa-fa_list.prev, struct fib_alias, fa_list); + list_for_each_entry_continue(fa, f-fn_alias, fa_list) { + if (fa-fa_tos != tos) + break; + if (fa-fa_info-fib_priority != fi-fib_priority) + break; + if (fa-fa_type == cfg-fc_type + fa-fa_scope == cfg-fc_scope + fa-fa_info == fi) { + fa_match = fa; + break; Why can't we try goto out from here? (less reading...) The case with NLM_F_REPLACE needs to check more things, one day one can combine NLM_F_APPEND+NLM_F_REPLACE to replace last alternative route, not only the first one as currently implemented. I'm not sure I can understand your point: do you mean there could be something to do for these options is fa_match exists? But, maybe it relates to my question below... + fa = fa_first; + if (fa_match) { + if (fa == fa_match) + err = 0; Could you comment more why returning an error seems to depend on the order of aliases here? But, IMHO there is no reason to change the old behavior WRT this error, so probably this err = 0 should be always if NLM_F_REPLACE is set. fa_match is some existing alias that matches all new parameters. As NLM_F_REPLACE changes the first alternative route for tos+priority if fa_match == fa_first (we are replacing alias that matches all parameters) we return 0, only that routing cache is not flushed - nothing is replaced/changed. So, fa == fa_match means replace will not change existing parameters, return 0 as this is not an error. Probably I miss something, but what parameters do we change if (fa_match) (fa != fa_match)? Isn't this goto out in any case? PS: I think, this FIB info you sent earlier is just fine for Documentation/networking without any changes! (Maybe one more patch?) This is so small part of the picture, such 15-minute listing is not enough to explain everything. May be such and other information can be added as part of some known documentation. This is small but condensed, and IMHO it fits very well what is now in this directory. Thanks, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 2.6.24] fib: fix route replacement, fib_info is shared
On Tue, Jan 29, 2008 at 09:49:15AM +0100, Jarek Poplawski wrote: On Tue, Jan 29, 2008 at 02:30:47AM +0200, Julian Anastasov wrote: ... fa_match is some existing alias that matches all new parameters. As NLM_F_REPLACE changes the first alternative route for tos+priority if fa_match == fa_first (we are replacing alias that matches all parameters) we return 0, only that routing cache is not flushed - nothing is replaced/changed. So, fa == fa_match means replace will not change existing parameters, return 0 as this is not an error. Probably I miss something, but what parameters do we change if (fa_match) (fa != fa_match)? Isn't this goto out in any case? OOPS! You mean change is needed, but we can't do this! (I'm so slow...) All correct! Thanks again, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 2.6.24] fib: fix route replacement, fib_info is shared
On Tue, Jan 29, 2008 at 10:10:30AM +0100, Jarek Poplawski wrote: On Tue, Jan 29, 2008 at 09:49:15AM +0100, Jarek Poplawski wrote: On Tue, Jan 29, 2008 at 02:30:47AM +0200, Julian Anastasov wrote: ... fa_match is some existing alias that matches all new parameters. As NLM_F_REPLACE changes the first alternative route for tos+priority if fa_match == fa_first (we are replacing alias that matches all parameters) we return 0, only that routing cache is not flushed - nothing is replaced/changed. So, fa == fa_match means replace will not change existing parameters, return 0 as this is not an error. Probably I miss something, but what parameters do we change if (fa_match) (fa != fa_match)? Isn't this goto out in any case? OOPS! You mean change is needed, but we can't do this! (I'm so slow...) ...On the other hand, I wonder how bad would be switching these two to avoid this error? After all replace with this add or change meaning looks quite permissive, and after all it was used before with no such errors, so, even if correct, it could still break some scripts... Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 2.6.24] fib: fix route replacement, fib_info is shared
On 28-01-2008 00:20, Jarek Poplawski wrote: Hi, I have a few questions below: Julian Anastasov wrote, On 01/26/2008 01:41 PM: ... --- linux-2.6.24/net/ipv4/fib_hash.c_orig2008-01-25 10:45:06.0 +0200 +++ linux-2.6.24/net/ipv4/fib_hash.c 2008-01-26 14:11:34.0 +0200 @@ -434,19 +434,43 @@ static int fn_hash_insert(struct fib_tab if (fa fa-fa_tos == tos fa-fa_info-fib_priority == fi-fib_priority) { ...One more doubt here. Your FIB description doesn't say about this, and a code at the end of this function, where a new alias is inserted, doesn't seem to show this too. Are these aliases in the node sorted by fib_priority too? I mean, isn't it possible here, that we got fa from fib_node_alias() with right tos but greater fib_priority, but there is a better match (with right priority) later on the list yet? (The comment above this reads something else, but I'd be glad if you could confirm this.) Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 2.6.24] fib: fix route replacement, fib_info is shared
On Mon, Jan 28, 2008 at 09:33:02AM +0100, Jarek Poplawski wrote: ... from fib_node_alias() with right tos but greater fib_priority, but from fib_find_alias() of course... arek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 2.6.24] fib: fix route replacement, fib_info is shared
On Mon, Jan 28, 2008 at 09:33:02AM +0100, Jarek Poplawski wrote: ... [...] Are these aliases in the node sorted by fib_priority too? [...] OK, I see they are! Sorry for bothering, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 9816] New: cannot replace route
On Sun, Jan 27, 2008 at 11:49:06AM +0200, Julian Anastasov wrote: ... No, simply the last change in 2.6.24 is wrong to assume duplication is evident in fib_info reference counter. And such check is only on ip route replace/change. I'm appending brief FIB information for your reference: Thank you very much! I've just planned to look more at this code today, and maybe try to find how your new patch copes with this problem, so this description should be really helpful! But, since we agree now it's a bug and regression in 2.6.24, it seems this should be fixed as soon as possible, and since your patch isn't probably tested enough for stable, it looks like safe bet to revert Joonwoo's patch, at least until this all is more verified. Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 2.6.24] fib: fix route replacement, fib_info is shared
Hi, I have a few questions below: Julian Anastasov wrote, On 01/26/2008 01:41 PM: fib_info can be shared by many route prefixes but we don't want duplicate alternative routes for a prefix+tos+priority. Last change was not correct to check fib_treeref because it accounts usage from other prefixes. Additionally, avoid replacement without error if new route is same, as Joonwoo Park suggests. Signed-off-by: Julian Anastasov [EMAIL PROTECTED] --- --- linux-2.6.24/net/ipv4/fib_hash.c_orig 2008-01-25 10:45:06.0 +0200 +++ linux-2.6.24/net/ipv4/fib_hash.c 2008-01-26 14:11:34.0 +0200 @@ -434,19 +434,43 @@ static int fn_hash_insert(struct fib_tab if (fa fa-fa_tos == tos fa-fa_info-fib_priority == fi-fib_priority) { - struct fib_alias *fa_orig; + struct fib_alias *fa_first, *fa_match; err = -EEXIST; if (cfg-fc_nlflags NLM_F_EXCL) goto out; BTW, the way add works wasn't questioned now, but it seems could be, or man ip should call it e.g. ip route add - add new destination, and append ip route append (unless I have old man). + /* We have 2 goals: + * 1. Find exact match for type, scope, fib_info to avoid + * duplicate routes + * 2. Find next 'fa' (or head), NLM_F_APPEND inserts before it + */ + fa_match = NULL; + fa_first = fa; + fa = list_entry(fa-fa_list.prev, struct fib_alias, fa_list); + list_for_each_entry_continue(fa, f-fn_alias, fa_list) { + if (fa-fa_tos != tos) + break; + if (fa-fa_info-fib_priority != fi-fib_priority) + break; + if (fa-fa_type == cfg-fc_type + fa-fa_scope == cfg-fc_scope + fa-fa_info == fi) { + fa_match = fa; + break; Why can't we try goto out from here? (less reading...) + } + } + if (cfg-fc_nlflags NLM_F_REPLACE) { struct fib_info *fi_drop; u8 state; - if (fi-fib_treeref 1) + fa = fa_first; + if (fa_match) { + if (fa == fa_match) + err = 0; Could you comment more why returning an error seems to depend on the order of aliases here? But, IMHO there is no reason to change the old behavior WRT this error, so probably this err = 0 should be always if NLM_F_REPLACE is set. goto out; - + } write_lock_bh(fib_hash_lock); fi_drop = fa-fa_info; fa-fa_info = fi; @@ -469,20 +493,11 @@ static int fn_hash_insert(struct fib_tab * uses the same scope, type, and nexthop * information. */ - fa_orig = fa; - fa = list_entry(fa-fa_list.prev, struct fib_alias, fa_list); - list_for_each_entry_continue(fa, f-fn_alias, fa_list) { - if (fa-fa_tos != tos) - break; - if (fa-fa_info-fib_priority != fi-fib_priority) - break; - if (fa-fa_type == cfg-fc_type - fa-fa_scope == cfg-fc_scope - fa-fa_info == fi) - goto out; - } + if (fa_match) + goto out; + if (!(cfg-fc_nlflags NLM_F_APPEND)) - fa = fa_orig; + fa = fa_first; } err = -ENOENT; Generally this patch looks OK to me. Thanks, Jarek P. PS: I think, this FIB info you sent earlier is just fine for Documentation/networking without any changes! (Maybe one more patch?) -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 9816] New: cannot replace route
On Sat, Jan 26, 2008 at 02:16:01PM +0900, Joonwoo Park wrote: 2008/1/26, Andrew Morton [EMAIL PROTECTED]: But whatever. It used to work. People's scripts will break. Regression. Also I thought that 'replace with itself' should be error as like Jarek. But it used to work and patch made a regression, it's my bad :( Actually, I don't think 'replace with itself' should be an error. I've only meant that lack of this possibility shouldn't be necessarily seen as error - there could be arguments for both sides. IMHO, there should be simply analyzed pros and cons of doing it in this particular place, so: if there is any gain in doing this, and if possible complications or problems with performance, security etc. don't prevail such a gain. And I don't think a regression argument should be valid at all if there are removed any unlogical, error-prone or otherwise problematic options (I don't know if this is such case), even if they are not obvious bugs - especially if it's still possible to do the same 'proper' way. Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 9816] New: cannot replace route
On Fri, Jan 25, 2008 at 07:20:26PM -0800, Andrew Morton wrote: ... That's not a very good analogy - the source is a kernel object. A better example would be: linux-2.6.24-rc8: echo foo /tmp/1 echo bar /tmp/2 echo foo /tmp/1 linux-2.6.24: echo foo /tmp/1 echo bar /tmp/2 echo foo /tmp/1 sh: cannot write /tmp/1: Inalid argument Probably you are right. When I have some time to build and boot 2.6.24 at last, I hope I'll find what this example is about... But whatever. It used to work. People's scripts will break. Regression. Do you mean people didn't have to change their scripts before, e.g. from linux 2.0.? Scripts are kind of programs, and both APIs and some 'undocumented' features are achanging... Regards, Jarek P. Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 9816] New: cannot replace route
On Sat, Jan 26, 2008 at 12:40:36PM +0100, Jarek Poplawski wrote: On Sat, Jan 26, 2008 at 02:16:01PM +0900, Joonwoo Park wrote: 2008/1/26, Andrew Morton [EMAIL PROTECTED]: But whatever. It used to work. People's scripts will break. Regression. Also I thought that 'replace with itself' should be error as like Jarek. But it used to work and patch made a regression, it's my bad :( Actually, I don't think 'replace with itself' should be an error. I've only meant that lack of this possibility shouldn't be necessarily seen as error - there could be arguments for both sides. ...On the other hand, after some re-thinking, I actually think 'replace with itself' should be considered a bug. I wondered about the possible reason of this behaviour in a file system, and it seems replace just means things like overwrite, so old thing is always supposed to be destroyed (of course it's a matter of implementation or conditions in which moment this destruction takes place). So, 'replace with itself' is simply ambiguous: we can always delete the object first, to prepare the place for replacement, and find there is nothing to do after this - and it's probably not what somebody wanted. And, after re-reading this bugzilla report, I'm pretty sure the thing should be done with 'ip route change' (but I didn't check if 2.6.24 knows about this...). Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 9816] New: cannot replace route
On Sat, Jan 26, 2008 at 03:10:10PM +0100, Jarek Poplawski wrote: ... [...] so old thing is always supposed to be destroyed (of course it's a matter of implementation or conditions in which moment this destruction takes place). So, 'replace with itself' is simply ambiguous: we can always delete the object first, to prepare the place for replacement, and find there is nothing to do after this - and it's probably not what somebody wanted. As a matter of fact, the moment of destruction doesn't even matter: assuming the replaced thing is destroyed in all 'common' cases, doing this at the end isn't probably wanted as well - and skipping this action makes an exception - what IMHO proves the concept is at least inconsistent. Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 9816] New: cannot replace route
On Sat, Jan 26, 2008 at 03:27:00PM +0100, Andreas Schwab wrote: Jarek Poplawski [EMAIL PROTECTED] writes: And, after re-reading this bugzilla report, I'm pretty sure the thing should be done with 'ip route change' (but I didn't check if 2.6.24 knows about this...). $ man ip [...] ip route add - add new route ip route change - change route ip route replace - change or add new one [...] According to this replace should be a superset of change. According to this replace should be ...ambiguous. I could read this my/proper(?) way: ip route replace - change with new one or add new one And ...man could be wrong too after all! (...but not me!) Also, please check out comment#3, it also fails for replacing a route with something different (it's a route to an ipsec tunnel). It all depends on which routes should be considered different (and it should be specified somewhere BTW...). But, I should've added my all reasoning was more about logic, and since in real life change and replace are often equivalent, and iproute is famous from using such equivalents in many places, your claim WRT this man page could be completely right! There should be only considered, if realization of this doesn't imply bugs in some other places, like the one which fix caused this regression. Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 9816] New: cannot replace route
On Sat, Jan 26, 2008 at 04:19:34PM +0100, Jarek Poplawski wrote: On Sat, Jan 26, 2008 at 03:27:00PM +0100, Andreas Schwab wrote: Jarek Poplawski [EMAIL PROTECTED] writes: And, after re-reading this bugzilla report, I'm pretty sure the thing should be done with 'ip route change' (but I didn't check if 2.6.24 knows about this...). $ man ip [...] ip route add - add new route ip route change - change route ip route replace - change or add new one [...] According to this replace should be a superset of change. According to this replace should be ...ambiguous. I could read this my/proper(?) way: ip route replace - change with new one or add new one And ...man could be wrong too after all! (...but not me!) After some checks it seems man is right - ie. WRT iproute.c! (...hmm?) And you read this right: 'replace should be a superset of change'. Also, please check out comment#3, it also fails for replacing a route with something different (it's a route to an ipsec tunnel). But comment#3 is ambiguous... It looks like you don't want to show us too much... So, apparently you change the route, but it seems this route exists; you have this: 10.0.0.0/8 dev eth0 scope link but probably also something like this: default via 192.168.1.1 dev eth0 src 10.204.0.116 So, I doubt there is any real change attempted here. It looks more like a question if program should allow for changing the form of route entries even if they mean the same, and if this should be reported as error at all? But maybe I miss something... Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Bugme-new] [Bug 9816] New: cannot replace route
On Sun, Jan 27, 2008 at 02:11:26AM +0100, Jarek Poplawski wrote: ... But comment#3 is ambiguous... It looks like you don't want to show us too much... So, apparently you change the route, but it seems this route exists; you have this: 10.0.0.0/8 dev eth0 scope link but probably also something like this: default via 192.168.1.1 dev eth0 src 10.204.0.116 So, I doubt there is any real change attempted here. It looks more like a question if program should allow for changing the form of route entries even if they mean the same, and if this should be reported as error at all? But maybe I miss something... ...But, on the other hand, default route shoudn't affect adding or changing specific routes, so this behaviour is wrong, and IMHO this change should be reverted or fixed. Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fib_trie: rescan if key is lost during dump
On 24-01-2008 22:51, Stephen Hemminger wrote: Normally during a dump the key of the last dumped entry is used for continuation, but since lock is dropped it might be lost. In that case fallback to the old counter based N^2 behaviour. This means the dump will end up skipping some routes which matches what FIB_HASH does. Signed-off-by: Stephen Hemminger [EMAIL PROTECTED] ... @@ -1918,35 +1931,37 @@ static int fn_trie_dump(struct fib_table struct leaf *l; struct trie *t = (struct trie *) tb-tb_data; t_key key = cb-args[2]; + int count = cb-args[3]; rcu_read_lock(); Sorry, but I lost the point: is rtnl held or not held here at the moment? If held, how this rcu_read_lock can help? Maybe some additional comment in the code? Thanks, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3][NET] gen_estimator: faster gen_kill_estimator
On Mon, Jan 21, 2008 at 04:29:18PM -0800, David Miller wrote: ... Fix this right, make a structure like: struct kernel_gnet_stats_rate_est { struct gnet_stats_rate_est est; void*gen_estimator; } And update all the code as needed. Hi, Here is a patch which uses this idea, I hope mostly as expected. This new structure has to replace the older one in most kernel places. I was uncertain of gnet_stats_copy_rate_est(), the change wasn't necessary here but could be considered for uniformity reasons. Currently it's unchanged. Of course, I admit it's not the last version, and any suggestions are welcomed (including updating to some later version - I see there is some queue waiting with sched changes). Regards, Jarek P. -- (take 3) gen_kill_estimator() is called during qdisc_destroy() with BHs disabled, and each time does searching for each list member. This can block soft interrupts for quite a long time when many classes are used. This patch changes this by storing pointers to internal gen_estimator structures. New kernel_gnet_stats_rate_est structure is created for this as a wrapper around gnet_stats_rate_est being a part of userspace API. Respectively all callers of gen_estimator functions, and structures used by them to store rate_est are modified. This method removes currently possibile registering in gen_estimator the same structures more than once, but it isn't used after all. Thanks to David Miller for pointing errors in first versions of this patch and for suggesting proper solution. Reported-by: Badalian Vyacheslav [EMAIL PROTECTED] Signed-off-by: Jarek Poplawski [EMAIL PROTECTED] --- Documentation/networking/gen_stats.txt | 10 +++-- include/linux/gen_stats.h | 13 ++ include/net/act_api.h |4 +- include/net/gen_stats.h|8 ++-- include/net/netfilter/xt_rateest.h |2 +- include/net/sch_generic.h |2 +- net/core/gen_estimator.c | 65 ++-- net/netfilter/xt_RATEEST.c |4 +- net/netfilter/xt_rateest.c |4 +- net/sched/act_api.c|7 +-- net/sched/act_police.c |7 +-- net/sched/sch_api.c|6 +- net/sched/sch_cbq.c| 10 ++-- net/sched/sch_generic.c|2 +- net/sched/sch_hfsc.c | 10 ++-- net/sched/sch_htb.c| 12 +++--- 16 files changed, 85 insertions(+), 81 deletions(-) diff --git a/Documentation/networking/gen_stats.txt b/Documentation/networking/gen_stats.txt index 70e6275..fc45f94 100644 --- a/Documentation/networking/gen_stats.txt +++ b/Documentation/networking/gen_stats.txt @@ -6,10 +6,12 @@ Statistic counters are grouped into structs: Struct TLV type Description -- gnet_stats_basic TCA_STATS_BASIC Basic statistics -gnet_stats_rate_est TCA_STATS_RATE_ESTRate estimator +gnet_stats_rate_est* TCA_STATS_RATE_ESTRate estimator gnet_stats_queue TCA_STATS_QUEUE Queue statistics none TCA_STATS_APP Application specific +* From v2.6.25 kernel uses internal struct kernel_gnet_stats_rate_est + for gen_estimator functions calls. Collecting: --- @@ -106,9 +108,9 @@ In the kernel when setting up: From now on, every time you dump my_rate_est_stats it will contain up-to-date info. -Once you are done, call gen_kill_estimator(my_basicstats, -my_rate_est_stats) Make sure that my_basicstats and my_rate_est_stats -are still valid (i.e still exist) at the time of making this call. +Once you are done, call gen_kill_estimator(my_rate_est_stats). Make +sure that my_rate_est_stats is still valid (i.e still exists) at the +time of making this call. Authors: diff --git a/include/linux/gen_stats.h b/include/linux/gen_stats.h index 13f4e74..12b76d2 100644 --- a/include/linux/gen_stats.h +++ b/include/linux/gen_stats.h @@ -35,6 +35,19 @@ struct gnet_stats_rate_est __u32 pps; }; +#ifdef __KERNEL__ +/** + * struct kernel_gnet_stats_rate_est - rate estimator wrapper + * @est: rate estimator + * @gen_estimator: internal data + */ +struct kernel_gnet_stats_rate_est +{ + struct gnet_stats_rate_est est; + void*gen_estimator; +}; +#endif + /** * struct gnet_stats_queue - queuing statistics * @qlen: queue length diff --git a/include/net/act_api.h b/include/net/act_api.h index c5ac61a..c02ce27 100644 --- a/include/net/act_api.h +++ b/include/net/act_api.h @@ -18,7 +18,7 @@ struct tcf_common { struct tcf_ttcfc_tm; struct gnet_stats_basic tcfc_bstats; struct gnet_stats_queue tcfc_qstats; - struct gnet_stats_rate_est tcfc_rate_est; + struct
Re: [Bugme-new] [Bug 9816] New: cannot replace route
Andrew Morton wrote, On 01/25/2008 11:26 PM: On Fri, 25 Jan 2008 13:23:49 -0800 (PST) [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=9816 ... I'd agree with Andrea: replacing a route with itself a) used to work and b) should still work (surely)? ...on the other hand: $ touch file1 $ cp file1 file1 cp: `file1' and `file1' are the same file $ mv file1 file1 mv: `file1' and `file1' are the same file and: 'everything' in 'linux' is file... ergo: route cannot replace with itself! Regards, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] fib_trie: rescan if key is lost during dump
On Fri, Jan 25, 2008 at 08:13:15AM -0800, Stephen Hemminger wrote: ... 2. RCU is unnecessary here because of use of RTNL. I am going to defer on this till after #1. That patch is much less important. Thanks! (I've started to suspect another advanced RCU trick already...) Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
On Tue, Jan 22, 2008 at 06:42:07AM -0500, jamal wrote: ... Jarek, That looks different from the suggestion from Dave. Hmm..., I'm not sure you mean my or your suggestion here, but you are right anyway... May i throw in another bone? Theoretically i can see why it would be a really bad idea to walk 50K estimators every time you delete one - which is horrible if you are trying to destroy the say 50K of them and gets worse as the number of schedulers with 50K classes goes up. But i am wondering why a simpler list couldnt be walked, meaning: In gen_kill_estimator(), instead of: for (idx=0; idx = EST_MAX_INTERVAL; idx++) { Would deriving a better initial index be a big improvement? for (e = elist[est-interval].list; e; e = e-next) { Maybe I miss something, but there still could be a lot of this walking and IMHO any such longer waiting with BHs disabled is hard to accept with current memory sizes and low-latencies prices. And currently time seems to be even more precious here: RCU can't even free any gen_estimator memory during such large qdisc with classes deletion. Thanks, Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3 v2][NET] gen_estimator: faster gen_kill_estimator
On Tue, Jan 22, 2008 at 08:54:28AM -0500, jamal wrote: On Tue, 2008-22-01 at 13:29 +0100, Jarek Poplawski wrote: On Tue, Jan 22, 2008 at 06:42:07AM -0500, jamal wrote: ... Jarek, That looks different from the suggestion from Dave. Hmm..., I'm not sure you mean my or your suggestion here, but you are right anyway... Your idea to grab a pointer to the estimator so you can quickly find it upon destruction is a good one. Let's say it's quite common in the kernel type of idea... The challenge was not to break the ABI to user space. Dave suggested to use a different struct for the kernel side and maintain the user one as is[1]. Your patch didnt do this, hence my statement;- Sure! David's idea is very good, but before reading his message I decided to try something 'safer'. I simply don't know these structures like you or David, and after these mistakes at the beginning, I tried to avoid the next. But now I see it shouldn't be so hard, and I'll do this David's way, I hope! Maybe I miss something, but there still could be a lot of this walking Indeed, that is possible in the case of many estimators configured with the same interval - because they will all fall in the same table bucket and the idx is not that useful to begin with. I was wrong given the nature of interval - the majority of the users will have an estimator interval of say 1 sec which will put everything in one bucket still. We could introduce a proper index that will allow proper distribution and have that stored by the class. I am not sure i made sense. But you are coding - and your idea sounds better. As a matter of fact, I've considered yesterday some additional hash table too, but wasn't sure it's worth of the savings. It seems there should be considered optimization of these estimator structures yet. But since there are now a few similar reports about misterious problems during deletion of large qdisc, it would be nice to remove main suspects around this at any price... Many thanks for your concern with this! (And, if miss something again, don't be afraid to make it straight clear; I really prefer to know the truth about my mistakes, than polite silence.) Regards, Jarek P. [1] This is _not uncommon_ (note the usage of double negation here for emphasis;-) technique actually; ones that go further for example can be found all over the net/sched code (struct tcf_police vs tc_police) etc. I think David's points were very clear and of course right!; it seems now that only my 'usual' way of friendly joking about this could be more challenge than any double-negation and turns friendly fire insted... -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3][NET] gen_estimator: faster gen_kill_estimator
On Mon, Jan 21, 2008 at 02:35:15AM -0800, David Miller wrote: ... Sorry, this structure is exported to userspace so we can't change it like that. It is an attribute passed over netlink sockets. Thanks for finding this! I'll try to rethink this (especially why my tests seemed to show something could be working there...). Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3][NET] gen_estimator: list_empty() check in est_timer() fixed
On Mon, Jan 21, 2008 at 02:36:32AM -0800, David Miller wrote: ... FWIW I agree that double-negatives are confusing and we should avoid them. Right! No more: CHECKSUM_NONE, SOCK_NOSPACE, IFF_NOARP or KERN_NOTICE! Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3][NET] gen_estimator: list_empty() check in est_timer() fixed
On Mon, Jan 21, 2008 at 03:15:53AM -0800, David Miller wrote: ... Life is difficult sometimes, but that is no excuse to further the pain :-) YES! I've read somewhere about it too! Jarek P. -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html