Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue
From: Eric Dumazet [EMAIL PROTECTED] Date: Tue, 11 Sep 2007 14:56:13 +0200 When the periodic IP route cache flush is done (every 600 seconds on default configuration), some hosts suffer a lot and eventually trigger the soft lockup message. dst_run_gc() is doing a scan of a possibly huge list of dst_entries, eventually freeing some (less than 1%) of them, while holding the dst_lock spinlock for the whole scan. Then it rearms a timer to redo the full thing 1/10 s later... The slowdown can last one minute or so, depending on how active are the tcp sessions. This second version of the patch converts the processing from a softirq based one to a workqueue. Even if the list of entries in garbage_list is huge, host is still responsive to softirqs and can make progress. Instead of reseting gc timer to 0.1 second if one entry was freed in a gc run, we do this if more than 10% of entries were freed. I like this patch a lot, some minor fix is needed though: + __builtin_prefetch(next-next, 1, 0); Please use prefetch() instead of a direct explicit call to a gcc-specific routine :-) - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue
This looks nice in general, getting things out of softirq context is always good. On Tue, Sep 11, 2007 at 02:56:13PM +0200, Eric Dumazet wrote: #if RT_CACHE_DEBUG = 2 static atomic_t dst_total = ATOMIC_INIT(0); #endif -static unsigned long dst_gc_timer_expires; -static unsigned long dst_gc_timer_inc = DST_GC_MAX; -static void dst_run_gc(unsigned long); +static struct { + spinlock_t lock; + struct dst_entry*list; + unsigned long timer_inc; + unsigned long timer_expires; +} dst_garbage = { + .lock = __SPIN_LOCK_UNLOCKED(dst_garbage.lock), + .timer_inc = DST_GC_MAX, +}; Can you please et rid of this useless struct? It just complicates the code and means we can't use the proper DEFINE_SPINLOCK initializer. +DECLARE_DELAYED_WORK(dst_gc_work, dst_gc_task); This should be static. - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue
On Wed, 12 Sep 2007 02:05:25 -0700 (PDT) David Miller [EMAIL PROTECTED] wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Tue, 11 Sep 2007 14:56:13 +0200 When the periodic IP route cache flush is done (every 600 seconds on default configuration), some hosts suffer a lot and eventually trigger the soft lockup message. dst_run_gc() is doing a scan of a possibly huge list of dst_entries, eventually freeing some (less than 1%) of them, while holding the dst_lock spinlock for the whole scan. Then it rearms a timer to redo the full thing 1/10 s later... The slowdown can last one minute or so, depending on how active are the tcp sessions. This second version of the patch converts the processing from a softirq based one to a workqueue. Even if the list of entries in garbage_list is huge, host is still responsive to softirqs and can make progress. Instead of reseting gc timer to 0.1 second if one entry was freed in a gc run, we do this if more than 10% of entries were freed. I like this patch a lot, some minor fix is needed though: Thank you I also spoted a missing static before DECLARE_DELAYED_WORK(dst_gc_work, dst_gc_task); no need to stress Adrian on this :) + __builtin_prefetch(next-next, 1, 0); Please use prefetch() instead of a direct explicit call to a gcc-specific routine :-) Unfortunatly, there is no equivalent for this one. This gives on my Opterons a nice prefetchnta prefetch(addr) is more like __builtin_prefetch(addr, 0, 3) I would like to avoid to zap L2 cache with useless data. __builtin_prefetch() is included from gcc 3.1 (2002), so every platform should support it, as linux-2.6 requires gcc 3.2 at least. I guess you are going to tell me to first publish a patch to lkml :) Thank you Eric - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue
On Wed, 12 Sep 2007 11:00:54 +0100 Christoph Hellwig [EMAIL PROTECTED] wrote: This looks nice in general, getting things out of softirq context is always good. I am preparing a patch to net/ipv4/route.c to migrate rt_check_expire() as well. On Tue, Sep 11, 2007 at 02:56:13PM +0200, Eric Dumazet wrote: #if RT_CACHE_DEBUG = 2 static atomic_t dst_total = ATOMIC_INIT(0); #endif -static unsigned long dst_gc_timer_expires; -static unsigned long dst_gc_timer_inc = DST_GC_MAX; -static void dst_run_gc(unsigned long); +static struct { + spinlock_t lock; + struct dst_entry*list; + unsigned long timer_inc; + unsigned long timer_expires; +} dst_garbage = { + .lock = __SPIN_LOCK_UNLOCKED(dst_garbage.lock), + .timer_inc = DST_GC_MAX, +}; Can you please et rid of this useless struct? It just complicates the code and means we can't use the proper DEFINE_SPINLOCK initializer. When using the standard DEFINE_SPINLOCK initializer, the lock is in the data section, while list is in bss section. This 'useless struct' makes lock/list being on the same cache line, so reduces latency of __dst_free(). I wish more structures in kernel be used instead of relying on random placement of the linker... +DECLARE_DELAYED_WORK(dst_gc_work, dst_gc_task); This should be static. Yes I agree - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue
On Wed, 12 Sep 2007 04:12:00 -0700 (PDT) David Miller [EMAIL PROTECTED] wrote: From: Eric Dumazet [EMAIL PROTECTED] Date: Wed, 12 Sep 2007 12:08:45 +0200 Unfortunatly, there is no equivalent for this one. This gives on my Opterons a nice prefetchnta prefetch(addr) is more like __builtin_prefetch(addr, 0, 3) I would like to avoid to zap L2 cache with useless data. __builtin_prefetch() is included from gcc 3.1 (2002), so every platform should support it, as linux-2.6 requires gcc 3.2 at least. I guess you are going to tell me to first publish a patch to lkml :) Basically, yes :-) You won't be the only person to find this useful. OK, let's try a normal prefetch(), I'll change it later when/if a new generic macro is added. I added the missing 'static' and a comment about the struct {} dst_garbage. I also corrected spelling error on patch title (collection) Thank you [PATCH] NET : convert IP route cache garbage collection from softirq processing to a workqueue When the periodic IP route cache flush is done (every 600 seconds on default configuration), some hosts suffer a lot and eventually trigger the soft lockup message. dst_run_gc() is doing a scan of a possibly huge list of dst_entries, eventually freeing some (less than 1%) of them, while holding the dst_lock spinlock for the whole scan. Then it rearms a timer to redo the full thing 1/10 s later... The slowdown can last one minute or so, depending on how active are the tcp sessions. This second version of the patch converts the processing from a softirq based one to a workqueue. Even if the list of entries in garbage_list is huge, host is still responsive to softirqs and can make progress. Instead of resetting gc timer to 0.1 second if one entry was freed in a gc run, we do this if more than 10% of entries were freed. Before patch : Aug 16 06:21:37 SRV1 kernel: BUG: soft lockup detected on CPU#0! Aug 16 06:21:37 SRV1 kernel: Aug 16 06:21:37 SRV1 kernel: Call Trace: Aug 16 06:21:37 SRV1 kernel: IRQ [802286f0] wake_up_process+0x10/0x20 Aug 16 06:21:37 SRV1 kernel: [80251e09] softlockup_tick+0xe9/0x110 Aug 16 06:21:37 SRV1 kernel: [803cd380] dst_run_gc+0x0/0x140 Aug 16 06:21:37 SRV1 kernel: [802376f3] run_local_timers+0x13/0x20 Aug 16 06:21:37 SRV1 kernel: [802379c7] update_process_times+0x57/0x90 Aug 16 06:21:37 SRV1 kernel: [80216034] smp_local_timer_interrupt+0x34/0x60 Aug 16 06:21:37 SRV1 kernel: [802165cc] smp_apic_timer_interrupt+0x5c/0x80 Aug 16 06:21:37 SRV1 kernel: [8020a816] apic_timer_interrupt+0x66/0x70 Aug 16 06:21:37 SRV1 kernel: [803cd3d3] dst_run_gc+0x53/0x140 Aug 16 06:21:37 SRV1 kernel: [803cd3c6] dst_run_gc+0x46/0x140 Aug 16 06:21:37 SRV1 kernel: [80237148] run_timer_softirq+0x148/0x1c0 Aug 16 06:21:37 SRV1 kernel: [8023340c] __do_softirq+0x6c/0xe0 Aug 16 06:21:37 SRV1 kernel: [8020ad6c] call_softirq+0x1c/0x30 Aug 16 06:21:37 SRV1 kernel: EOI [8020cb34] do_softirq+0x34/0x90 Aug 16 06:21:37 SRV1 kernel: [802331cf] local_bh_enable_ip+0x3f/0x60 Aug 16 06:21:37 SRV1 kernel: [80422913] _spin_unlock_bh+0x13/0x20 Aug 16 06:21:37 SRV1 kernel: [803dfde8] rt_garbage_collect+0x1d8/0x320 Aug 16 06:21:37 SRV1 kernel: [803cd4dd] dst_alloc+0x1d/0xa0 Aug 16 06:21:37 SRV1 kernel: [803e1433] __ip_route_output_key+0x573/0x800 Aug 16 06:21:37 SRV1 kernel: [803c02e2] sock_common_recvmsg+0x32/0x50 Aug 16 06:21:37 SRV1 kernel: [803e16dc] ip_route_output_flow+0x1c/0x60 Aug 16 06:21:37 SRV1 kernel: [80400160] tcp_v4_connect+0x150/0x610 Aug 16 06:21:37 SRV1 kernel: [803ebf07] inet_bind_bucket_create+0x17/0x60 Aug 16 06:21:37 SRV1 kernel: [8040cd16] inet_stream_connect+0xa6/0x2c0 Aug 16 06:21:37 SRV1 kernel: [80422981] _spin_lock_bh+0x11/0x30 Aug 16 06:21:37 SRV1 kernel: [803c0bdf] lock_sock_nested+0xcf/0xe0 Aug 16 06:21:37 SRV1 kernel: [80422981] _spin_lock_bh+0x11/0x30 Aug 16 06:21:37 SRV1 kernel: [803be551] sys_connect+0x71/0xa0 Aug 16 06:21:37 SRV1 kernel: [803eee3f] tcp_setsockopt+0x1f/0x30 Aug 16 06:21:37 SRV1 kernel: [803c030f] sock_common_setsockopt+0xf/0x20 Aug 16 06:21:37 SRV1 kernel: [803be4bd] sys_setsockopt+0x9d/0xc0 Aug 16 06:21:37 SRV1 kernel: [8028881e] sys_ioctl+0x5e/0x80 Aug 16 06:21:37 SRV1 kernel: [80209c4e] system_call+0x7e/0x83 After patch : (RT_CACHE_DEBUG set to 2 to get following traces) dst_total: 75469 delayed: 74109 work_perf: 141 expires: 150 elapsed: 8092 us dst_total: 78725 delayed: 73366 work_perf: 743 expires: 400 elapsed: 8542 us dst_total: 86126 delayed: 71844 work_perf: 1522 expires: 775 elapsed: 8849 us dst_total: 100173 delayed: 68791 work_perf: 3053 expires: 1256 elapsed: 9748 us dst_total: 121798 delayed: 64711 work_perf: 4080 expires: 1997 elapsed: 10146 us
Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue
From: Eric Dumazet [EMAIL PROTECTED] Date: Wed, 12 Sep 2007 14:16:56 +0200 OK, let's try a normal prefetch(), I'll change it later when/if a new generic macro is added. I added the missing 'static' and a comment about the struct {} dst_garbage. I also corrected spelling error on patch title (collection) I sorted out the conflicts with the network namespace stuff I just checked in and added your patch to net-2.6.24 Thanks! - To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html