Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue

2007-09-12 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED]
Date: Tue, 11 Sep 2007 14:56:13 +0200

 When the periodic IP route cache flush is done (every 600 seconds on 
 default configuration), some hosts suffer a lot and eventually trigger
 the soft lockup message.
 
 dst_run_gc() is doing a scan of a possibly huge list of dst_entries,
 eventually freeing some (less than 1%) of them, while holding the 
 dst_lock spinlock for the whole scan.
 
 Then it rearms a timer to redo the full thing 1/10 s later...
 The slowdown can last one minute or so, depending on how active are
 the tcp sessions.
 
 This second version of the patch converts the processing from a softirq
 based one to a workqueue.
 
 Even if the list of entries in garbage_list is huge, host is still
 responsive to softirqs and can make progress.
 
 Instead of reseting gc timer to 0.1 second if one entry was freed in a
 gc run, we do this if more than 10% of entries were freed.

I like this patch a lot, some minor fix is needed though:

 + __builtin_prefetch(next-next, 1, 0);

Please use prefetch() instead of a direct explicit
call to a gcc-specific routine :-)
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue

2007-09-12 Thread Christoph Hellwig
This looks nice in general, getting things out of softirq context is
always good.

On Tue, Sep 11, 2007 at 02:56:13PM +0200, Eric Dumazet wrote:
  #if RT_CACHE_DEBUG = 2
  static atomic_t   dst_total = ATOMIC_INIT(0);
  #endif
 -static unsigned long dst_gc_timer_expires;
 -static unsigned long dst_gc_timer_inc = DST_GC_MAX;
 -static void dst_run_gc(unsigned long);
 +static struct {
 + spinlock_t  lock;
 + struct dst_entry*list;
 + unsigned long   timer_inc;
 + unsigned long   timer_expires;
 +} dst_garbage = {
 + .lock = __SPIN_LOCK_UNLOCKED(dst_garbage.lock),
 + .timer_inc = DST_GC_MAX,
 +};

Can you please et rid of this useless struct?  It just complicates
the code and means we can't use the proper DEFINE_SPINLOCK initializer.

 +DECLARE_DELAYED_WORK(dst_gc_work, dst_gc_task);

This should be static.

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue

2007-09-12 Thread Eric Dumazet
On Wed, 12 Sep 2007 02:05:25 -0700 (PDT)
David Miller [EMAIL PROTECTED] wrote:

 From: Eric Dumazet [EMAIL PROTECTED]
 Date: Tue, 11 Sep 2007 14:56:13 +0200
 
  When the periodic IP route cache flush is done (every 600 seconds on 
  default configuration), some hosts suffer a lot and eventually trigger
  the soft lockup message.
  
  dst_run_gc() is doing a scan of a possibly huge list of dst_entries,
  eventually freeing some (less than 1%) of them, while holding the 
  dst_lock spinlock for the whole scan.
  
  Then it rearms a timer to redo the full thing 1/10 s later...
  The slowdown can last one minute or so, depending on how active are
  the tcp sessions.
  
  This second version of the patch converts the processing from a softirq
  based one to a workqueue.
  
  Even if the list of entries in garbage_list is huge, host is still
  responsive to softirqs and can make progress.
  
  Instead of reseting gc timer to 0.1 second if one entry was freed in a
  gc run, we do this if more than 10% of entries were freed.
 
 I like this patch a lot, some minor fix is needed though:

Thank you

I also spoted a missing static before 
DECLARE_DELAYED_WORK(dst_gc_work, dst_gc_task);
 no need to stress Adrian on this :)

 
  +   __builtin_prefetch(next-next, 1, 0);
 
 Please use prefetch() instead of a direct explicit
 call to a gcc-specific routine :-)

Unfortunatly, there is no equivalent for this one. 
This gives on my Opterons a nice prefetchnta

prefetch(addr) is more like __builtin_prefetch(addr, 0, 3)

I would like to avoid to zap L2 cache with useless data.

__builtin_prefetch() is included from gcc 3.1 (2002), so every 
platform should support it, as linux-2.6 requires gcc 3.2 at least.

I guess you are going to tell me to first publish a patch to lkml :)

Thank you

Eric
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue

2007-09-12 Thread Eric Dumazet
On Wed, 12 Sep 2007 11:00:54 +0100
Christoph Hellwig [EMAIL PROTECTED] wrote:

 This looks nice in general, getting things out of softirq context is
 always good.

I am preparing a patch to net/ipv4/route.c to migrate rt_check_expire()
as well.

 
 On Tue, Sep 11, 2007 at 02:56:13PM +0200, Eric Dumazet wrote:
   #if RT_CACHE_DEBUG = 2
   static atomic_t dst_total = ATOMIC_INIT(0);
   #endif
  -static unsigned long dst_gc_timer_expires;
  -static unsigned long dst_gc_timer_inc = DST_GC_MAX;
  -static void dst_run_gc(unsigned long);
  +static struct {
  +   spinlock_t  lock;
  +   struct dst_entry*list;
  +   unsigned long   timer_inc;
  +   unsigned long   timer_expires;
  +} dst_garbage = {
  +   .lock = __SPIN_LOCK_UNLOCKED(dst_garbage.lock),
  +   .timer_inc = DST_GC_MAX,
  +};
 
 Can you please et rid of this useless struct?  It just complicates
 the code and means we can't use the proper DEFINE_SPINLOCK initializer.

When using the standard DEFINE_SPINLOCK initializer, the lock is in the 
data section, while list is in bss section.

This 'useless struct' makes lock/list being on the same cache line, so 
reduces latency of __dst_free(). I wish more structures in kernel be used
instead of relying on random placement of the linker...

 
  +DECLARE_DELAYED_WORK(dst_gc_work, dst_gc_task);
 
 This should be static.

Yes I agree

-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue

2007-09-12 Thread Eric Dumazet
On Wed, 12 Sep 2007 04:12:00 -0700 (PDT)
David Miller [EMAIL PROTECTED] wrote:

 From: Eric Dumazet [EMAIL PROTECTED]
 Date: Wed, 12 Sep 2007 12:08:45 +0200
 
  Unfortunatly, there is no equivalent for this one. 
  This gives on my Opterons a nice prefetchnta
  
  prefetch(addr) is more like __builtin_prefetch(addr, 0, 3)
  
  I would like to avoid to zap L2 cache with useless data.
  
  __builtin_prefetch() is included from gcc 3.1 (2002), so every 
  platform should support it, as linux-2.6 requires gcc 3.2 at least.
  
  I guess you are going to tell me to first publish a patch to lkml :)
 
 Basically, yes :-)  You won't be the only person to find this
 useful.

OK, let's try a normal prefetch(), I'll change it later when/if a 
new generic macro is added. I added the missing 'static' and a comment
about the struct {} dst_garbage. I also corrected spelling error on
patch title (collection)

Thank you

[PATCH] NET : convert IP route cache garbage collection from softirq processing 
to a workqueue

When the periodic IP route cache flush is done (every 600 seconds on 
default configuration), some hosts suffer a lot and eventually trigger
the soft lockup message.

dst_run_gc() is doing a scan of a possibly huge list of dst_entries,
eventually freeing some (less than 1%) of them, while holding the 
dst_lock spinlock for the whole scan.

Then it rearms a timer to redo the full thing 1/10 s later...
The slowdown can last one minute or so, depending on how active are
the tcp sessions.

This second version of the patch converts the processing from a softirq
based one to a workqueue.

Even if the list of entries in garbage_list is huge, host is still
responsive to softirqs and can make progress.

Instead of resetting gc timer to 0.1 second if one entry was freed in a
gc run, we do this if more than 10% of entries were freed.


Before patch :

Aug 16 06:21:37 SRV1 kernel: BUG: soft lockup detected on CPU#0!
Aug 16 06:21:37 SRV1 kernel: 
Aug 16 06:21:37 SRV1 kernel: Call Trace:
Aug 16 06:21:37 SRV1 kernel:  IRQ  [802286f0] 
wake_up_process+0x10/0x20
Aug 16 06:21:37 SRV1 kernel:  [80251e09] softlockup_tick+0xe9/0x110
Aug 16 06:21:37 SRV1 kernel:  [803cd380] dst_run_gc+0x0/0x140
Aug 16 06:21:37 SRV1 kernel:  [802376f3] run_local_timers+0x13/0x20
Aug 16 06:21:37 SRV1 kernel:  [802379c7] 
update_process_times+0x57/0x90
Aug 16 06:21:37 SRV1 kernel:  [80216034] 
smp_local_timer_interrupt+0x34/0x60
Aug 16 06:21:37 SRV1 kernel:  [802165cc] 
smp_apic_timer_interrupt+0x5c/0x80
Aug 16 06:21:37 SRV1 kernel:  [8020a816] 
apic_timer_interrupt+0x66/0x70
Aug 16 06:21:37 SRV1 kernel:  [803cd3d3] dst_run_gc+0x53/0x140
Aug 16 06:21:37 SRV1 kernel:  [803cd3c6] dst_run_gc+0x46/0x140
Aug 16 06:21:37 SRV1 kernel:  [80237148] run_timer_softirq+0x148/0x1c0
Aug 16 06:21:37 SRV1 kernel:  [8023340c] __do_softirq+0x6c/0xe0
Aug 16 06:21:37 SRV1 kernel:  [8020ad6c] call_softirq+0x1c/0x30
Aug 16 06:21:37 SRV1 kernel:  EOI  [8020cb34] do_softirq+0x34/0x90
Aug 16 06:21:37 SRV1 kernel:  [802331cf] local_bh_enable_ip+0x3f/0x60
Aug 16 06:21:37 SRV1 kernel:  [80422913] _spin_unlock_bh+0x13/0x20
Aug 16 06:21:37 SRV1 kernel:  [803dfde8] 
rt_garbage_collect+0x1d8/0x320
Aug 16 06:21:37 SRV1 kernel:  [803cd4dd] dst_alloc+0x1d/0xa0
Aug 16 06:21:37 SRV1 kernel:  [803e1433] 
__ip_route_output_key+0x573/0x800
Aug 16 06:21:37 SRV1 kernel:  [803c02e2] sock_common_recvmsg+0x32/0x50
Aug 16 06:21:37 SRV1 kernel:  [803e16dc] 
ip_route_output_flow+0x1c/0x60
Aug 16 06:21:37 SRV1 kernel:  [80400160] tcp_v4_connect+0x150/0x610
Aug 16 06:21:37 SRV1 kernel:  [803ebf07] 
inet_bind_bucket_create+0x17/0x60
Aug 16 06:21:37 SRV1 kernel:  [8040cd16] 
inet_stream_connect+0xa6/0x2c0
Aug 16 06:21:37 SRV1 kernel:  [80422981] _spin_lock_bh+0x11/0x30
Aug 16 06:21:37 SRV1 kernel:  [803c0bdf] lock_sock_nested+0xcf/0xe0
Aug 16 06:21:37 SRV1 kernel:  [80422981] _spin_lock_bh+0x11/0x30
Aug 16 06:21:37 SRV1 kernel:  [803be551] sys_connect+0x71/0xa0
Aug 16 06:21:37 SRV1 kernel:  [803eee3f] tcp_setsockopt+0x1f/0x30
Aug 16 06:21:37 SRV1 kernel:  [803c030f] 
sock_common_setsockopt+0xf/0x20
Aug 16 06:21:37 SRV1 kernel:  [803be4bd] sys_setsockopt+0x9d/0xc0
Aug 16 06:21:37 SRV1 kernel:  [8028881e] sys_ioctl+0x5e/0x80
Aug 16 06:21:37 SRV1 kernel:  [80209c4e] system_call+0x7e/0x83

After patch : (RT_CACHE_DEBUG set to 2 to get following traces)

dst_total: 75469 delayed: 74109 work_perf: 141 expires: 150 elapsed: 8092 us
dst_total: 78725 delayed: 73366 work_perf: 743 expires: 400 elapsed: 8542 us
dst_total: 86126 delayed: 71844 work_perf: 1522 expires: 775 elapsed: 8849 us
dst_total: 100173 delayed: 68791 work_perf: 3053 expires: 1256 elapsed: 9748 us
dst_total: 121798 delayed: 64711 work_perf: 4080 expires: 1997 elapsed: 10146 us

Re: [PATCH] NET : convert IP route cache garbage colleciton from softirq processing to a workqueue

2007-09-12 Thread David Miller
From: Eric Dumazet [EMAIL PROTECTED]
Date: Wed, 12 Sep 2007 14:16:56 +0200

 OK, let's try a normal prefetch(), I'll change it later when/if a 
 new generic macro is added. I added the missing 'static' and a comment
 about the struct {} dst_garbage. I also corrected spelling error on
 patch title (collection)

I sorted out the conflicts with the network namespace stuff
I just checked in and added your patch to net-2.6.24

Thanks!
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html