Andrew, thanks.

On Mon, 8 Feb 2010, Andrew Morton wrote:
> > Hello, the problem is that after some time of working my router has some
> > performance problems.
> >
> > The problem exists on kernels 2.6.32.7 and 2.6.33-rc6-git5.
> > The problem doesn't exist on the 2.6.32.2 kernel.
>
> That would be bad.  That's a regression in 2.6.33-rc, and we've
> backported that regression into 2.6.32.x -stable.
there were no changes to e1000e from v2.6.32..v2.6.32.7:

  # git whatchanged v2.6.32..v2.6.32.7 drivers/net/e1000e

> > on 2.6.33-rc6-git5 i have this info:
> > BUG: soft lockup - CPU#1 stuck for 61s! [events/1:28]

so why is this events/1 thing here?  I've seen this kernel thread related
to the ipv6 stack hanging when under heavy incoming packet load as the
interface comes up.  Was the interface just brought up, or had something
changed in the rules?

> > Modules linked in:
> > CPU 1
> > Pid: 28, comm: events/1 Not tainted 2.6.33-rc6-git5 #1 X7DCT/X7DCT
> > RIP: 0010:[<ffffffff810a3d89>]  [<ffffffff810a3d89>] kmem_cache_free+0x11b/0x11c
> > RSP: 0018:ffff880028243e50  EFLAGS: 00000292
> > RAX: 0000000000000032 RBX: 000000000000007d RCX: ffff8803190683c0
> > RDX: 0000000000000031 RSI: ffff8803190683c0 RDI: ffff88031f83e680
> > RBP: ffffffff81002893 R08: 0000000000000000 R09: 000000000000007c
> > R10: ffff88030d776800 R11: ffff88030d7768a0 R12: ffff880028243dd0
> > R13: ffffc900008b2f80 R14: ffff88031fa7c800 R15: ffffffff81012da7
> > FS:  0000000000000000(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
> > CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: 00007fd61d5bd000 CR3: 000000031e55c000 CR4: 00000000000006a0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process events/1 (pid: 28, threadinfo ffff88031f9c8000, task ffff88031f9a4f80)
> > Stack:
> >  ffffffff8126826f ffff88031faa4600 ffffffff8126834a 000096ba00000023
> > <0> 01ffc90000000024 ffff88031fbb4000 ffff88031faa4600 0000000000000040
> > <0> 0000000000000040 ffff88031faa4788 ffff88031faa4600 0000000000000740
> > Call Trace:
> >  <IRQ>
> >  [<ffffffff8126826f>] ? e1000_put_txbuf+0x62/0x74
> >  [<ffffffff8126834a>] ? e1000_clean_tx_irq+0xc9/0x235

clean_tx_irq in e1000e has a break-out condition to keep the transmit
cleanup from running forever (if another cpu is transmitting), so it seems
that something else must have happened.  Can kmem_cache_free sleep or
stall on a lock?

> >  [<ffffffff8126b71b>] ? e1000_clean+0x5c/0x21c
> >  [<ffffffff812f29a3>] ? net_rx_action+0x71/0x15d
> >  [<ffffffff81035311>] ? __do_softirq+0xd7/0x196
> >  [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
> >  [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
> >  [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
> >  <EOI>
> >  [<ffffffff81004599>] ? do_softirq+0x31/0x63
> >  [<ffffffff81034ec1>] ? local_bh_enable_ip+0x75/0x86
> >  [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
> >  [<ffffffff812f775d>] ? dst_gc_task+0xce/0x1a7
> >  [<ffffffff8136b08c>] ? schedule+0x82c/0x906
> >  [<ffffffff8103c44f>] ? lock_timer_base+0x26/0x4b
> >  [<ffffffff810a41d6>] ? cache_reap+0x0/0x11d
> >  [<ffffffff81044c38>] ? worker_thread+0x14c/0x1dc
> >  [<ffffffff81047dcd>] ? autoremove_wake_function+0x0/0x2e
> >  [<ffffffff81044aec>] ? worker_thread+0x0/0x1dc
> >  [<ffffffff810479bd>] ? kthread+0x79/0x81
> >  [<ffffffff81002cb4>] ? kernel_thread_helper+0x4/0x10
> >  [<ffffffff81047944>] ? kthread+0x0/0x81
> >  [<ffffffff81002cb0>] ? kernel_thread_helper+0x0/0x10
> > Code: fe 79 4c 00 48 85 db 74 14 48 8b 74 24 10 48 89 ef ff 13 48 83 c3 08 48
> > 83 3b 00 eb ea 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f <c3> 55 48 89 f5 53 48
> > 89 fb 48 83 ec 08 48 8b 76 18 48 2b 75 10
> > Call Trace:
> >  <IRQ>  [<ffffffff8126826f>] ? e1000_put_txbuf+0x62/0x74
> >  [<ffffffff8126834a>] ? e1000_clean_tx_irq+0xc9/0x235
> >  [<ffffffff8126b71b>] ? e1000_clean+0x5c/0x21c
> >  [<ffffffff812f29a3>] ? net_rx_action+0x71/0x15d
> >  [<ffffffff81035311>] ? __do_softirq+0xd7/0x196
> >  [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
> >  [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
> >  [<ffffffff81002dac>] ? call_softirq+0x1c/0x28
> >  <EOI>  [<ffffffff81004599>] ? do_softirq+0x31/0x63
> >  [<ffffffff81034ec1>] ? local_bh_enable_ip+0x75/0x86
> >  [<ffffffff812f768f>] ? dst_gc_task+0x0/0x1a7
> >  [<ffffffff812f775d>] ? dst_gc_task+0xce/0x1a7
> >  [<ffffffff8136b08c>] ? schedule+0x82c/0x906
> >  [<ffffffff8103c44f>] ? lock_timer_base+0x26/0x4b
> >  [<ffffffff810a41d6>] ? cache_reap+0x0/0x11d
> >  [<ffffffff81044c38>] ? worker_thread+0x14c/0x1dc
> >  [<ffffffff81047dcd>] ? autoremove_wake_function+0x0/0x2e
> >  [<ffffffff81044aec>] ? worker_thread+0x0/0x1dc
> >  [<ffffffff810479bd>] ? kthread+0x79/0x81
> >  [<ffffffff81002cb4>] ? kernel_thread_helper+0x4/0x10
> >  [<ffffffff81047944>] ? kthread+0x0/0x81
> >  [<ffffffff81002cb0>] ? kernel_thread_helper+0x0/0x10
> >
> > And on 2.6.32.7:
> >
> > Feb 4 13:12:40 TM_01_C1 ------------[ cut here ]------------
> > Feb 4 13:12:40 TM_01_C1 WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x130/0x1d6()
> > Feb 4 13:12:40 TM_01_C1 Hardware name: X7DCT
> > Feb 4 13:12:40 TM_01_C1 NETDEV WATCHDOG: eth0 (e1000e): transmit queue 0 timed out
> > Feb 4 13:12:40 TM_01_C1 Modules linked in: oprofile
> > Feb 4 13:12:40 TM_01_C1 Pid: 0, comm: swapper Not tainted 2.6.32 #1

the above doesn't match your statement that it is 2.6.32.7

> > Feb 4 13:12:40 TM_01_C1 Call Trace:
> > Feb 4 13:12:40 TM_01_C1 <IRQ>  [<ffffffff812fcaf7>] ? dev_watchdog+0x130/0x1d6
> > Feb 4 13:12:40 TM_01_C1 [<ffffffff812fcaf7>] ? dev_watchdog+0x130/0x1d6
> > Feb 4 13:12:40 TM_01_C1 [<ffffffff81038811>] ? warn_slowpath_common+0x77/0xa3
> > Feb 4 13:12:40 TM_01_C1 [<ffffffff81038899>] ? warn_slowpath_fmt+0x51/0x59
> > Feb 4 13:12:40 TM_01_C1 [<ffffffff8102897e>] ? activate_task+0x3f/0x4e
> > Feb 4 13:12:40 TM_01_C1 [<ffffffff81034fe5>] ? try_to_wake_up+0x1eb/0x1f8
> > Feb 4 13:12:40 TM_01_C1 [<ffffffff812eb768>] ? netdev_drivername+0x3b/0x40
> > Feb 4 13:12:40 TM_01_C1 [<ffffffff812fcaf7>] ? dev_watchdog+0x130/0x1d6
> > Feb 4 13:12:40 TM_01_C1 [<ffffffff8102d1e3>] ? __wake_up+0x30/0x44
> > Feb 4 13:12:40 TM_01_C1 [<ffffffff812fc9c7>] ? dev_watchdog+0x0/0x1d6
> > Feb 4 13:12:40 TM_01_C1 [<ffffffff810448c4>] ? run_timer_softirq+0x1ff/0x29d
> > Feb 4 13:12:40 TM_01_C1 [<ffffffff810556ab>] ? ktime_get+0x5f/0xb7
> > Feb 4 13:12:40 TM_01_C1 [<ffffffff8103e0fd>] ? __do_softirq+0xd7/0x196
> > Feb 4 13:12:40 TM_01_C1 [<ffffffff8100be7c>] ? call_softirq+0x1c/0x28
> > Feb 4 13:12:40 TM_01_C1 [<ffffffff8100d645>] ? do_softirq+0x31/0x66
> > Feb 4 13:12:40 TM_01_C1 [<ffffffff8101b148>] ? smp_apic_timer_interrupt+0x87/0x95
> > Feb 4 13:12:40 TM_01_C1 [<ffffffff8100b873>] ? apic_timer_interrupt+0x13/0x20
> > Feb 4 13:12:40 TM_01_C1 <EOI>  [<ffffffff810111f5>] ? mwait_idle+0x9b/0xa0
> > Feb 4 13:12:40 TM_01_C1 [<ffffffff8100a236>] ? cpu_idle+0x49/0x7c
> > Feb 4 13:12:40 TM_01_C1 ---[ end trace c670a6a17be040e5 ]---
> >
> > Machine is a BGP router with some iproute traffic management.

please send a full dmesg from boot, and also a full lspci -vvv.

We sporadically see issues with some people's use of traffic management
on some e1000/e1000e devices, but we have no common reproduction case, so
we haven't solved the issue yet.  I suspect there is a strange interaction
with HTB or one of the other queueing disciplines that causes some odd skb
or set of skbs to arrive, but that is just a theory.

I see two different bugs above.  One is the tx hang: could it be that you
didn't have softlockup detection enabled on 2.6.32?  A softlockup could
cause a tx hang.  The other bug is the softlockup itself, and I don't know
how the driver could be at fault in that stack path.  We simply can't loop
more than tx_ring->count times, which could never take more than 60
seconds unless one of the calls we were making was getting stalled for
some reason.

So your traffic is "heavy" forwarding at the time of the failure; do you
have any data on average packet size?  I'm trying to figure out how we
might reproduce this.
Jesse