Hi Peter,

I have a couple of questions.

Does this happen with a non-preemptive kernel? I understand that you
probably need a preemptive kernel, but for testing purposes it would be
good to know, since we don't always test with preemptive kernels.

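In case it is useful, here is a rough sketch of how to confirm which preemption model a given test kernel was built with. It assumes the `uname -v` string carries the same PREEMPT tag that shows up in your oops header, and that /proc/config.gz is only present when CONFIG_IKCONFIG_PROC is enabled. The helper name preempt_model is just something I made up for illustration.

```shell
#!/bin/sh
# Classify a kernel version string (as printed by `uname -v`) by preemption model.
# preempt_model is a hypothetical helper name, not part of any existing tool.
preempt_model() {
    case "$1" in
        *"PREEMPT RT"*|*PREEMPT_RT*) echo "rt" ;;          # -rt patched kernel
        *PREEMPT*)                   echo "preempt" ;;     # CONFIG_PREEMPT
        *)                           echo "non-preempt" ;; # no PREEMPT tag found
    esac
}

# Report the model of the running kernel.
preempt_model "$(uname -v)"

# If the kernel exposes its config (assumes CONFIG_IKCONFIG_PROC), show the raw options too.
if [ -r /proc/config.gz ] && command -v zcat >/dev/null 2>&1; then
    zcat /proc/config.gz | grep '^CONFIG_PREEMPT' || true
fi
```

Running this on each kernel you test with, and noting the result next to the pass/fail outcome, would tell us whether the panic tracks the preemption model.
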
When you do the up/down transitions, is the system under test passing
traffic, i.e. sending and receiving packets? If so, what is the load like?
Does changing the load make a difference? Does stopping the network
traffic first make a difference in the outcome?

Please let us know.

Cheers,
John

> -----Original Message-----
> From: Peter LaDow [mailto:[email protected]]
> Sent: Wednesday, June 05, 2013 3:02 PM
> To: [email protected]
> Subject: Re: [E1000-devel] Memory Corruption with e1000
>
> After some more digging, I'm wondering if this is indeed a timing
> issue. Is there a problem with bringing up an interface too soon after
> taking it down? If I change my loop to use a 30 second delay between
> interface bringup/teardown, I don't get the panic.
>
> It appears that upon a change in adapter state, the netlink interface
> returns nearly immediately. I wrote some test code to change IFF_UP
> via netlink, and timed how long it took to get the flag to actually
> change (i.e. sending the request, waiting for the ack, and updating the
> cache to indicate the flag was changed). I noted that from down to up
> was about 21us. And up to down took about 38us. And even with this
> test code, setting the delay between up/down transitions too small
> (less than 30 seconds) I would get the panic.
>
> So. It seems to me there is some issue with the up/down transition.
> Any insight would be very helpful.
>
> Thanks,
> Pete
>
> On Wed, Jun 5, 2013 at 9:03 AM, Peter LaDow <[email protected]> wrote:
> > We are running a PPC system with an 82540EP that is causing kernel
> > panics when there is heavy traffic and the interface is brought up
> > and/or down (we aren't sure which yet).
> >
> > We are running 3.0.57-rt82, but we can re-create this issue reliably
> > with 3.0.80 and 3.0.80-rt109 with the base driver version included in
> > the kernel (which is 7.3.21-k8-NAPI). However, I've also tried 8.0.35,
> > and get the same failure.
> >
> > We've narrowed it down to this case and can reliably re-create the
> > issue with a tight loop, such as:
> >
> > while :
> > do
> >   ip link set eth2 up
> >   sleep 10
> >   ip link set eth2 down
> >   sleep 10
> > done
> >
> > I'm not sure where to look and any help would be appreciated.
> >
> > In this loop we can reliably generate a kernel panic such as:
> >
> > Unable to handle kernel paging request for data at address 0x20454a46
> > Faulting instruction address: 0xc0069924
> > Oops: Kernel access of bad area, sig: 11 [#1] PREEMPT PPC Platform
> > Modules linked in:
> > NIP: c0069924 LR: c021cce0 CTR: c000cecc
> > REGS: ed4f1c60 TRAP: 0300 Not tainted (3.0.80-rt108)
> > MSR: 00009032 <EE,ME,IR,DR> CR: 24008248 XER: 00000000
> > DAR: 20454a46, DSISR: 20000000
> > TASK = eda46780[3106] 'ifconfig' THREAD: ed4f0000
> > GPR00: 00000000 ed4f1d10 eda46780 20454a46 2d6fcc2a 000005f2 00000002 00000000
> > GPR08: eda46780 ed6fd228 ed4f1cd0 000090b1 00000000 10084718 bfcceaec 10062044
> > GPR16: 10062120 bfcceadc 00000000 bfcceac4 00000228 00000000 00008914 c01ac398
> > GPR24: c01ac8c8 ed066520 00000061 ed0663a0 ef0448f0 00000000 00000001 ed575580
> > NIP [c0069924] put_page+0x0/0x34
> > LR [c021cce0] skb_release_data+0x78/0xc8
> > Call Trace:
> > [ed4f1d20] [c021c914] __kfree_skb+0x18/0xbc
> > [ed4f1d30] [c01a7620] e1000_clean_rx_ring+0x10c/0x1a4
> > [ed4f1d60] [c01a76e0] e1000_clean_all_rx_rings+0x28/0x54
> > [ed4f1d70] [c01aac50] e1000_close+0x30/0xb4
> > [ed4f1d90] [c0226e2c] __dev_close_many+0xa0/0xe0
> > [ed4f1da0] [c0228c64] __dev_close+0x2c/0x4c
> > [ed4f1dc0] [c0225224] __dev_change_flags+0xb8/0x140
> > [ed4f1de0] [c0226d48] dev_change_flags+0x1c/0x60
> > [ed4f1e00] [c027e7f8] devinet_ioctl+0x2a4/0x700
> > [ed4f1e60] [c027f450] inet_ioctl+0xc8/0xfc
> > [ed4f1e70] [c02147b0] sock_ioctl+0x260/0x2a0
> > [ed4f1e90] [c009b468] vfs_ioctl+0x2c/0x58
> > [ed4f1ea0] [c009bc44] do_vfs_ioctl+0x64c/0x6d4
> > [ed4f1f10] [c009bd24] sys_ioctl+0x58/0x88
> > [ed4f1f40] [c000e954]
> > ret_from_syscall+0x0/0x38
> > --- Exception: c01 at 0xff35a3c
> > LR = 0xff359a0
> > Instruction dump:
> > 7c0802a6 3c80c007 3884a500 90010024 38a10008 38000000 90010008 4bffff0d
> > 80010024 38210020 7c0803a6 4e800020 <80030000> 7c691b78 700bc000 41a20008
> > Kernel panic - not syncing: Fatal exception
> > Call Trace:
> > [ed4f1b90] [c0007ccc] show_stack+0x58/0x154 (unreliable)
> > [ed4f1bd0] [c001d744] panic+0xb0/0x1d8
> > [ed4f1c20] [c000b4b8] die+0x1ac/0x1d0
> > [ed4f1c40] [c0011e38] bad_page_fault+0xe8/0xfc
> > [ed4f1c50] [c000edf4] handle_page_fault+0x7c/0x80
> > --- Exception: 300 at put_page+0x0/0x34
> > LR = skb_release_data+0x78/0xc8
> > [ed4f1d10] [00000000] (null) (unreliable)
> > [ed4f1d20] [c021c914] __kfree_skb+0x18/0xbc
> > [ed4f1d30] [c01a7620] e1000_clean_rx_ring+0x10c/0x1a4
> > [ed4f1d60] [c01a76e0] e1000_clean_all_rx_rings+0x28/0x54
> > [ed4f1d70] [c01aac50] e1000_close+0x30/0xb4
> > [ed4f1d90] [c0226e2c] __dev_close_many+0xa0/0xe0
> > [ed4f1da0] [c0228c64] __dev_close+0x2c/0x4c
> > [ed4f1dc0] [c0225224] __dev_change_flags+0xb8/0x140
> > [ed4f1de0] [c0226d48] dev_change_flags+0x1c/0x60
> > [ed4f1e00] [c027e7f8] devinet_ioctl+0x2a4/0x700
> > [ed4f1e60] [c027f450] inet_ioctl+0xc8/0xfc
> > [ed4f1e70] [c02147b0] sock_ioctl+0x260/0x2a0
> > [ed4f1e90] [c009b468] vfs_ioctl+0x2c/0x58
> > [ed4f1ea0] [c009bc44] do_vfs_ioctl+0x64c/0x6d4
> > [ed4f1f10] [c009bd24] sys_ioctl+0x58/0x88
> > [ed4f1f40] [c000e954] ret_from_syscall+0x0/0x38
> > --- Exception: c01 at 0xff35a3c
> > LR = 0xff359a0
> >
> > When turning on SLAB checks, I see:
> >
> > Slab corruption: size-16384 start=ed4ec000, len=16384
> > 690: 6b 6b ff ff ff ff ff ff b8 ac 6f 99 bf 8b 08 00
> > 6a0: 45 00 00 24 3f 34 00 00 80 11 ca cf 0a ca 0d 33
> > 6b0: 0a ca 0d ff 06 cc 06 cf 00 10 bc 1d c5 0b 40 01
> > 6c0: 00 10 00 33 00 00 00 00 00 00 00 00 00 00 3f dd
> > 6d0: ed f8 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > ea0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ff ff ff ff ff ff
> > Slab corruption: size-2048
> > start=ed4e6570, len=2048
> > Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> > Last user: [< (null)>](0x0)
> > 0c0: 6b 6b ff ff ff ff ff ff 5c 26 0a 41 81 27 08 00
> > 0d0: 45 00 00 4e 7d 44 00 00 80 11 8c 79 0a ca 0d 4f
> > 0e0: 0a ca 0d ff 00 89 00 89 00 3a b5 a7 be 71 01 10
> > 0f0: 00 01 00 00 00 00 00 00 20 45 4c 45 43 45 50 46
> > 100: 49 43 41 43 41 43 41 43 41 43 41 43 41 43 41 43
> > 110: 41 43 41 43 41 43 41 41 41 00 00 20 00 01 02 5a
> > Next obj: start=ed4e6d88, len=2048
> > Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> > Last user: [<c021e294>](__netdev_alloc_skb+0x28/0x60)
> > 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> > 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> > Slab corruption: size-2048 start=ed54eb48, len=2048
> > Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> > Last user: [<c021cd1c>](skb_release_data+0xb4/0xc8)
> > 020: 6b 6b ff ff ff ff ff ff 18 03 73 e4 64 18 08 00
> > 030: 45 00 00 44 61 b8 00 00 80 11 a7 c9 0a ca 0d 95
> > 040: 0a ca 0d ff f1 ee 07 9b 00 30 53 78 30 53 73 66
> > 050: 54 77 78 32 41 41 42 4b 52 55 5a 47 54 55 4e 51
> > 060: 51 7a 49 41 52 47 39 73 62 33 4a 54 61 58 52 42
> > 070: 62 57 55 41 bc fd 94 9f 6b 6b 6b 6b 6b 6b 6b 6b
> > Prev obj: start=ed54e330, len=2048
> > Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> > Last user: [<c021e294>](__netdev_alloc_skb+0x28/0x60)
> > 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> > 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> > Next obj: start=ed54f360, len=2048
> > Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> > Last user: [<c021e294>](__netdev_alloc_skb+0x28/0x60)
> > 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> > 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> > Slab corruption: size-2048 start=ed4ae6f0, len=2048
> > Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> > Last user: [<c021cd1c>](skb_release_data+0xb4/0xc8)
> > 020: 6b 6b ff ff ff ff ff ff 00 1a e2 bd 06 44 81 00
> > 030: 00 f2 08 06 00 01 08 00 06 04 00 01 00 1a e2 bd
> > 040: 06 44 0a f1 0a 4b 00 00 00 00 00 00 0a f1 0a 4b
> > 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > 060: 00 00 37 ea e4 d5 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> > Next obj: start=ed4aef08, len=2048
> > Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> > Last user: [<c01f2898>](rx_submit+0xa0/0x174)
> > 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> > 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> > Slab corruption: size-2048 start=ed792928, len=2048
> > Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> > Last user: [<c021cd1c>](skb_release_data+0xb4/0xc8)
> > 020: 6b 6b ff ff ff ff ff ff d4 be d9 a0 1d 8a 08 00
> > 030: 45 00 00 44 1c a9 00 00 80 11 ed 4b 0a ca 0d 22
> > 040: 0a ca 0d ff e6 ed 07 9b 00 30 63 13 48 74 77 37
> > 050: 55 46 51 4c 41 41 42 51 53 45 6c 4d 51 55 46 54
> > 060: 52 56 42 44 4d 67 42 73 62 33 4a 54 61 58 52 42
> > 070: 62 57 55 41 d0 91 83 aa 6b 6b 6b 6b 6b 6b 6b 6b
> > Prev obj: start=ed792110, len=2048
> > Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> > Last user: [<c021e294>](__netdev_alloc_skb+0x28/0x60)
> > 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> > 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> > Next obj: start=ed793140, len=2048
> > Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> > Last user: [<c021e294>](__netdev_alloc_skb+0x28/0x60)
> > 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> > 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> >
> > After some digging through the list archives and online searches, I
> > thought perhaps there were DMA issues (such as the controller DMA'ing
> > into memory after the rings are disabled).
> > I turned on the DMA-API checks and see:
> >
> > e1000 0000:00:13.0: DMA-API: device driver tries to free DMA memory it
> > has not allocated [device address=0x000000004eae0000] [size=65535 bytes]
> > ------------[ cut here ]------------
> > WARNING: at lib/dma-debug.c:811
> > Modules linked in:
> > NIP: c01450d4 LR: c01450d4 CTR: c0169388
> > REGS: ed8fb7e0 TRAP: 0700 Not tainted (3.0.57-rt82)
> > MSR: 00029032 <EE,ME,CE,IR,DR> CR: 24020482 XER: 20000000
> > TASK = ed8f6b60[3394] 'ip' THREAD: ed8fa000
> > GPR00: c01450d4 ed8fb890 ed8f6b60 00000093 00004718 ffffffff c01668a8 00000000
> > GPR08: ed8f6b60 c03ca4c0 00004718 ed8f6c00 24020482 100456d0 c0302a48 c03029d4
> > GPR16: c03ee31c ed8fbac8 ed732f10 ed8fbab0 ed8fbad8 00000000 c02fda70 00000000
> > GPR24: ed732f00 ed8fb8d8 00000001 0000ffff 4eae0000 ed8fb8d8 c050f6b8 00000000
> > NIP [c01450d4] check_unmap+0x1e0/0x7b0
> > LR [c01450d4] check_unmap+0x1e0/0x7b0
> > Call Trace:
> > [ed8fb890] [c01450d4] check_unmap+0x1e0/0x7b0 (unreliable)
> > [ed8fb8d0] [c01457a4] debug_dma_unmap_page+0x7c/0x90
> > [ed8fb940] [c01ad344] e1000_unmap_and_free_tx_resource+0xf4/0x130
> > [ed8fb960] [c01ad3a8] e1000_clean_tx_ring+0x28/0xac
> > [ed8fb980] [c01aecb8] e1000_down+0x1e4/0x210
> > [ed8fb9b0] [c01af488] e1000_close+0x30/0xb4
> > [ed8fb9d0] [c022c2f0] __dev_close_many+0xa0/0xe0
> > [ed8fb9e0] [c022e128] __dev_close+0x2c/0x4c
> > [ed8fba00] [c022a748] __dev_change_flags+0xb8/0x140
> > [ed8fba20] [c022c20c] dev_change_flags+0x1c/0x60
> > [ed8fba40] [c023bc40] do_setlink+0x278/0x748
> > [ed8fbaa0] [c023d044] rtnl_newlink+0x298/0x4b0
> > [ed8fbbd0] [c023c5f4] rtnetlink_rcv_msg+0x210/0x23c
> > [ed8fbbf0] [c02454bc] netlink_rcv_skb+0x5c/0xd4
> > [ed8fbc10] [c023c3d0] rtnetlink_rcv+0x28/0x3c
> > [ed8fbc30] [c0245200] netlink_unicast+0x244/0x2e4
> > [ed8fbc70] [c0246020] netlink_sendmsg+0x260/0x2dc
> > [ed8fbcc0] [c021a29c] sock_sendmsg+0xa0/0xc4
> > [ed8fbda0] [c021b018] __sys_sendmsg+0x1d8/0x29c
> > [ed8fbeb0] [c021b220]
> > sys_sendmsg+0x40/0x70
> > [ed8fbf00] [c021bfcc] sys_socketcall+0x1b0/0x230
> > [ed8fbf40] [c000e9b4] ret_from_syscall+0x0/0x38
> > --- Exception: c01 at 0xff0a9d4
> > LR = 0x10020900
> > Instruction dump:
> > 48000014 80a9002c 2f850000 40be0008 80a90008 813d0020 3c60c035 815d0024
> > 3863f06c 80fd0018 811d001c 4bed9c01 <0fe00000> 3d20c040 8009c360 2f800000
> > ---[ end trace 0000000000000002 ]---
> >
> ------------------------------------------------------------------------------
> How ServiceNow helps IT people transform IT departments:
> 1. A cloud service to automate IT design, transition and operations
> 2. Dashboards that offer high-level views of enterprise services
> 3. A single system of record for all IT processes
> http://p.sf.net/sfu/servicenow-d2d-j
> _______________________________________________
> E1000-devel mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/e1000-devel
> To learn more about Intel® Ethernet, visit
> http://communities.intel.com/community/wired
