After some more digging, I'm wondering if this is indeed a timing
issue.  Is there a problem with bringing up an interface too soon
after taking it down?  If I change my loop to use a 30 second delay
between interface bringup/teardown, I don't get the panic.
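
A minimal sketch of the reproduction loop with the delay pulled out as a
parameter. It makes it easier to narrow down where the behavior changes
between the 10 second case (panics) and the 30 second case (no panic).
The function name `toggle_iface` and the `IP` override (used for dry
runs) are hypothetical. For real runs it needs root and a real interface.

```shell
#!/bin/sh
# Hypothetical helper: toggle an interface up/down with a configurable delay
# so the panic threshold (10 s panics, 30 s does not) can be bracketed.
# IP may be overridden (e.g. IP=echo) to dry-run without touching hardware.

toggle_iface() {
    iface=$1      # interface to cycle, e.g. eth2
    delay=$2      # seconds to wait after each transition
    cycles=$3     # number of up/down pairs
    ip_cmd=${IP:-ip}
    i=0
    while [ "$i" -lt "$cycles" ]; do
        $ip_cmd link set "$iface" up
        sleep "$delay"
        $ip_cmd link set "$iface" down
        sleep "$delay"
        i=$((i + 1))
    done
}

# Only run when invoked as: sh toggle.sh IFACE DELAY CYCLES
if [ "$#" -eq 3 ]; then
    toggle_iface "$@"
fi
```

For example, `sh toggle.sh eth2 20 100` would try a 20 second delay for
100 cycles.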

It appears that upon a change in adapter state, the netlink interface
returns nearly immediately.  I wrote some test code to change IFF_UP
via netlink, and timed how long it took to get the flag to actually
change (i.e. sending the request, waiting for the ack, and updating
the cache to indicate the flag was changed).  Down to up took about
21 us, and up to down took about 38 us.  Even with this test code, if
I set the delay between up/down transitions too small (less than 30
seconds), I would get the panic.
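
For anyone wanting to repeat the timing measurement without writing
netlink code, here is a rough sketch using ip(8) and the sysfs flags file
(bit 0 of the flags word is IFF_UP). It assumes GNU date for nanosecond
timestamps. It also includes the fork/exec overhead of `ip` and `date`,
so its numbers would be larger than the raw netlink figures above. The
names `time_transition`, `SYSFS` and `IP` are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: time how long an IFF_UP change takes to become visible,
# using ip(8) plus /sys/class/net/<iface>/flags instead of raw netlink code.
# Assumes GNU date(1) for %N. SYSFS and IP are overridable for dry runs.

IFF_UP=1   # bit 0 of the interface flags word (IFF_UP in <linux/if.h>)

now_ns() {
    date +%s%N
}

# is_up IFACE: exit status 0 if IFF_UP is set in the sysfs flags file
is_up() {
    flags=$(cat "${SYSFS:-/sys/class/net}/$1/flags")   # e.g. 0x1003
    [ $((${flags} & ${IFF_UP})) -ne 0 ]
}

# time_transition IFACE up|down: prints elapsed microseconds
time_transition() {
    iface=$1
    want=$2
    start=$(now_ns)
    ${IP:-ip} link set "$iface" "$want"
    # Busy-poll until the flag reflects the requested state.
    while :; do
        if is_up "$iface"; then state=up; else state=down; fi
        [ "$state" = "$want" ] && break
    done
    end=$(now_ns)
    echo $(( (end - start) / 1000 ))
}

# Only run when invoked as: sh timeit.sh IFACE up|down
if [ "$#" -eq 2 ]; then
    time_transition "$1" "$2"
fi
```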

So.  It seems to me there is some issue with the up/down transition.
Any insight would be very helpful.

Thanks,
Pete

On Wed, Jun 5, 2013 at 9:03 AM, Peter LaDow wrote:
> We are running a PPC system with an 82540EP that is causing kernel
> panics when there is heavy traffic and the interface is brought up
> and/or down (we aren't sure which yet).
>
> We are running 3.0.57-rt82, but we can re-create this issue reliably
> with 3.0.80 and 3.0.80-rt109 using the e1000 driver version included
> in the kernel (7.3.21-k8-NAPI).  However, I've also tried 8.0.35,
> and get the same failure.
>
> We've narrowed it down to this case and can reliably re-create the
> issue with a tight loop, such as:
>
> while :
> do
>   ip link set eth2 up
>   sleep 10
>   ip link set eth2 down
>   sleep 10
> done
>
> I'm not sure where to look, so any help would be appreciated.
>
> With this loop we can reliably generate a kernel panic such as:
>
> Unable to handle kernel paging request for data at address 0x20454a46
> Faulting instruction address: 0xc0069924
> Oops: Kernel access of bad area, sig: 11 [#1]
> PREEMPT PPC Platform
> Modules linked in:
> NIP: c0069924 LR: c021cce0 CTR: c000cecc
> REGS: ed4f1c60 TRAP: 0300   Not tainted  (3.0.80-rt108)
> MSR: 00009032 <EE,ME,IR,DR>  CR: 24008248  XER: 00000000
> DAR: 20454a46, DSISR: 20000000
> TASK = eda46780[3106] 'ifconfig' THREAD: ed4f0000
> GPR00: 00000000 ed4f1d10 eda46780 20454a46 2d6fcc2a 000005f2 00000002 00000000
> GPR08: eda46780 ed6fd228 ed4f1cd0 000090b1 00000000 10084718 bfcceaec 10062044
> GPR16: 10062120 bfcceadc 00000000 bfcceac4 00000228 00000000 00008914 c01ac398
> GPR24: c01ac8c8 ed066520 00000061 ed0663a0 ef0448f0 00000000 00000001 ed575580
> NIP [c0069924] put_page+0x0/0x34
> LR [c021cce0] skb_release_data+0x78/0xc8
> Call Trace:
> [ed4f1d20] [c021c914] __kfree_skb+0x18/0xbc
> [ed4f1d30] [c01a7620] e1000_clean_rx_ring+0x10c/0x1a4
> [ed4f1d60] [c01a76e0] e1000_clean_all_rx_rings+0x28/0x54
> [ed4f1d70] [c01aac50] e1000_close+0x30/0xb4
> [ed4f1d90] [c0226e2c] __dev_close_many+0xa0/0xe0
> [ed4f1da0] [c0228c64] __dev_close+0x2c/0x4c
> [ed4f1dc0] [c0225224] __dev_change_flags+0xb8/0x140
> [ed4f1de0] [c0226d48] dev_change_flags+0x1c/0x60
> [ed4f1e00] [c027e7f8] devinet_ioctl+0x2a4/0x700
> [ed4f1e60] [c027f450] inet_ioctl+0xc8/0xfc
> [ed4f1e70] [c02147b0] sock_ioctl+0x260/0x2a0
> [ed4f1e90] [c009b468] vfs_ioctl+0x2c/0x58
> [ed4f1ea0] [c009bc44] do_vfs_ioctl+0x64c/0x6d4
> [ed4f1f10] [c009bd24] sys_ioctl+0x58/0x88
> [ed4f1f40] [c000e954] ret_from_syscall+0x0/0x38
> --- Exception: c01 at 0xff35a3c
>     LR = 0xff359a0
> Instruction dump:
> 7c0802a6 3c80c007 3884a500 90010024 38a10008 38000000 90010008 4bffff0d
> 80010024 38210020 7c0803a6 4e800020
> <80030000> 7c691b78 700bc000 41a20008
> Kernel panic - not syncing: Fatal exception
> Call Trace:
> [ed4f1b90] [c0007ccc] show_stack+0x58/0x154 (unreliable)
> [ed4f1bd0] [c001d744] panic+0xb0/0x1d8
> [ed4f1c20] [c000b4b8] die+0x1ac/0x1d0
> [ed4f1c40] [c0011e38] bad_page_fault+0xe8/0xfc
> [ed4f1c50] [c000edf4] handle_page_fault+0x7c/0x80
> --- Exception: 300 at put_page+0x0/0x34
>     LR = skb_release_data+0x78/0xc8
> [ed4f1d10] [00000000]   (null) (unreliable)
> [ed4f1d20] [c021c914] __kfree_skb+0x18/0xbc
> [ed4f1d30] [c01a7620] e1000_clean_rx_ring+0x10c/0x1a4
> [ed4f1d60] [c01a76e0] e1000_clean_all_rx_rings+0x28/0x54
> [ed4f1d70] [c01aac50] e1000_close+0x30/0xb4
> [ed4f1d90] [c0226e2c] __dev_close_many+0xa0/0xe0
> [ed4f1da0] [c0228c64] __dev_close+0x2c/0x4c
> [ed4f1dc0] [c0225224] __dev_change_flags+0xb8/0x140
> [ed4f1de0] [c0226d48] dev_change_flags+0x1c/0x60
> [ed4f1e00] [c027e7f8] devinet_ioctl+0x2a4/0x700
> [ed4f1e60] [c027f450] inet_ioctl+0xc8/0xfc
> [ed4f1e70] [c02147b0] sock_ioctl+0x260/0x2a0
> [ed4f1e90] [c009b468] vfs_ioctl+0x2c/0x58
> [ed4f1ea0] [c009bc44] do_vfs_ioctl+0x64c/0x6d4
> [ed4f1f10] [c009bd24] sys_ioctl+0x58/0x88
> [ed4f1f40] [c000e954] ret_from_syscall+0x0/0x38
> --- Exception: c01 at 0xff35a3c
>     LR = 0xff359a0
>
> With SLAB debugging checks turned on, I see:
>
> Slab corruption: size-16384 start=ed4ec000, len=16384
> 690: 6b 6b ff ff ff ff ff ff b8 ac 6f 99 bf 8b 08 00
> 6a0: 45 00 00 24 3f 34 00 00 80 11 ca cf 0a ca 0d 33
> 6b0: 0a ca 0d ff 06 cc 06 cf 00 10 bc 1d c5 0b 40 01
> 6c0: 00 10 00 33 00 00 00 00 00 00 00 00 00 00 3f dd
> 6d0: ed f8 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> ea0: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ff ff ff ff ff ff
> Slab corruption: size-2048 start=ed4e6570, len=2048
> Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> Last user: [<  (null)>](0x0)
> 0c0: 6b 6b ff ff ff ff ff ff 5c 26 0a 41 81 27 08 00
> 0d0: 45 00 00 4e 7d 44 00 00 80 11 8c 79 0a ca 0d 4f
> 0e0: 0a ca 0d ff 00 89 00 89 00 3a b5 a7 be 71 01 10
> 0f0: 00 01 00 00 00 00 00 00 20 45 4c 45 43 45 50 46
> 100: 49 43 41 43 41 43 41 43 41 43 41 43 41 43 41 43
> 110: 41 43 41 43 41 43 41 41 41 00 00 20 00 01 02 5a
> Next obj: start=ed4e6d88, len=2048
> Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> Last user: [<c021e294>](__netdev_alloc_skb+0x28/0x60)
> 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> Slab corruption: size-2048 start=ed54eb48, len=2048
> Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> Last user: [<c021cd1c>](skb_release_data+0xb4/0xc8)
> 020: 6b 6b ff ff ff ff ff ff 18 03 73 e4 64 18 08 00
> 030: 45 00 00 44 61 b8 00 00 80 11 a7 c9 0a ca 0d 95
> 040: 0a ca 0d ff f1 ee 07 9b 00 30 53 78 30 53 73 66
> 050: 54 77 78 32 41 41 42 4b 52 55 5a 47 54 55 4e 51
> 060: 51 7a 49 41 52 47 39 73 62 33 4a 54 61 58 52 42
> 070: 62 57 55 41 bc fd 94 9f 6b 6b 6b 6b 6b 6b 6b 6b
> Prev obj: start=ed54e330, len=2048
> Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> Last user: [<c021e294>](__netdev_alloc_skb+0x28/0x60)
> 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> Next obj: start=ed54f360, len=2048
> Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> Last user: [<c021e294>](__netdev_alloc_skb+0x28/0x60)
> 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> Slab corruption: size-2048 start=ed4ae6f0, len=2048
> Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> Last user: [<c021cd1c>](skb_release_data+0xb4/0xc8)
> 020: 6b 6b ff ff ff ff ff ff 00 1a e2 bd 06 44 81 00
> 030: 00 f2 08 06 00 01 08 00 06 04 00 01 00 1a e2 bd
> 040: 06 44 0a f1 0a 4b 00 00 00 00 00 00 0a f1 0a 4b
> 050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 060: 00 00 37 ea e4 d5 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
> Next obj: start=ed4aef08, len=2048
> Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> Last user: [<c01f2898>](rx_submit+0xa0/0x174)
> 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> Slab corruption: size-2048 start=ed792928, len=2048
> Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
> Last user: [<c021cd1c>](skb_release_data+0xb4/0xc8)
> 020: 6b 6b ff ff ff ff ff ff d4 be d9 a0 1d 8a 08 00
> 030: 45 00 00 44 1c a9 00 00 80 11 ed 4b 0a ca 0d 22
> 040: 0a ca 0d ff e6 ed 07 9b 00 30 63 13 48 74 77 37
> 050: 55 46 51 4c 41 41 42 51 53 45 6c 4d 51 55 46 54
> 060: 52 56 42 44 4d 67 42 73 62 33 4a 54 61 58 52 42
> 070: 62 57 55 41 d0 91 83 aa 6b 6b 6b 6b 6b 6b 6b 6b
> Prev obj: start=ed792110, len=2048
> Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> Last user: [<c021e294>](__netdev_alloc_skb+0x28/0x60)
> 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> Next obj: start=ed793140, len=2048
> Redzone: 0xd84156c5635688c0/0xd84156c5635688c0.
> Last user: [<c021e294>](__netdev_alloc_skb+0x28/0x60)
> 000: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
> 010: 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
>
> After some digging through the list archives and online searches, I
> thought perhaps there were DMA issues (such as the controller DMA'ing
> into memory after the rings are disabled).  I turned on the DMA-API
> checks and saw:
>
> e1000 0000:00:13.0: DMA-API: device driver tries to free DMA memory it
> has not allocated [device address=0x000000004eae0000] [size=65535
> bytes]
> ------------[ cut here ]------------
> WARNING: at lib/dma-debug.c:811
> Modules linked in:
> NIP: c01450d4 LR: c01450d4 CTR: c0169388
> REGS: ed8fb7e0 TRAP: 0700   Not tainted  (3.0.57-rt82)
> MSR: 00029032 <EE,ME,CE,IR,DR>  CR: 24020482  XER: 20000000
> TASK = ed8f6b60[3394] 'ip' THREAD: ed8fa000
> GPR00: c01450d4 ed8fb890 ed8f6b60 00000093 00004718 ffffffff c01668a8 00000000
> GPR08: ed8f6b60 c03ca4c0 00004718 ed8f6c00 24020482 100456d0 c0302a48 c03029d4
> GPR16: c03ee31c ed8fbac8 ed732f10 ed8fbab0 ed8fbad8 00000000 c02fda70 00000000
> GPR24: ed732f00 ed8fb8d8 00000001 0000ffff 4eae0000 ed8fb8d8 c050f6b8 00000000
> NIP [c01450d4] check_unmap+0x1e0/0x7b0
> LR [c01450d4] check_unmap+0x1e0/0x7b0
> Call Trace:
> [ed8fb890] [c01450d4] check_unmap+0x1e0/0x7b0 (unreliable)
> [ed8fb8d0] [c01457a4] debug_dma_unmap_page+0x7c/0x90
> [ed8fb940] [c01ad344] e1000_unmap_and_free_tx_resource+0xf4/0x130
> [ed8fb960] [c01ad3a8] e1000_clean_tx_ring+0x28/0xac
> [ed8fb980] [c01aecb8] e1000_down+0x1e4/0x210
> [ed8fb9b0] [c01af488] e1000_close+0x30/0xb4
> [ed8fb9d0] [c022c2f0] __dev_close_many+0xa0/0xe0
> [ed8fb9e0] [c022e128] __dev_close+0x2c/0x4c
> [ed8fba00] [c022a748] __dev_change_flags+0xb8/0x140
> [ed8fba20] [c022c20c] dev_change_flags+0x1c/0x60
> [ed8fba40] [c023bc40] do_setlink+0x278/0x748
> [ed8fbaa0] [c023d044] rtnl_newlink+0x298/0x4b0
> [ed8fbbd0] [c023c5f4] rtnetlink_rcv_msg+0x210/0x23c
> [ed8fbbf0] [c02454bc] netlink_rcv_skb+0x5c/0xd4
> [ed8fbc10] [c023c3d0] rtnetlink_rcv+0x28/0x3c
> [ed8fbc30] [c0245200] netlink_unicast+0x244/0x2e4
> [ed8fbc70] [c0246020] netlink_sendmsg+0x260/0x2dc
> [ed8fbcc0] [c021a29c] sock_sendmsg+0xa0/0xc4
> [ed8fbda0] [c021b018] __sys_sendmsg+0x1d8/0x29c
> [ed8fbeb0] [c021b220] sys_sendmsg+0x40/0x70
> [ed8fbf00] [c021bfcc] sys_socketcall+0x1b0/0x230
> [ed8fbf40] [c000e9b4] ret_from_syscall+0x0/0x38
> --- Exception: c01 at 0xff0a9d4
>     LR = 0x10020900
> Instruction dump:
> 48000014 80a9002c 2f850000 40be0008 80a90008 813d0020 3c60c035 815d0024
> 3863f06c 80fd0018 811d001c 4bed9c01 <0fe00000> 3d20c040 8009c360 2f800000
> ---[ end trace 0000000000000002 ]---

_______________________________________________
E1000-devel mailing list
e1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel