Hello,
I'm currently hitting a NULL pointer dereference and kernel panic that
seems to be in Open vSwitch. The problem is sporadic: I have one production
machine that has hit it four times in the past 24 hours, and one lab machine
that I can't get to hit it at all.
We rebuilt openvswitch with debugging symbols turned on and traced the
NULL pointer dereference to datapath/linux/flow.c:814. Do you have any
advice on how to trace this back to a root cause (or, ideally, a fix)?
I've scoured Google for related issues but come up short. (I'll happily
accept that my Google-fu is lacking, though.)
I would greatly appreciate any guidance you could offer. Here's some more
information about my system, for context.
All nodes have the following versions:
root@node:~# uname -a
Linux node 3.2.0-58-generic #88-Ubuntu SMP Tue Dec 3 17:37:58 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
root@node:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.4 LTS
Release: 12.04
Codename: precise
root@node:~# dpkg --list | grep openvswitch
ii openvswitch-common 1.10.2-0ubuntu2~cloud0 Open vSwitch common components
ii openvswitch-datapath-dkms 1.10.2-0ubuntu2~cloud0 Open vSwitch datapath module source - DKMS version
ii openvswitch-switch 1.10.2-0ubuntu2~cloud0 Open vSwitch switch implementations
root@node:~#
The stack trace from the console of a panicked machine:
[259616.202845] Pid: 28568, comm: vhost-28567 Tainted: G WC O 3.2.0-58-generic #88-Ubuntu /0PXXHP
[259616.213437] RIP: 0010:[<ffffffffa024ecb2>] [<ffffffffa024ecb2>] ovs_flow_tbl_lookup+0xb2/0x100 [openvswitch]
[259616.224611] RSP: 0018:ffff88180f243cb8 EFLAGS: 00010282
[259616.230630] RAX: 0000000000000020 RBX: ffff880c72c407c0 RCX: ffffffffffffffe0
[259616.238678] RDX: ffff88010a0ba678 RSI: 0000000000000004 RDI: ffff8807aa5ac000
[259616.246728] RBP: ffff88180f243cf8 R08: 000000000000002c R09: 000000003fc3955c
[259616.254776] R10: 0000000000000001 R11: 0000000000000001 R12: 000000000c2aa5f8
[259616.262825] R13: 0000000000000018 R14: ffff88180f243d48 R15: 000000000000002c
[259616.270877] FS: 0000000000000000(0000) GS:ffff88180f240000(0000) knlGS:0000000000000000
[259616.279990] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[259616.286490] CR2: 0000000000000010 CR3: 0000000924208000 CR4: 00000000000426e0
[259616.294539] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[259616.302588] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[259616.310640] Process vhost-28567 (pid: 28568, threadinfo ffff8805e20e6000, task ffff880bed3a0000)
[259616.320527] Stack:
[259616.322866] ffff881206128c00 ffff880d00000044 ffff88180f243cf8 ffff880d2f61f100
[259616.331256] ffffe8ffffa42188 ffff8817f0faf2c0 ffff881206128c00 ffff88180f254454
[259616.339649] ffff88180f243dd8 ffffffffa024cf15 ffffffff8108f501 ffff88180f243d30
[259616.348041] Call Trace:
[259616.350863] <IRQ>
[259616.353321] [<ffffffffa024cf15>] ovs_dp_process_received_packet+0xc5/0x140 [openvswitch]
[259616.362543] [<ffffffff8108f501>] ? hrtimer_forward+0x51/0xd0
[259616.369057] [<ffffffffa025117c>] ovs_vport_receive+0x4c/0x50 [openvswitch]
[259616.376926] [<ffffffffa0252203>] netdev_frame_hook+0xa3/0xf0 [openvswitch]
[259616.384795] [<ffffffffa0252160>] ? netdev_create+0x110/0x110 [openvswitch]
[259616.392660] [<ffffffff81546c60>] __netif_receive_skb+0x1d0/0x560
[259616.399558] [<ffffffff81547411>] process_backlog+0xb1/0x190
[259616.405973] [<ffffffff81548734>] net_rx_action+0x134/0x290
[259616.412288] [<ffffffff8106fa08>] __do_softirq+0xa8/0x210
[259616.418413] [<ffffffff8166c62c>] call_softirq+0x1c/0x30
[259616.424429] <EOI>
[259616.426883] [<ffffffff810162f5>] do_softirq+0x65/0xa0
[259616.432714] [<ffffffff81548c08>] netif_rx_ni+0x28/0x30
[259616.438643] [<ffffffff8147d89b>] tun_get_user+0x2fb/0x4a0
[259616.444863] [<ffffffff8147da65>] tun_sendmsg+0x25/0x30
[259616.450790] [<ffffffffa040f9d6>] handle_tx+0x296/0x520 [vhost_net]
[259616.457880] [<ffffffffa040fc95>] handle_tx_kick+0x15/0x20 [vhost_net]
[259616.465260] [<ffffffffa040ce4d>] vhost_worker+0xdd/0x170 [vhost_net]
[259616.472543] [<ffffffffa040cd70>] ? vhost_set_memory+0x130/0x130 [vhost_net]
[259616.480506] [<ffffffff8108b63c>] kthread+0x8c/0xa0
[259616.486048] [<ffffffff8166c534>] kernel_thread_helper+0x4/0x10
[259616.492752] [<ffffffff8108b5b0>] ? flush_kthread_worker+0xa0/0xa0
[259616.499747] [<ffffffff8166c530>] ? gs_change+0x13/0x13
[259616.505665] Code: 00 48 63 53 20 48 8d 42 01 48 c1 e0 04 48 01 c1 48 8b 01 48 85 c0 74 51 48 8b 09 48 c1 e2 04 48 83 c2 10 48 29 d1 48 85 c9 74 26 <44> 39 61 30 75 d0 4a 8d 7c 29 38 4c 89 fa 4c 89 f6 48 89 4d c8
[259616.527456] RIP [<ffffffffa024ecb2>] ovs_flow_tbl_lookup+0xb2/0x100 [openvswitch]
[259616.536016] RSP <ffff88180f243cb8>
[259616.540000] CR2: 0000000000000010
[259616.544395] ---[ end trace 7cd7ddd24540f1d3 ]---
[259616.549662] Kernel panic - not syncing: Fatal exception in interrupt
[259616.556849] Pid: 28568, comm: vhost-28567 Tainted: G D WC O 3.2.0-58-generic #88-Ubuntu
[259616.566357] Call Trace:
[259616.569183] <IRQ> [<ffffffff81649285>] panic+0x91/0x1a4
[259616.575345] [<ffffffff81662f5a>] oops_end+0xea/0xf0
[259616.580994] [<ffffffff8164812f>] no_context+0x150/0x15d
[259616.587028] [<ffffffff81648307>] __bad_area_nosemaphore+0x1cb/0x1ea
[259616.594227] [<ffffffff811645eb>] ? kfree+0x3b/0x140
[259616.599873] [<ffffffff810e234e>] ? rcu_irq_exit+0xe/0x10
[259616.606009] [<ffffffff81648339>] bad_area_nosemaphore+0x13/0x15
[259616.612820] [<ffffffff81665bab>] do_page_fault+0x46b/0x540
[259616.619143] [<ffffffff8153a455>] ? kfree_skb+0x45/0xc0
[259616.625082] [<ffffffff81571479>] ? netlink_attachskb+0x1d9/0x220
[259616.631989] [<ffffffff810608e0>] ? try_to_wake_up+0x200/0x200
[259616.638608] [<ffffffff816624f5>] page_fault+0x25/0x30
[259616.644456] [<ffffffffa024ecb2>] ? ovs_flow_tbl_lookup+0xb2/0x100 [openvswitch]
[259616.652817] [<ffffffffa024ec5a>] ? ovs_flow_tbl_lookup+0x5a/0x100 [openvswitch]
[259616.661180] [<ffffffffa024cf15>] ovs_dp_process_received_packet+0xc5/0x140 [openvswitch]
[259616.670410] [<ffffffff8108f501>] ? hrtimer_forward+0x51/0xd0
[259616.676936] [<ffffffffa025117c>] ovs_vport_receive+0x4c/0x50 [openvswitch]
[259616.684813] [<ffffffffa0252203>] netdev_frame_hook+0xa3/0xf0 [openvswitch]
[259616.692696] [<ffffffffa0252160>] ? netdev_create+0x110/0x110 [openvswitch]
[259616.700575] [<ffffffff81546c60>] __netif_receive_skb+0x1d0/0x560
[259616.707482] [<ffffffff81547411>] process_backlog+0xb1/0x190
[259616.713915] [<ffffffff81548734>] net_rx_action+0x134/0x290
[259616.720242] [<ffffffff8106fa08>] __do_softirq+0xa8/0x210
[259616.726384] [<ffffffff8166c62c>] call_softirq+0x1c/0x30
[259616.732410] <EOI> [<ffffffff810162f5>] do_softirq+0x65/0xa0
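For what it's worth, here is a rough sanity check of the registers in the first oops. It assumes my reading of the faulting bytes in the Code: line ("<44> 39 61 30") as cmp %r12d,0x30(%rcx) is right; I decoded that by hand, so please treat it as an assumption. Under that assumption, RCX plus the 0x30 displacement wraps around to exactly the address in CR2. The helper names below are just for illustration.

```python
# Register values copied from the first oops in the trace above.
RCX = 0xffffffffffffffe0   # RCX at the time of the fault
CR2 = 0x0000000000000010   # CR2: the address that faulted

# Assumption: "<44> 39 61 30" decodes to cmp %r12d,0x30(%rcx),
# i.e. a 32-bit compare against memory at RCX + 0x30.
DISP = 0x30

MASK64 = (1 << 64) - 1


def effective_address(base, displacement):
    """Return base + displacement with 64-bit wraparound."""
    return (base + displacement) & MASK64


def as_signed64(value):
    """Interpret a 64-bit register value as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


if __name__ == "__main__":
    fault = effective_address(RCX, DISP)
    print(hex(fault), fault == CR2)   # 0x10 True
    print(hex(as_signed64(RCX)))      # -0x20
```

If the decode holds, RCX is -0x20 rather than a real pointer, which would be consistent with an offset of 0x20 having been subtracted from a NULL pointer before the dereference. I haven't confirmed that against the source, so take it as a guess.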
--
/thor