I'm seeing a kernel oops while doing some OVN testing.

> [69503.759887] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000048
> [69503.759905] IP: [<ffffffffa0397915>] ovs_lookup_vport+0x5/0x60
> [openvswitch]
> [69503.759915] PGD 11bc28067 PUD 139bc6067 PMD 0
> [69503.759921] Oops: 0000 [#1] SMP
> [69503.759926] Modules linked in: xt_nat xt_mark xt_REDIRECT nf_nat_redirect
> xt_CHECKSUM xt_comment openvswitch libcrc32c ip6t_rpfilter ip6t_REJECT
> nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc
> ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6
> nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter
> ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat
> nf_conntrack iptable_mangle iptable_security iptable_raw
> snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec
> snd_hwdep snd_seq snd_seq_device iosf_mbi snd_pcm crct10dif_pclmul
> crc32_pclmul crc32c_intel ppdev ghash_clmulni_intel snd_timer parport_pc
> serio_raw virtio_console snd virtio_balloon pvpanic parport soundcore
> i2c_piix4 nfsd auth_rpcgss nfs_acl lockd
> [69503.760020] grace sunrpc virtio_net virtio_blk qxl drm_kms_helper ttm drm
> virtio_pci virtio_ring virtio ata_generic pata_acpi
> [69503.760020] CPU: 0 PID: 18288 Comm: ovs-vswitchd Not tainted
> 3.19.1-201.fc21.x86_64 #1
> [69503.760020] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.7.5-20140709_153950- 04/01/2014
> [69503.760020] task: ffff8800bb734d80 ti: ffff880139cec000 task.ti:
> ffff880139cec000
> [69503.760020] RIP: 0010:[<ffffffffa0397915>] [<ffffffffa0397915>]
> ovs_lookup_vport+0x5/0x60 [openvswitch]
> [69503.760020] RSP: 0018:ffff880139cef960 EFLAGS: 00010246
> [69503.760020] RAX: ffff88011bc34058 RBX: ffff88011bc34058 RCX:
> 0000000000000000
> [69503.760020] RDX: 0000000000000000 RSI: 0000000000000000 RDI:
> 0000000000000000
> [69503.760731] RBP: ffff880139cef9d8 R08: 0000000000000008 R09:
> ffff88011bc3405c
> [69503.760731] R10: 0000000000004770 R11: 0000000000003e7c R12:
> ffff8800bb211f00
> [69503.760731] R13: ffff880036a67000 R14: 0000000000000000 R15:
> ffff88008eb7b310
> [69503.760731] FS: 00007fa06644fa40(0000) GS:ffff88013fc00000(0000)
> knlGS:0000000000000000
> [69503.760731] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [69503.760731] CR2: 0000000000000048 CR3: 000000012b64b000 CR4:
> 00000000000406f0
> [69503.760731] Stack:
> [69503.760731] ffffffffa0398383 0000000200000002 000000000000000a
> 000000000000000d
> [69503.760731] 00000000000002dc 0000000000000432 0000000000000000
> 0000000000000000
> [69503.760731] 0000000000000000 0000000000000000 000000009d414cae
> ffff880036a67000
> [69503.760731] Call Trace:
> [69503.760731] [<ffffffffa0398383>] ? ovs_vport_cmd_fill_info+0x53/0x1b0
> [openvswitch]
> [69503.760731] [<ffffffffa039859c>] ovs_vport_cmd_dump+0xbc/0x120
> [openvswitch]
> [69503.760731] [<ffffffff8168acfa>] netlink_dump+0x11a/0x2d0
> [69503.760731] [<ffffffff8168b633>] __netlink_dump_start+0x193/0x1d0
> [69503.760731] [<ffffffff8168e1d0>] ? genl_family_rcv_msg+0x3e0/0x3e0
> [69503.760731] [<ffffffff8168e1ad>] genl_family_rcv_msg+0x3bd/0x3e0
> [69503.760731] [<ffffffffa03984e0>] ? ovs_vport_cmd_fill_info+0x1b0/0x1b0
> [openvswitch]
> [69503.760731] [<ffffffff811fe5c9>] ? __kmalloc_node_track_caller+0x259/0x320
> [69503.760731] [<ffffffff813aa616>] ? rhashtable_lookup_compare+0x36/0x70
> [69503.760731] [<ffffffff8168e1d0>] ? genl_family_rcv_msg+0x3e0/0x3e0
> [69503.760731] [<ffffffff8168e249>] genl_rcv_msg+0x79/0xc0
> [69503.760731] [<ffffffff8168d6c9>] netlink_rcv_skb+0xb9/0xe0
> [69503.760731] [<ffffffff8168dddc>] genl_rcv+0x2c/0x40
> [69503.760731] [<ffffffff8168cddd>] netlink_unicast+0x12d/0x1c0
> [69503.760731] [<ffffffff8168d197>] netlink_sendmsg+0x327/0x680
> [69503.760731] [<ffffffff8163dc8c>] do_sock_sendmsg+0x9c/0x110
> [69503.760731] [<ffffffff81063bca>] ? __do_page_fault+0x21a/0x5b0
> [69503.760731] [<ffffffff81238175>] ? __fget_light+0x25/0x70
> [69503.760731] [<ffffffff8163debb>] SYSC_sendto+0x12b/0x1d0
> [69503.760731] [<ffffffff8163ed45>] ? __sys_recvmsg+0x85/0x90
> [69503.760731] [<ffffffff8163e74e>] SyS_sendto+0xe/0x10
> [69503.760731] [<ffffffff81774029>] system_call_fastpath+0x12/0x17
> [69503.760731] Code: 84 00 00 00 00 00 66 66 66 66 90 55 48 c7 c7 e0 50 3a a0
> 48 89 e5 e8 ab a4 3d e1 5d c3 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48>
> 8b 47 48 89 f2 81 e6 ff 03 00 00 55 48 8d 04 f0 48 89 e5 48
> [69503.760731] RIP [<ffffffffa0397915>] ovs_lookup_vport+0x5/0x60
> [openvswitch]
> [69503.760731] RSP <ffff880139cef960>
> [69503.760731] CR2: 0000000000000048
> [69504.248932] ---[ end trace ee300d6bca7ba796 ]---

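As a rough aid for reading the oops above, here is a minimal Python sketch that pulls out the faulting symbol, its offset, the module, the fault address (CR2), and RDI. The helper name `summarize_oops` and the constant `OOPS_EXCERPT` are hypothetical, made up for illustration, and the excerpt re-joins lines that were wrapped in the mail so each log entry sits on one line.

```python
import re

# Hypothetical helper for illustration only; not part of OVS or the kernel.
# The excerpt below copies three entries from the oops, with the mail's
# line wrapping undone so each entry is on a single line.
OOPS_EXCERPT = """\
[69503.759905] IP: [<ffffffffa0397915>] ovs_lookup_vport+0x5/0x60 [openvswitch]
[69503.760020] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[69503.760731] CR2: 0000000000000048 CR3: 000000012b64b000 CR4: 00000000000406f0
"""


def summarize_oops(text):
    """Extract the faulting symbol, offset, module, CR2 and RDI from oops text."""
    ip = re.search(
        r"IP: \[<[0-9a-f]+>\] (\w+)\+0x([0-9a-f]+)/0x[0-9a-f]+ \[(\w+)\]", text
    )
    cr2 = re.search(r"CR2: ([0-9a-f]+)", text)
    rdi = re.search(r"RDI: ([0-9a-f]+)", text)
    if not (ip and cr2 and rdi):
        raise ValueError("oops text is missing an IP, CR2 or RDI entry")
    return {
        "function": ip.group(1),
        "offset": int(ip.group(2), 16),
        "module": ip.group(3),
        "fault_addr": int(cr2.group(1), 16),
        "rdi": int(rdi.group(1), 16),
    }


if __name__ == "__main__":
    info = summarize_oops(OOPS_EXCERPT)
    print("%(function)s+%(offset)#x [%(module)s]" % info)
    print("fault address %#x, RDI %#x" % (info["fault_addr"], info["rdi"]))
```

A fault address of 0x48 with RDI equal to zero is what one would expect if the first argument to ovs_lookup_vport() were NULL and a field at offset 0x48 were read through it. The marked bytes `48 8b 47 48` in the Code: line decode to `mov 0x48(%rdi),%rax`, which fits that reading. Which structure field sits at that offset is not established here.
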
This is the openvswitch module that came with the following Fedora kernel:

3.19.1-201.fc21.x86_64

I can easily reproduce this. It happens when running devstack multiple times to stand up OpenStack + OVS + OVN + OpenStack Neutron OVN integration. So far, it seems to consistently break the second time I run devstack after a reboot.

The relevant devstack code is here:

http://git.openstack.org/cgit/stackforge/networking-ovn/tree/devstack/plugin.sh

In particular, take a look at init_ovn, install_ovn, start_ovn, and stop_ovn.

When the VM gets into a bad state, ovs-vswitchd exits shortly after startup, at the point where the oops occurs. If I follow it with gdb, I get:

> 482 retval = send(sock->fd, msg->data, msg->size,
> (gdb) bt
> #0 nl_sock_send__ (sock=0x7e2390, msg=msg@entry=0x7c8090, nlmsg_seq=5,
> wait=wait@entry=true) at lib/netlink-socket.c:482
> #1 0x00000000004f8105 in nl_dump_start (dump=dump@entry=0x7dcf60,
> protocol=protocol@entry=16, request=request@entry=0x7c8090) at
> lib/netlink-socket.c:977
> #2 0x00000000004ed535 in dpif_netlink_port_dump_start__
> (dump=dump@entry=0x7dcf60, dpif=<optimized out>) at lib/dpif-netlink.c:1081
> #3 0x00000000004ed590 in dpif_netlink_port_dump_start (dpif_=0x7e2a50,
> statep=0x7fffffffdc80) at lib/dpif-netlink.c:1092
> #4 0x000000000045b10d in dpif_port_dump_start
> (dump=dump@entry=0x7fffffffdc70, dpif=0x7e2a50) at lib/dpif.c:718
> #5 0x0000000000425884 in open_dpif_backer (type=0x7e1f20 "system",
> backerp=backerp@entry=0x7ee738) at ofproto/ofproto-dpif.c:956
> #6 0x000000000042a7b4 in construct (ofproto_=0x7ee4b0) at
> ofproto/ofproto-dpif.c:1241
> #7 0x000000000041cfd4 in ofproto_create (datapath_name=0x7c1460 "br-ex",
> datapath_type=<optimized out>, ofprotop=ofprotop@entry=0x7eb7a8) at
> ofproto/ofproto.c:535
> #8 0x000000000040dd9c in bridge_reconfigure (ovs_cfg=ovs_cfg@entry=0x7f4670)
> at vswitchd/bridge.c:629
> #9 0x000000000040edb3 in bridge_run () at vswitchd/bridge.c:2961
> #10 0x00000000004057cd in main (argc=2, argv=0x7fffffffe548) at
> vswitchd/ovs-vswitchd.c:116
> (gdb) next
>
> Program terminated with signal SIGKILL, Killed.

--
Russell Bryant
