While this is obviously not supported or fully merged into mainline, we've
been using the latest conntrack development code and are seeing a number of
crashes with a very similar failure; the stack trace is below.  We're
curious to know whether this issue is already fixed in master - it doesn't
really seem to be a conntrack-specific issue, but we'd like to get past it
so we can continue to use and test the conntrack code.

I can reproduce this in two ways.  The first is to remove a Linux bond
interface that is in a bridge with "ip link del bond0" and then remove the
bridge with "ovs-vsctl del-br <br>".  That is the backtrace included below,
and I can reproduce the crash pretty quickly by doing this in a loop
(delete the bond, delete the bridge, wait, create the bridge, create and
add the bond and another interface, repeat); a sketch of the loop follows.
We weren't really trying to be that forceful, and I figured that if the
bridge remained up and we simply removed the bond from the bridge cleanly
we might avoid the problem, but doing "ovs-vsctl del-port <br> bond0"
followed by "ip link del bond0" also causes a similar crash, just with much
lower frequency.  (I will attach that backtrace once I hit it again, if
useful.)
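
Roughly the loop we run for method #1 is below.  Treat it as a sketch only:
"br-test" for the bridge, "eth1" for the bond slave, and "eth2" for the
extra port are placeholders for our actual configuration, and the bond
setup details may differ on other systems.

#!/bin/sh
# Sketch of the repro loop for method #1 (bridge/interface names are placeholders).
while true; do
    ip link del bond0                  # delete the bond out from under the bridge
    ovs-vsctl del-br br-test           # then delete the bridge
    sleep 2

    ovs-vsctl add-br br-test           # recreate the bridge
    ip link add bond0 type bond        # recreate the bond
    ip link set eth1 down
    ip link set eth1 master bond0      # enslave a placeholder interface
    ip link set bond0 up
    ovs-vsctl add-port br-test bond0   # add the bond back to the bridge
    ovs-vsctl add-port br-test eth2    # add another (placeholder) interface
    sleep 2
done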

Before the crashes, we always see messages like these in ovs-vswitchd.log:

2015-11-12T01:29:03.945Z|01200|netdev_linux|WARN|Dropped 52 log messages in
last 74 seconds (most recently, 73 seconds ago) due to excessive rate
2015-11-12T01:29:03.945Z|01201|netdev_linux|WARN|dock0: removing policing
failed: Operation not supported
2015-11-12T01:29:03.945Z|01202|netdev_linux|WARN|br0: removing policing
failed: Operation not supported
2015-11-12T01:29:03.945Z|01203|netdev_linux|WARN|vi_l3_1: removing policing
failed: Operation not supported
2015-11-12T01:29:03.946Z|01204|netdev_linux|WARN|lan0: removing policing
failed: Operation not supported
2015-11-12T01:29:03.959Z|01205|bridge|WARN|could not open network device
bond0 (No such device)
2015-11-12T01:29:03.960Z|01206|netdev_linux|WARN|dock0: removing policing
failed: Operation not supported
2015-11-12T01:29:03.960Z|01207|netdev_linux|WARN|br0: removing policing
failed: Operation not supported
2015-11-12T01:29:04.435Z|01208|dpif|WARN|system@ovs-system: failed to
flow_del (No such file or directory)
ufid:734136a4-5748-4691-a52c-cd5bfad4da68
recirc_id(0),dp_hash(0),skb_priority(0),in_port(2),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=04:00:00:00:00:02,dst=04:00:00:00:00:fe),eth_type(0x0800),ipv4(src=192.168.27.2,dst=10.131.0.87,proto=6,tos=0,ttl=64,frag=no),tcp(src=51582,dst=443),tcp_flags(psh|ack)
2015-11-12T01:29:04.435Z|01209|util|EMER|lib/cmap.c:846: assertion ok
failed in cmap_replace()
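
For what it's worth, we spot this correlation by tailing the log while the
repro loop runs; a minimal sketch, assuming the default log path (which may
differ on other installs):

tail -F /var/log/openvswitch/ovs-vswitchd.log | \
    grep -E 'failed to flow_del|assertion .* failed'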

We are currently using the conntrack branch at this commit:
https://github.com/justinpettit/ovs/commit/86e6bfcb999ed134aa12bf947cea4da1426af2c2.
We know this is about a month old; we did try updating, but saw no
improvement and ran into a few other issues.  Here is the backtrace from
repro method #1:

Reading symbols from
/tmp/ovs-sym/usr/lib/debug/usr/sbin/ovs-vswitchd...done.
(gdb) back
#0  0x000000f37a6959dc in raise () from
/lib/mips64el-linux-gnuabi64/libc.so.6
#1  0x000000f37a697470 in abort () from
/lib/mips64el-linux-gnuabi64/libc.so.6
#2  0x000000012014cf18 in ovs_abort_valist (err_no=<optimized out>,
format=<optimized out>, args=<optimized out>) at lib/util.c:323
#3  0x0000000120156bac in vlog_abort_valist (module_=<optimized out>,
message=0x1201d3fe0 "%s: assertion %s failed in %s()",
    args=0xf57c5ad230) at lib/vlog.c:1129
#4  0x0000000120156bf4 in vlog_abort (module=<optimized out>,
message=<optimized out>) at lib/vlog.c:1143
#5  0x000000012014cb7c in ovs_assert_failure (where=<optimized out>,
function=<optimized out>, condition=<optimized out>)
    at lib/util.c:72
#6  0x00000001200919c8 in cmap_replace (cmap=<optimized out>,
old_node=<optimized out>, new_node=0x0, hash=<optimized out>)
    at lib/cmap.c:846
#7  0x0000000120066b88 in cmap_remove (hash=<optimized out>,
node=0xf2ec00af30, cmap=0x1209031a8) at ./lib/cmap.h:265
#8  ukey_delete (ukey=0xf2ec00af30, umap=0x120903178) at
ofproto/ofproto-dpif-upcall.c:1729
#9  push_ukey_ops (udpif=udpif@entry=0x12092ea40, umap=umap@entry=0x120903178,
ops=ops@entry=0xf57c5ad2e0, n_ops=n_ops@entry=1)
    at ofproto/ofproto-dpif-upcall.c:2046
#10 0x0000000120067f18 in revalidator_sweep__ (revalidator=<optimized out>,
purge=purge@entry=true)
    at ofproto/ofproto-dpif-upcall.c:2257
#11 0x0000000120068128 in revalidator_purge (revalidator=<optimized out>)
at ofproto/ofproto-dpif-upcall.c:2274
#12 udpif_stop_threads (udpif=udpif@entry=0x12092ea40) at
ofproto/ofproto-dpif-upcall.c:461
#13 0x0000000120068d50 in udpif_stop_threads (udpif=0x12092ea40) at
ofproto/ofproto-dpif-upcall.c:590
#14 udpif_synchronize (udpif=0x12092ea40) at
ofproto/ofproto-dpif-upcall.c:587
#15 0x0000000120054458 in destruct (ofproto_=0x1209ad410) at
ofproto/ofproto-dpif.c:1451
#16 0x00000001200473f0 in ofproto_destroy (p=0x1209ad410) at
ofproto/ofproto.c:1609
#17 0x000000012002c92c in bridge_destroy (br=br@entry=0x120962d30) at
vswitchd/bridge.c:3207
#18 0x000000012002ce88 in add_del_bridges (cfg=0x12092e460,
cfg=0x12092e460) at vswitchd/bridge.c:1713
#19 0x000000012002e404 in bridge_reconfigure
(ovs_cfg=ovs_cfg@entry=0x12092e460)
at vswitchd/bridge.c:597
#20 0x0000000120032c3c in bridge_run () at vswitchd/bridge.c:2973
#21 0x0000000120025a28 in main (argc=11, argv=0xf57c5af888) at
vswitchd/ovs-vswitchd.c:120

Again, if it helps, I can capture a backtrace from repro method #2 as well.

Does anyone else see anything similar?  Or does anyone know of a fix in
master that may be relevant to this?

Thanks in advance for any time spent or help offered.