Hi Joe,

> On Dec 16, 2015, at 4:19 PM, Joe Stringer <[email protected]> wrote:
>
> Hi Ben,
>
> Thanks for following up on this. Yes I think that all of the patches
> we previously referred to are now merged in some form.
>
> Is the core dump/backtrace the same as before? Could you repost the
> backtrace in this thread?
>
Here’s a current corefile backtrace:
-----------------
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/mips64el-linux-gnuabi64/libthread_db.so.1".
Core was generated by `ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfi'.
Program terminated with signal SIGABRT, Aborted.
#0 0x000000eeeaf629dc in raise () from /lib/mips64el-linux-gnuabi64/libc.so.6
(gdb) back
#0 0x000000eeeaf629dc in raise () from /lib/mips64el-linux-gnuabi64/libc.so.6
#1 0x000000eeeaf64470 in abort () from /lib/mips64el-linux-gnuabi64/libc.so.6
#2 0x0000000120152a48 in ovs_abort_valist ()
#3 0x000000012015c82c in vlog_abort_valist ()
#4 0x000000012015c874 in vlog_abort ()
#5 0x00000001201526ac in ovs_assert_failure ()
#6 0x00000001200937f8 in cmap_replace ()
-----------------

And here’s what ovs-vswitchd.log spit out:

-----------------
2015-12-17T21:34:44.170Z|01044|bridge|INFO|bridge lan0: added interface lan0 on port 65534
2015-12-17T21:34:44.171Z|01045|bridge|INFO|bridge lan0: using datapath ID 0000dc3979807000
2015-12-17T21:34:44.172Z|01046|connmgr|INFO|lan0: added service controller "punix:/var/run/openvswitch/lan0.mgmt"
2015-12-17T21:34:44.175Z|01047|dpif|WARN|system@ovs-system: failed to flow_del (No such file or directory) ufid:b6c466f2-1771-4241-abf6-7ec5297a467b recirc_id(0),dp_hash(0),skb_priority(0),in_port(7),skb_mark(0x5),ct_state(0x2a),ct_zone(0),ct_mark(0x5),ct_label(0),eth(src=dc:39:79:80:71:60,dst=dc:39:79:80:71:08),eth_type(0x0800),ipv4(src=192.168.27.2,dst=10.0.15.1,proto=6,tos=0,ttl=63,frag=no),tcp(src=25252,dst=50363),tcp_flags(psh|ack)
2015-12-17T21:34:44.175Z|01048|util|EMER|lib/cmap.c:846: assertion ok failed in cmap_replace()
2015-12-17T21:34:44.205Z|00008|daemon_unix(monitor)|ERR|7 crashes: pid 21768 died, killed (Aborted), restarting
2015-12-17T21:34:44.215Z|00009|ovs_numa|INFO|Discovered 0 NUMA nodes and 0 CPU cores
2015-12-17T21:34:44.215Z|00010|memory|INFO|7584 kB peak resident set size after 10953.4 seconds
2015-12-17T21:34:44.215Z|00011|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2015-12-17T21:34:44.215Z|00012|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2015-12-17T21:34:44.422Z|00013|ofproto_dpif|INFO|system@ovs-system: Datapath supports recirculation
2015-12-17T21:34:44.423Z|00014|ofproto_dpif|INFO|system@ovs-system: MPLS label stack length probed as 1
2015-12-17T21:34:44.423Z|00015|ofproto_dpif|INFO|system@ovs-system: Datapath supports unique flow ids
2015-12-17T21:34:44.423Z|00016|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state
2015-12-17T21:34:44.423Z|00017|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_zone
2015-12-17T21:34:44.423Z|00018|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_mark
2015-12-17T21:34:44.424Z|00019|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2015-12-17T21:34:44.435Z|00001|ofproto_dpif_upcall(handler1)|INFO|received packet on unassociated datapath port 0
-----------------

> Can you remind me what the port types are? Are bridges also being
> added/removed?
>
This seems to be the last command issued before the crash (based on timestamp)

-----------------
2015-12-17T21:34:44.004905Z", "Level": "debug", "Message": "ran [ovs-vsctl -- --if-exists del-br lan0]
-----------------

then this

-----------------
2015-12-17T21:34:44.489405Z", "Level": "debug", "Message": "ran [ovs-vsctl add-br lan0 -- set bridge lan0 other-config:hwaddr=dc:39:79:80:70:00\n]
-----------------

Note: this is on a test bed where we’re continually creating and destroying the bridge, since it seems to accelerate the problem (we can make it happen within about 5 minutes). In normal operation, the bridge stays up and we just delete regular ports. In that case, the problem only shows itself a couple of times per day at most.

> Jarno and I briefly spoke about this today, and one thought that came
> up is whether the number of threads makes a difference here.
> Are you also able to reproduce if you, for example, reduce the number of
> revalidator/handler threads to 1?
>
> ovs-vsctl set Open_vSwitch . other_config:n-revalidator-threads=1
> ovs-vsctl set Open_vSwitch . other_config:n-handler-threads=1
>
I tried these commands, unfortunately no improvement.

>
> On 16 December 2015 at 11:15, Ben Warren <[email protected]> wrote:
>> Hi,
>>
>> We’re seeing this crash about a couple of times a day on our test bed,
>> always when removing ports from a bridge (As Keith originally reported). Do
>> you have any idea what might be happening? As I mentioned in my previous
>> message, we’re pretty well at top of tree, so can very easily test any
>> fixes.
>>
>> thanks,
>> Ben
>>
>>
>> On Dec 8, 2015, at 3:00 PM, Ben Warren <[email protected]> wrote:
>>
>> Hi Joe,
>>
>> Sorry for taking so long to get back to this.
>>
>> On Nov 23, 2015, at 6:54 PM, Joe Stringer <[email protected]> wrote:
>>
>> On 20 November 2015 at 10:05, Keith Holleman <[email protected]>
>> wrote:
>>
>>
>> Follow-up email here has the backtrace for the second method of
>> reproduction. In this case the bridge is not deleted, it was using the loop
>> logic of effectively these commands:
>>
>>
>> <snip>
>>
>> Thanks a lot for the report!
>>
>> Would you be able to apply these two patches and see if they fix the
>> issue you are observing?
>>
>> https://patchwork.ozlabs.org/patch/541190/
>> https://patchwork.ozlabs.org/patch/541191/
>>
>>
>> Now that your conntrack code has been committed, we decided to build off the
>> “openvswitch/ovs” repo on Github. I built the top of the “branch-2.5”
>> branch as of this morning:
>>
>> commit:
>> https://github.com/openvswitch/ovs/commit/2862aeff82a3216ea4592c57299569484cf159ea
>>
>> and still see the crash. The patches listed above do not apply cleanly: it
>> looks like much (although maybe not all?) of the logic is already committed.
>>
>> Here’s what I see in /var/log/ovsswitchd.log:
>>
>> 2015-12-08T22:19:38.770Z|01159|bridge|INFO|bridge lan0: using datapath ID
>> 0000dc39790002b0
>> 2015-12-08T22:19:38.770Z|01160|connmgr|INFO|lan0: added service controller
>> "punix:/var/run/openvswitch/lan0.mgmt"
>> 2015-12-08T22:19:38.842Z|01161|dpif|WARN|system@ovs-system: failed to
>> flow_del (No such file or directory)
>> ufid:7580e732-908d-4134-9ca9-f6887195c2ae
>> recirc_id(0),dp_hash(0),skb_priority(0),in_port(2),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=04:00:00:00:00:02,dst=04:00:00:00:00:fe),eth_type(0x0800),ipv4(src=192.168.27.2,dst=192.168.27.254,proto=6,tos=0,ttl=64,frag=no),tcp(src=39055,dst=11111),tcp_flags(psh|ack)
>> 2015-12-08T22:19:38.842Z|01162|util|EMER|lib/cmap.c:846: assertion ok failed
>> in cmap_replace()
>> 2015-12-08T22:19:39.175Z|00002|daemon_unix(monitor)|ERR|1 crashes: pid 978
>> died, killed (Aborted), core dumped, restarting
>>
>> System information:
>>
>> # ovs-ofctl --version
>> ovs-ofctl (Open vSwitch) 2.5.0
>> Compiled Dec 8 2015 12:16:49
>> OpenFlow versions 0x1:0x4
>>
>> # uname -a
>> Linux cd25 3.10.20-rt14-copilot #1 SMP Tue Dec 8 12:11:17 PST 2015 mips64
>> GNU/Linux
>>
>> Please let us know what other information we can provide to help figure this
>> out.
>>
>> regards,
>> Ben
>>
>> _______________________________________________
>> discuss mailing list
>> [email protected]
>> http://openvswitch.org/mailman/listinfo/discuss
>>
>>
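
The timestamp comparison used in this message (finding the last ovs-vsctl command logged before the EMER assertion) could be sketched roughly as below. This is only an illustration: it assumes the command log has already been reduced to (timestamp, command) pairs, and the helper names `parse_ts`, `crash_time` and `last_command_before` are hypothetical, not part of Open vSwitch. The sample data is copied from the logs quoted in this thread.

```python
from datetime import datetime

# Timestamp layout seen in both ovs-vswitchd.log and the command log.
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_ts(text):
    """Parse a timestamp such as 2015-12-17T21:34:44.175Z."""
    return datetime.strptime(text, TS_FORMAT)


def crash_time(vswitchd_lines):
    """Return the timestamp of the first EMER entry, or None if absent.

    Assumes the 'timestamp|seq|module|LEVEL|message' layout shown above.
    """
    for line in vswitchd_lines:
        fields = line.split("|", 4)
        if len(fields) == 5 and fields[3] == "EMER":
            return parse_ts(fields[0])
    return None


def last_command_before(crash, commands):
    """Return the latest command whose timestamp is not after the crash.

    'commands' is an iterable of (timestamp_string, command_string) pairs.
    """
    earlier = [(parse_ts(ts), cmd) for ts, cmd in commands
               if parse_ts(ts) <= crash]
    if not earlier:
        return None
    return max(earlier, key=lambda pair: pair[0])[1]


# Sample entries taken from the logs quoted in this thread.
vswitchd_log = [
    '2015-12-17T21:34:44.172Z|01046|connmgr|INFO|lan0: added service '
    'controller "punix:/var/run/openvswitch/lan0.mgmt"',
    "2015-12-17T21:34:44.175Z|01048|util|EMER|lib/cmap.c:846: "
    "assertion ok failed in cmap_replace()",
]
commands = [
    ("2015-12-17T21:34:44.004905Z",
     "ovs-vsctl -- --if-exists del-br lan0"),
    ("2015-12-17T21:34:44.489405Z",
     "ovs-vsctl add-br lan0 -- set bridge lan0 "
     "other-config:hwaddr=dc:39:79:80:70:00"),
]

crash = crash_time(vswitchd_log)
print(last_command_before(crash, commands))
```

With the entries above, the del-br command is the one that precedes the assertion, and the add-br command falls after the monitor has already restarted the daemon.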
