Hi Joe,

> On Dec 16, 2015, at 4:19 PM, Joe Stringer <[email protected]> wrote:
> 
> Hi Ben,
> 
> Thanks for following up on this. Yes I think that all of the patches
> we previously referred to are now merged in some form.
> 
> Is the core dump/backtrace the same as before? Could you repost the
> backtrace in this thread?
> 
Here’s a current corefile backtrace:

-----------------
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/mips64el-linux-gnuabi64/libthread_db.so.1".
Core was generated by `ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfi'.
Program terminated with signal SIGABRT, Aborted.
#0  0x000000eeeaf629dc in raise () from /lib/mips64el-linux-gnuabi64/libc.so.6
(gdb) back
#0  0x000000eeeaf629dc in raise () from /lib/mips64el-linux-gnuabi64/libc.so.6
#1  0x000000eeeaf64470 in abort () from /lib/mips64el-linux-gnuabi64/libc.so.6
#2  0x0000000120152a48 in ovs_abort_valist ()
#3  0x000000012015c82c in vlog_abort_valist ()
#4  0x000000012015c874 in vlog_abort ()
#5  0x00000001201526ac in ovs_assert_failure ()
#6  0x00000001200937f8 in cmap_replace ()
-----------------

And here’s what ovs-vswitchd.log spit out:

-----------------
2015-12-17T21:34:44.170Z|01044|bridge|INFO|bridge lan0: added interface lan0 on port 65534
2015-12-17T21:34:44.171Z|01045|bridge|INFO|bridge lan0: using datapath ID 0000dc3979807000
2015-12-17T21:34:44.172Z|01046|connmgr|INFO|lan0: added service controller "punix:/var/run/openvswitch/lan0.mgmt"
2015-12-17T21:34:44.175Z|01047|dpif|WARN|system@ovs-system: failed to flow_del (No such file or directory) ufid:b6c466f2-1771-4241-abf6-7ec5297a467b recirc_id(0),dp_hash(0),skb_priority(0),in_port(7),skb_mark(0x5),ct_state(0x2a),ct_zone(0),ct_mark(0x5),ct_label(0),eth(src=dc:39:79:80:71:60,dst=dc:39:79:80:71:08),eth_type(0x0800),ipv4(src=192.168.27.2,dst=10.0.15.1,proto=6,tos=0,ttl=63,frag=no),tcp(src=25252,dst=50363),tcp_flags(psh|ack)
2015-12-17T21:34:44.175Z|01048|util|EMER|lib/cmap.c:846: assertion ok failed in cmap_replace()
2015-12-17T21:34:44.205Z|00008|daemon_unix(monitor)|ERR|7 crashes: pid 21768 died, killed (Aborted), restarting
2015-12-17T21:34:44.215Z|00009|ovs_numa|INFO|Discovered 0 NUMA nodes and 0 CPU cores
2015-12-17T21:34:44.215Z|00010|memory|INFO|7584 kB peak resident set size after 10953.4 seconds
2015-12-17T21:34:44.215Z|00011|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connecting...
2015-12-17T21:34:44.215Z|00012|reconnect|INFO|unix:/var/run/openvswitch/db.sock: connected
2015-12-17T21:34:44.422Z|00013|ofproto_dpif|INFO|system@ovs-system: Datapath supports recirculation
2015-12-17T21:34:44.423Z|00014|ofproto_dpif|INFO|system@ovs-system: MPLS label stack length probed as 1
2015-12-17T21:34:44.423Z|00015|ofproto_dpif|INFO|system@ovs-system: Datapath supports unique flow ids
2015-12-17T21:34:44.423Z|00016|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_state
2015-12-17T21:34:44.423Z|00017|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_zone
2015-12-17T21:34:44.423Z|00018|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_mark
2015-12-17T21:34:44.424Z|00019|ofproto_dpif|INFO|system@ovs-system: Datapath supports ct_label
2015-12-17T21:34:44.435Z|00001|ofproto_dpif_upcall(handler1)|INFO|received packet on unassociated datapath port 0
-----------------

> Can you remind me what the port types are? Are bridges also being 
> added/removed?
> 

This seems to be the last command issued before the crash (based on the timestamp):

-----------------
2015-12-17T21:34:44.004905Z", "Level": "debug", "Message": "ran [ovs-vsctl -- --if-exists del-br lan0]
-----------------

Then this:

-----------------
2015-12-17T21:34:44.489405Z", "Level": "debug", "Message": "ran [ovs-vsctl add-br lan0 -- set bridge lan0 other-config:hwaddr=dc:39:79:80:70:00\n]
-----------------

Note: this is on a test bed where we’re continually creating and destroying the 
bridge, since it seems to accelerate the problem (we can make it happen within 
about 5 minutes).  In normal operation, the bridge stays up and we just delete 
regular ports.  In that case, the problem only shows itself a couple of times 
per day at most.
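
For anyone trying to reproduce this, a minimal sketch of that kind of churn loop is below. It only repeats the two ovs-vsctl commands from the debug log above; the function name, the cycle-count argument, and the OVS_VSCTL override (handy for a dry run with echo) are illustrative assumptions, not the actual test harness.

```shell
#!/bin/sh
# Sketch of a bridge churn loop; NOT the actual test-bed harness.
# OVS_VSCTL is a hypothetical override so the loop can be dry-run
# (e.g. OVS_VSCTL=echo) on a machine without Open vSwitch installed.

churn_bridge() {
    count="$1"                          # number of delete/create cycles (assumed parameter)
    vsctl="${OVS_VSCTL:-ovs-vsctl}"     # defaults to the real ovs-vsctl binary
    i=0
    while [ "$i" -lt "$count" ]; do
        # Same two commands as in the debug log above.
        "$vsctl" -- --if-exists del-br lan0
        "$vsctl" add-br lan0 -- set bridge lan0 other-config:hwaddr=dc:39:79:80:70:00
        i=$((i + 1))
    done
}
```

For example, `OVS_VSCTL=echo churn_bridge 2` prints the commands without touching the switch, while `churn_bridge 1000` runs them for real. Whether a tight loop like this hits the assertion as quickly as the test bed does (about 5 minutes) will depend on the traffic present while the bridge is being removed.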

> Jarno and I briefly spoke about this today, and one thought that came
> up is whether the number of threads makes a difference here. Are you
> also able to reproduce if you, for example, reduce the number of
> revalidator/handler threads to 1?
> 
> ovs-vsctl set Open_vSwitch . other_config:n-revalidator-threads=1
> ovs-vsctl set Open_vSwitch . other_config:n-handler-threads=1
> 
I tried these commands; unfortunately, there was no improvement.
> 
> On 16 December 2015 at 11:15, Ben Warren <[email protected]> wrote:
>> Hi,
>> 
>> We’re seeing this crash about a couple of times a day on our test bed,
>> always when removing ports from a bridge (As Keith originally reported).  Do
>> you have any idea what might be happening?  As I mentioned in my previous
>> message, we’re pretty well at top of tree, so can very easily test any
>> fixes.
>> 
>> thanks,
>> Ben
>> 
>> 
>> On Dec 8, 2015, at 3:00 PM, Ben Warren <[email protected]> wrote:
>> 
>> Hi Joe,
>> 
>> Sorry for taking so long to get back to this.
>> 
>> On Nov 23, 2015, at 6:54 PM, Joe Stringer <[email protected]> wrote:
>> 
>> On 20 November 2015 at 10:05, Keith Holleman <[email protected]>
>> wrote:
>> 
>> 
>> Follow-up email here has the backtrace for the second method of
>> reproduction.  In this case the bridge is not deleted, it was using the loop
>> logic of effectively these commands:
>> 
>> 
>> <snip>
>> 
>> Thanks a lot for the report!
>> 
>> Would you be able to apply these two patches and see if they fix the
>> issue you are observing?
>> 
>> https://patchwork.ozlabs.org/patch/541190/
>> https://patchwork.ozlabs.org/patch/541191/
>> 
>> 
>> Now that your conntrack code has been committed, we decided to build off the
>> “openvswitch/ovs” repo on Github.  I built the top of the “branch-2.5”
>> branch as of this morning:
>> 
>> commit:
>> https://github.com/openvswitch/ovs/commit/2862aeff82a3216ea4592c57299569484cf159ea
>> 
>> and still see the crash.  The patches listed above do not apply cleanly: it
>> looks like much (although maybe not all?) of the logic is already committed.
>> 
>> Here’s what I see in /var/log/ovsswitchd.log:
>> 
>> 2015-12-08T22:19:38.770Z|01159|bridge|INFO|bridge lan0: using datapath ID
>> 0000dc39790002b0
>> 2015-12-08T22:19:38.770Z|01160|connmgr|INFO|lan0: added service controller
>> "punix:/var/run/openvswitch/lan0.mgmt"
>> 2015-12-08T22:19:38.842Z|01161|dpif|WARN|system@ovs-system: failed to
>> flow_del (No such file or directory)
>> ufid:7580e732-908d-4134-9ca9-f6887195c2ae
>> recirc_id(0),dp_hash(0),skb_priority(0),in_port(2),skb_mark(0),ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),eth(src=04:00:00:00:00:02,dst=04:00:00:00:00:fe),eth_type(0x0800),ipv4(src=192.168.27.2,dst=192.168.27.254,proto=6,tos=0,ttl=64,frag=no),tcp(src=39055,dst=11111),tcp_flags(psh|ack)
>> 2015-12-08T22:19:38.842Z|01162|util|EMER|lib/cmap.c:846: assertion ok failed
>> in cmap_replace()
>> 2015-12-08T22:19:39.175Z|00002|daemon_unix(monitor)|ERR|1 crashes: pid 978
>> died, killed (Aborted), core dumped, restarting
>> 
>> System information:
>> 
>> # ovs-ofctl --version
>> ovs-ofctl (Open vSwitch) 2.5.0
>> Compiled Dec  8 2015 12:16:49
>> OpenFlow versions 0x1:0x4
>> 
>> # uname -a
>> Linux cd25 3.10.20-rt14-copilot #1 SMP Tue Dec 8 12:11:17 PST 2015 mips64
>> GNU/Linux
>> 
>> Please let us know what other information we can provide to help figure this
>> out.
>> 
>> regards,
>> Ben
>> 
>> _______________________________________________
>> discuss mailing list
>> [email protected]
>> http://openvswitch.org/mailman/listinfo/discuss
>> 
>> 
