Oh, sorry. I updated the code from master a few days ago.
Now I updae the code again. The OVS works in userspace, following is the output:
2013-11-22T03:40:26Z|00003|bridge|INFO|bridge br0: added interface br0 on port
65534
2013-11-22T03:40:26Z|00004|dpif_linux|ERR|Generic Netlink family 'ovs_datapath'
does not exist. The Open vSwitch kernel module is probably not loaded.
2013-11-22T03:40:26Z|00005|bridge|INFO|bridge br0: using datapath ID
0000c644fe67d34d
2013-11-22T03:40:26Z|00006|connmgr|INFO|br0: added service controller
"punix:/home/zhenglingyun/var/run/openvswitch/br0.mgmt"
2013-11-22T03:40:26Z|00001|ofproto_dpif_upcall(upcall_1)|INFO|received packet
on unassociated datapath port 1
2013-11-22T03:40:26Z|00007|bridge|INFO|ovs-vswitchd (Open vSwitch) 2.0.90
2013-11-22T03:40:36Z|00008|memory|INFO|15584 kB peak resident set size after
10.0 seconds
2013-11-22T03:40:36Z|00009|memory|INFO|facets:1 ports:1 rules:4 subfacets:1
2013-11-22T03:41:01Z|00010|connmgr|INFO|br0: added primary controller
"tcp:10.1.1.1:6633"
2013-11-22T03:41:01Z|00011|rconn|INFO|br0<->tcp:10.1.1.1:6633: connecting...
2013-11-22T03:41:01Z|00012|rconn|INFO|br0<->tcp:10.1.1.1:6633: connection timed
out
2013-11-22T03:41:01Z|00013|rconn|INFO|br0<->tcp:10.1.1.1:6633: waiting 1
seconds before reconnect
2013-11-22T03:41:02Z|00014|rconn|INFO|br0<->tcp:10.1.1.1:6633: connecting...
2013-11-22T03:41:03Z|00015|rconn|INFO|br0<->tcp:10.1.1.1:6633: connection timed
out
2013-11-22T03:41:03Z|00016|rconn|INFO|br0<->tcp:10.1.1.1:6633: waiting 2
seconds before reconnect
2013-11-22T03:41:05Z|00017|rconn|INFO|br0<->tcp:10.1.1.1:6633: connecting...
2013-11-22T03:41:07Z|00018|rconn|INFO|br0<->tcp:10.1.1.1:6633: connection timed
out
2013-11-22T03:41:07Z|00019|rconn|INFO|br0<->tcp:10.1.1.1:6633: waiting 4
seconds before reconnect
2013-11-22T03:41:11Z|00020|rconn|INFO|br0<->tcp:10.1.1.1:6633: connecting...
2013-11-22T03:41:15Z|00021|rconn|INFO|br0<->tcp:10.1.1.1:6633: connection timed
out
2013-11-22T03:41:15Z|00022|rconn|INFO|br0<->tcp:10.1.1.1:6633: continuing to
retry connections in the background but suppressing further logging
2013-11-22T03:41:15Z|00023|fail_open|WARN|Could not connect to controller (or
switch failed controller's post-connection admission control policy) for 15
seconds, failing open
2013-11-22T03:42:16Z|00024|fail_open|INFO|Still in fail-open mode after 76
seconds disconnected from controller
2013-11-22T03:42:26Z|00025|connmgr|INFO|br0: removed primary controller
"tcp:10.1.1.1:6633"
2013-11-22T03:42:26Z|00026|fail_open|WARN|No longer in fail-open mode
ovs-vswitchd: pthread_mutex_lock failed (Resource deadlock avoided)
Program received signal SIGABRT, Aborted.
0x77db1048 in raise () from /lib/libc.so.6
(gdb) bt
#0 0x77db1048 in raise () from /lib/libc.so.6
#1 0x77db61f8 in abort () from /lib/libc.so.6
#2 0x005b8e18 in ovs_abort_valist (err_no=45, format=0x64e9ec "pthread_%s_%s
failed", args=0x7fff3568) at lib/util.c:245
#3 0x005b8dc0 in ovs_abort (err_no=45, format=0x64e9ec "pthread_%s_%s failed")
at lib/util.c:237
#4 0x005725e0 in ovs_mutex_lock_at (l_=0x678dd0, where=0x636f8c
"ofproto/ofproto.c:4474") at lib/ovs-thread.c:59
#5 0x00441168 in handle_flow_mod__ (ofproto=0x699700, ofconn=0x0,
fm=0x7fff35d8, oh=0x0) at ofproto/ofproto.c:4474
#6 0x00436a48 in simple_flow_mod (ofproto=0x699700, match=0x7fff37e8,
priority=15790320, ofpacts=0x0, ofpacts_len=0, command=OFPFC_DELETE_STRICT) at
ofproto/ofproto.c:1872
#7 0x00436dc4 in ofproto_delete_flow (ofproto=0x699700, target=0x7fff37e8,
priority=15790320) at ofproto/ofproto.c:1959
#8 0x00492adc in fail_open_recover (fo=0x6992f8) at ofproto/fail-open.c:202
#9 0x00492e3c in fail_open_destroy (fo=0x6992f8) at ofproto/fail-open.c:256
#10 0x0048d288 in update_fail_open (mgr=0x69a9c8) at ofproto/connmgr.c:780
#11 0x0048c920 in connmgr_set_controllers (mgr=0x69a9c8, controllers=0x699218,
n_controllers=1, allowed_versions=0) at ofproto/connmgr.c:641
#12 0x00431e0c in ofproto_set_controllers (p=0x699700, controllers=0x699218,
n_controllers=1, allowed_versions=0) at ofproto/ofproto.c:652
#13 0x00425a04 in bridge_configure_remotes (br=0x699450, managers=0x0,
n_managers=0) at vswitchd/bridge.c:3114
#14 0x00419b88 in bridge_reconfigure_continue (ovs_cfg=0x69b870) at
vswitchd/bridge.c:628
#15 0x004223c0 in bridge_run () at vswitchd/bridge.c:2445
#16 0x0042b968 in main (argc=1, argv=0x7fff3cc4) at vswitchd/ovs-vswitchd.c:118
(gdb) frame 5
#5 0x00441168 in handle_flow_mod__ (ofproto=0x699700, ofconn=0x0,
fm=0x7fff35d8, oh=0x0) at ofproto/ofproto.c:4474
4474 ovs_mutex_lock(&ofproto_mutex);
(gdb) p ofproto_mutex
$1 = {lock = {__data = {__lock = 1, __count = 0, __owner = 7030, __kind = 2,
__nusers = 1, {__spins = 0, __list = {__next = 0x0}}},
__size =
"\001\000\000\000\000\000\000\000v\033\000\000\002\000\000\000\001\000\000\000\000\000\000",
__align = 1}, where = 0x63ba70 "ofproto/connmgr.c:556"}
(gdb)
At 2013-11-22 11:07:33,"Ben Pfaff" <[email protected]> wrote:
>What version of OVS is this?
>
>On Fri, Nov 22, 2013 at 10:42:20AM +0800, ZhengLingyun wrote:
>> I set OVS connect to a non-existent controller 10.1.1.1, the controller will
>> fall into fail-open mode.
>> Then I delete the controller, the OVS will abort:
>>
>>
>>
>>
>> 2013-11-22T01:25:40Z|00010|connmgr|INFO|br0: added primary controller
>> "tcp:10.1.1.1:6632"
>> 2013-11-22T01:25:40Z|00011|rconn|INFO|br0<->tcp:10.1.1.1:6632: connecting...
>> 2013-11-22T01:25:41Z|00012|rconn|INFO|br0<->tcp:10.1.1.1:6632: connection
>> timed out
>> 2013-11-22T01:25:41Z|00013|rconn|INFO|br0<->tcp:10.1.1.1:6632: waiting 1
>> seconds before reconnect
>> 2013-11-22T01:25:42Z|00014|rconn|INFO|br0<->tcp:10.1.1.1:6632: connecting...
>> 2013-11-22T01:25:43Z|00015|rconn|INFO|br0<->tcp:10.1.1.1:6632: connection
>> timed out
>> 2013-11-22T01:25:43Z|00016|rconn|INFO|br0<->tcp:10.1.1.1:6632: waiting 2
>> seconds before reconnect
>> 2013-11-22T01:25:45Z|00017|rconn|INFO|br0<->tcp:10.1.1.1:6632: connecting...
>> 2013-11-22T01:25:47Z|00018|rconn|INFO|br0<->tcp:10.1.1.1:6632: connection
>> timed out
>> 2013-11-22T01:25:47Z|00019|rconn|INFO|br0<->tcp:10.1.1.1:6632: waiting 4
>> seconds before reconnect
>> 2013-11-22T01:25:51Z|00020|rconn|INFO|br0<->tcp:10.1.1.1:6632: connecting...
>> 2013-11-22T01:25:55Z|00021|rconn|INFO|br0<->tcp:10.1.1.1:6632: connection
>> timed out
>> 2013-11-22T01:25:55Z|00022|rconn|INFO|br0<->tcp:10.1.1.1:6632: continuing to
>> retry connections in the background but suppressing further logging
>> 2013-11-22T01:25:55Z|00023|fail_open|WARN|Could not connect to controller
>> (or switch failed controller's post-connection admission control policy) for
>> 15 seconds, failing open
>> 2013-11-22T01:26:45Z|00024|connmgr|INFO|br0: removed primary controller
>> "tcp:10.1.1.1:6632"
>> 2013-11-22T01:26:45Z|00025|fail_open|WARN|No longer in fail-open mode
>> ovs-vswitchd: pthread_mutex_lock failed (Resource deadlock avoided)
>>
>>
>> Program received signal SIGABRT, Aborted.
>> 0x77db1048 in raise () from /lib/libc.so.6
>> (gdb) bt
>> #0 0x77db1048 in raise () from /lib/libc.so.6
>> #1 0x77db61f8 in abort () from /lib/libc.so.6
>> #2 0x005b7998 in ovs_abort_valist (err_no=45, format=0x64d48c
>> "pthread_%s_%s failed", args=0x7fff3568) at lib/util.c:245
>> #3 0x005b7940 in ovs_abort (err_no=45, format=0x64d48c "pthread_%s_%s
>> failed") at lib/util.c:237
>> #4 0x00571190 in ovs_mutex_lock_at (l_=0x676e70, where=0x635a4c
>> "ofproto/ofproto.c:4474") at lib/ovs-thread.c:59
>> #5 0x00440f00 in handle_flow_mod__ (ofproto=0x699700, ofconn=0x0,
>> fm=0x7fff35d8, oh=0x0) at ofproto/ofproto.c:4474
>> #6 0x004367d8 in simple_flow_mod (ofproto=0x699700, match=0x7fff37e8,
>> priority=15790320, ofpacts=0x0, ofpacts_len=0, command=OFPFC_DELETE_STRICT)
>> at ofproto/ofproto.c:1872
>> #7 0x00436b54 in ofproto_delete_flow (ofproto=0x699700, target=0x7fff37e8,
>> priority=15790320) at ofproto/ofproto.c:1959
>> #8 0x0049279c in fail_open_recover (fo=0x697cd8) at ofproto/fail-open.c:202
>> #9 0x00492afc in fail_open_destroy (fo=0x697cd8) at ofproto/fail-open.c:256
>> #10 0x0048cf48 in update_fail_open (mgr=0x69a9c8) at ofproto/connmgr.c:780
>> #11 0x0048c5e0 in connmgr_set_controllers (mgr=0x69a9c8,
>> controllers=0x698678, n_controllers=1, allowed_versions=0) at
>> ofproto/connmgr.c:641
>> #12 0x00431b9c in ofproto_set_controllers (p=0x699700, controllers=0x698678,
>> n_controllers=1, allowed_versions=0) at ofproto/ofproto.c:652
>> #13 0x0042579c in bridge_configure_remotes (br=0x699450, managers=0x0,
>> n_managers=0) at vswitchd/bridge.c:3098
>> #14 0x00419a18 in bridge_reconfigure_continue (ovs_cfg=0x69b870) at
>> vswitchd/bridge.c:628
>> #15 0x00422158 in bridge_run () at vswitchd/bridge.c:2429
>> #16 0x0042b6f8 in main (argc=1, argv=0x7fff3cc4) at
>> vswitchd/ovs-vswitchd.c:118
>> (gdb) frame 5
>> #5 0x00440f00 in handle_flow_mod__ (ofproto=0x699700, ofconn=0x0,
>> fm=0x7fff35d8, oh=0x0) at ofproto/ofproto.c:4474
>> 4474 ovs_mutex_lock(&ofproto_mutex);
>> (gdb) p ofproto_mutex
>> $1 = {lock = {__data = {__lock = 1, __count = 0, __owner = 2293, __kind = 2,
>> __nusers = 1, {__spins = 0, __list = {__next = 0x0}}},
>> __size =
>> "\001\000\000\000\000\000\000\000\365\b\000\000\002\000\000\000\001\000\000\000\000\000\000",
>> __align = 1}, where = 0x63a530 "ofproto/connmgr.c:556"}
>> (gdb)
>>
>>
>>
>>
>>
>>
>>
>>
>> At 2013-10-12 01:03:41,"Ben Pfaff" <[email protected]> wrote:
>> >OK, I added
>> > /* Required to add and remove ofconns. This could probably be narrowed
>> > to
>> > * cover a smaller amount of code, if that yielded some benefit. */
>> >just before taking the lock in connmgr_set_controllers().
>> >
>> >Thanks!
>> >
>> >On Fri, Oct 11, 2013 at 09:33:46AM -0700, Ethan Jackson wrote:
>> >> I wouldn't bother narrowing it. I was just confused about whether
>> >> it's required or not. Perhaps a brief comment explaining that logic
>> >> would be helpful.
>> >>
>> >> Ethan
>> >>
>> >> On Fri, Oct 11, 2013 at 9:09 AM, Ben Pfaff <[email protected]> wrote:
>> >> > On Fri, Oct 11, 2013 at 08:58:20AM -0700, Ethan Jackson wrote:
>> >> >> I guess the only thing I'm wondering about is how you know when
>> >> >> ofproto_mutex is required and when it isn't. In particular, why do we
>> >> >> need to hold it for the entire connmgr_set_controllers()? That's not
>> >> >> totally clear from the comment at the top of the file.
>> >> >
>> >> > I don't think we need to hold it for the entire
>> >> > connmgr_set_controllers(). I could probably narrow it, if you like. It
>> >> > was a little easier to keep it for the entire span of the function and I
>> >> > didn't think it would be a big deal because it's held for long times
>> >> > elsewhere during flow mods etc.
>> >> >
>> >> >> Acked-by: Ethan Jackson <[email protected]>
>> >_______________________________________________
>> >dev mailing list
>> >[email protected]
>> >http://openvswitch.org/mailman/listinfo/dev
_______________________________________________
dev mailing list
[email protected]
http://openvswitch.org/mailman/listinfo/dev