On Wed, Mar 6, 2013 at 7:41 AM, Ernesto Domato <edo...@gmail.com> wrote:
> Sorry for the late response.
>
> On Mon, Mar 4, 2013 at 7:06 PM, Ansis Atteka <aatt...@nicira.com> wrote:
>> On Mon, Mar 4, 2013 at 12:08 PM, Ernesto Domato <edo...@gmail.com> wrote:
>>
>> If you do not block on interface creation and libvirt/Open vSwitch
>> init.d dependencies are not right, then I think you might end up with
>> another race condition, where VM automatic start-up would fail.
>> Imagine that:
>> 1. Neither Open vSwitch or libvirt are running
>> 2. libvirt starts up
>> 3. libvirt tries to spin up VM and executes "ovs-vsctl --no-wait
>> --timeout=5 -- del-port ... -- add-port ..." command. After 5 seconds
>> this command times out, because Open vSwitch wasn't running
>> 4. After 6 seconds Open vSwitch starts up, but VM still remains down.
>>
>> This means that you will have to manually start the VM one more time.
>>
>
> Ok, so I guess that to solve this problem, the right solution would be
> that libvirt wait till OVS is up by not timing out, right?
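
To make the ordering problem in the quoted steps concrete: one possible
middle ground between giving up after 5 seconds and blocking forever is
a bounded wait for the OVSDB socket before ovs-vsctl is called. This is
only a sketch and not something proposed in this thread. The function
name wait_for_path, the 30-second deadline, and the names br0 and vnet0
are made up for illustration; the socket path is the one that appears in
the ovs-vsctl error output quoted further down.

```shell
# Hypothetical helper: poll once per second until PATH exists, giving up
# after DEADLINE seconds. Returns 0 if the path appeared, 1 on timeout.
# Caveat: a socket file can exist while no server is listening on it.
wait_for_path() {
    local path=$1 deadline=$2 waited=0
    while [ ! -e "$path" ]; do
        [ "$waited" -ge "$deadline" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Possible use (br0 and vnet0 are placeholder names):
#   wait_for_path /var/run/openvswitch/db.sock 30 &&
#       ovs-vsctl --timeout=5 -- --may-exist add-port br0 vnet0
```

Getting the init.d dependencies right makes a wait like this
unnecessary, which is part of why that option looks cleaner.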

I actually see two solutions:
1. get the daemon dependencies right (I see that Eric from the libvirt
   project also recommended this); or
2. don't time out when executing the "ovs-vsctl add-port" command from
   libvirt.

I would prefer solution #1. If we went with solution #2, I am worried
that someone would later complain that libvirt gets stuck again (when
OVS is not running).

>
> I think that could be a good idea, I'll write a new patch for libvirt
> then and send it to them for comments.
>
>>>
>>> I also added "--no-wait --timeout 5" when libvirt goes down so it can
>>> timeout if ovs-switch is down.
>> Just curious, but wasn't this part already solved with libvirt commit
>> 98e732fc34a47ad9dfdb64aa4207623ee4c1ebcd (network: prevent infinite
>> hang if ovs-vswitchd isn't running)? Are you using libvirt that has
>> this patch?
>>
>
> Ok, I'm using the stable version of libvirt and that fix is in
> experimental package. Anyway, this fix only adds the timeout flag when
> deleting the interface from OVS which does that libvirt don't hang
> trying to delete the interface if OVS is down. But it doesn't fix the
> issue that the OVS-DB still have the reference to the virtual
> interface and so, when you bring the virtual machine up again, it
> doesn't response because of this. As you (Ansis Atteka I guess)

What do you mean by "doesn't response"? Even if you pass "--timeout=5"
and "--may-exist" to the "add-port" command, does it still sometimes
block indefinitely?

> recommended before, my patch adds that libvirt try to delete (with the
> --if-exists flag) the interface before adding it again so that problem
> is resolved.

It was a long time ago, but I think that adding the "--may-exist" flag
to the "add-port" command was sufficient. Can you debug this a little
more and tell me exactly what is failing here? For example, the output
of "ps -Af | grep ovs-vsctl" while the command is blocked should be
enough.
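
As a rough illustration of what I have in mind, here is a sketch of an
idempotent add-port call that also collects the diagnostics I am asking
for when it fails. The function name add_port_idempotent and the
OVS_VSCTL override variable are hypothetical (the override only exists
so the command can be stubbed out); the flags are the ones already
discussed in this thread.

```shell
# Hypothetical wrapper: add PORT to BRIDGE, tolerating a port that
# already exists, and dump diagnostics on failure.
# OVS_VSCTL is an assumed override for the ovs-vsctl binary.
add_port_idempotent() {
    local bridge=$1 port=$2 rc=0
    "${OVS_VSCTL:-ovs-vsctl}" --timeout=5 -- \
        --may-exist add-port "$bridge" "$port" || rc=$?
    if [ "$rc" -ne 0 ]; then
        # A status of 142 (128 + SIGALRM on Linux) would suggest that
        # the process was killed by the timeout alarm.
        echo "add-port $port on $bridge failed with status $rc" >&2
        # List any ovs-vsctl processes that are still around.
        ps -Af 2>/dev/null | grep '[o]vs-vsctl' >&2 || true
    fi
    return "$rc"
}
```

If a call like this still blocks past the 5 seconds, the ps output
should show which ovs-vsctl invocation is stuck and with which
arguments.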

If this indeed turns out to be a problem, then I think a long-term
solution would be to:
1. mark OVS ports created by libvirt with something like
   other_config:created-by-libvirt=true
2. if libvirt did not have a chance to delete old ports on shutdown,
   have it iterate over all ports at startup and delete the unused ones
   where other_config:created-by-libvirt=true

>
>> By the way I tried to execute the same commands as libvirt would have
>> executed. Except I ran them directly in the shell. What I observed is
>> that ovs-vsctl did not block indefinitely for this corner case:
>>
>> root@ubuntu:~# service openvswitch-switch stop
>> * ovs-brcompatd is not running
>> * Killing ovs-vswitchd (13202)
>> * Killing ovsdb-server (13193)
>> root@ubuntu:~# ovs-vsctl --timeout=5 -- --if-exists del-port p0
>> Mar 04 13:39:14|00002|stream_unix|ERR|/tmp/stream-unix.13222.0:
>> connection to /var/run/openvswitch/db.sock failed: No such file or
>> directory
>> Mar 04 13:39:14|00003|reconnect|WARN|unix:/var/run/openvswitch/db.sock:
>> connection attempt failed (No such file or directory)
>> Mar 04 13:39:15|00004|stream_unix|ERR|/tmp/stream-unix.13222.1:
>> connection to /var/run/openvswitch/db.sock failed: No such file or
>> directory
>> Mar 04 13:39:15|00005|reconnect|WARN|unix:/var/run/openvswitch/db.sock:
>> connection attempt failed (No such file or directory)
>> Mar 04 13:39:17|00006|stream_unix|ERR|/tmp/stream-unix.13222.2:
>> connection to /var/run/openvswitch/db.sock failed: No such file or
>> directory
>> Mar 04 13:39:17|00007|reconnect|WARN|unix:/var/run/openvswitch/db.sock:
>> connection attempt failed (No such file or directory)
>> Alarm clock
>> root@ubuntu:~#
>>
>> I tried this with Open vSwitch: 1.4.3-0ubuntu2.1. Are you seeing the
>> same effect with your Open vSwitch?
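
Coming back to the port-marking idea from the top of this message, a
minimal sketch could look like the following. It assumes the cleanup
runs at libvirt startup, before any guest has been started, so every
marked port is stale; deciding which ports are really "unused" in other
situations is left out. The function names and the OVS_VSCTL override
variable are hypothetical, and the other_config key is only the example
name suggested above.

```shell
# OVS_VSCTL is an assumed override for the ovs-vsctl binary.

# Tag an existing port as having been created by libvirt.
mark_port() {
    "${OVS_VSCTL:-ovs-vsctl}" --timeout=5 -- \
        set Port "$1" other_config:created-by-libvirt=true
}

# Delete every port that carries the tag. --bare prints one name per
# record, separated by blank lines, so empty lines are skipped.
cleanup_stale_ports() {
    "${OVS_VSCTL:-ovs-vsctl}" --timeout=5 --bare --columns=name \
        find Port other_config:created-by-libvirt=true |
    while read -r port; do
        [ -n "$port" ] || continue
        "${OVS_VSCTL:-ovs-vsctl}" --timeout=5 -- \
            --if-exists del-port "$port" </dev/null
    done
}
```

Each del-port here is a separate ovs-vsctl call with its own timeout,
so with OVS down the total delay grows with the number of ports.
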
>
> Yes, I see the same behavior but after applying the patch that adds
> the --timeout flag to the older version of libvirt that is currently
> in Debian as stable but is incorporated in experimental package.
>
> So, summarizing, what I'll do is to download the GIT version of
> libvirt, make patch that just tries to delete the interface before
> adding it again when the virtual machine is going up and do it in a
> separate call to ovs-vsctl so when adding the interface, it wait for
> OVS to be up.
>
> Is that the way to solve this problem?, what do you think? :-)

Please answer the other questions I asked above first. That will help
to figure out the right strategy.

Also, as Ben suggested, in the long run we should get rid of
"--timeout=5" and use something like "--try-once" in ovs-vsctl when the
OVS DB is running locally. Otherwise, with 10 ports for example, the
aggregate timeout can be up to 50 seconds (10 ports x 5 seconds each).

>
> Thanks
> Ernesto
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev