[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15595525#comment-15595525
 ] 

ASF GitHub Bot commented on CLOUDSTACK-7982:
--------------------------------------------

Github user mlsorensen commented on a diff in the pull request:

    https://github.com/apache/cloudstack/pull/1709#discussion_r84509497
  
    --- Diff: core/src/com/cloud/agent/api/CancelMigrationCommand.java ---
    @@ -0,0 +1,35 @@
    +// Licensed to the Apache Software Foundation (ASF) under one
    +// or more contributor license agreements.  See the NOTICE file
    +// distributed with this work for additional information
    +// regarding copyright ownership.  The ASF licenses this file
    +// to you under the Apache License, Version 2.0 (the
    +// "License"); you may not use this file except in compliance
    +// with the License.  You may obtain a copy of the License at
    +//
    +//   http://www.apache.org/licenses/LICENSE-2.0
    +//
    +// Unless required by applicable law or agreed to in writing,
    +// software distributed under the License is distributed on an
    +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    +// KIND, either express or implied.  See the License for the
    +// specific language governing permissions and limitations
    +// under the License.
    +package com.cloud.agent.api;
    +
    +public class CancelMigrationCommand extends Command {
    --- End diff --
    
    Along these lines of cancellation, I've long thought that we really need 
the ability to clean up long-running jobs if the agent disconnects from the 
management server for any reason (an upgrade or restart of the agent or 
management server, network problems, etc.). Normally the management server will 
know the job failed, but the agent keeps on trucking, which causes problems, 
especially for things like storage migrations. This may be an important thing 
to add for this feature, to avoid situations where a migration completes but 
CloudStack does not know about it because the management server was restarted 
during the migration.
    
    Rather than forcing the management server to know that the agent's work 
needs to be cleaned up, and to send the hypervisor a cleanup command tailored 
to each command that can fail, one solution that I've seen work well is for 
LibvirtComputingResource to hold a global List<Runnable> of tasks. It overrides 
the disconnected() method to loop through this list and run each task, and it 
exposes the methods addDisconnectHook(Runnable hook) and 
removeDisconnectHook(Runnable hook). Commands that are sensitive to being 
interrupted can then register cancellation logic before starting and remove it 
when finished.
    
    Something like:
    
        @Override
        public void disconnected() {
            this._connected = false;
            s_logger.info("Detected agent disconnect event, running through "
                    + _disconnectHooks.size() + " disconnect hooks");
            for (Runnable hook : _disconnectHooks) {
                hook.run();
            }
            _disconnectHooks.clear();
        }
    
        public void addDisconnectHook(Runnable hook) {
            s_logger.debug("Adding disconnect hook " + hook);
            _disconnectHooks.add(hook);
        }
    
        public void removeDisconnectHook(Runnable hook) {
            if (_disconnectHooks.contains(hook)) {
                s_logger.debug("Removing disconnect hook " + hook);
                _disconnectHooks.remove(hook);
            } else {
                s_logger.debug("Requested removal of disconnect hook, but hook not found: " + hook);
            }
        }
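    
    If disconnected() can run on a different thread from the commands that add 
and remove hooks, a plain ArrayList could throw 
ConcurrentModificationException during the loop, and one hook that throws would 
skip the rest. A minimal self-contained sketch that tolerates both, assuming a 
hypothetical DisconnectHookRegistry class (not existing CloudStack code) backed 
by CopyOnWriteArrayList:
    
```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical stand-alone registry for illustration; not part of CloudStack. */
public class DisconnectHookRegistry {
    // Iteration works on a snapshot, so command threads may add or remove
    // hooks while the disconnect handler is looping.
    private final List<Runnable> hooks = new CopyOnWriteArrayList<>();

    public void addDisconnectHook(Runnable hook) {
        hooks.add(hook);
    }

    /** Returns true if the hook was registered and has now been removed. */
    public boolean removeDisconnectHook(Runnable hook) {
        return hooks.remove(hook);
    }

    /** Runs each registered hook at most once; returns how many hooks threw. */
    public int runDisconnectHooks() {
        int failures = 0;
        for (Runnable hook : hooks) {
            // Skip hooks whose owning command finished and deregistered
            // after the snapshot was taken.
            if (!hooks.remove(hook)) {
                continue;
            }
            try {
                hook.run();
            } catch (RuntimeException e) {
                // Keep going so one bad hook does not block the others.
                failures++;
            }
        }
        return failures;
    }

    public int size() {
        return hooks.size();
    }
}
```
    
    With something like this, disconnected() would only need to call 
runDisconnectHooks() and log the returned failure count.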
    
    An example hook to cancel the migration might look like this:
    
        public class MigrationCancelHook extends Thread {
            private static final Logger LOGGER =
                    Logger.getLogger(MigrationCancelHook.class.getName());
            private static final String HOOK_PREFIX = "MigrationCancelHook-";
            private final Domain _migratingDomain;
            private final String _vmName;
    
            public MigrationCancelHook(Domain migratingDomain)
                    throws LibvirtException {
                super(HOOK_PREFIX + migratingDomain.getName());
                _migratingDomain = migratingDomain;
                _vmName = migratingDomain.getName();
            }
    
            @Override
            public void run() {
                LOGGER.info("Interrupted migration of " + _vmName);
                try {
                    if (_migratingDomain.abortJob() == 0) {
                        LOGGER.warn("Aborted migration job for " + _vmName);
                    }
                } catch (Exception ex) {
                    LOGGER.warn("Failed to abort migration job for " + _vmName,
                            ex);
                }
            }
        }
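
    To make the "add before starting, remove when finished" part concrete, 
here is a rough sketch of how a command might wrap its work. Everything in it 
is a stand-in for illustration: MigrationHookUsage, DISCONNECT_HOOKS and 
migrateWithCancelHook are hypothetical names, and the migration is modelled as 
a plain Runnable rather than a real libvirt call.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical usage sketch; not existing CloudStack code. */
public class MigrationHookUsage {
    // Stand-in for the hook list held by LibvirtComputingResource.
    static final List<Runnable> DISCONNECT_HOOKS = new CopyOnWriteArrayList<>();

    /**
     * Registers the cancel hook, runs the (blocking) migration, and always
     * deregisters the hook afterwards, whether the migration succeeded or threw.
     */
    static void migrateWithCancelHook(Runnable migration, Runnable cancelHook) {
        DISCONNECT_HOOKS.add(cancelHook);
        try {
            migration.run();
        } finally {
            // Without this, a later disconnect would try to abort a job
            // that has already finished.
            DISCONNECT_HOOKS.remove(cancelHook);
        }
    }
}
```

    The try/finally is the important part: the hook should only be registered 
for as long as the job it cancels is actually running.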


> Storage live migration support for KVM
> --------------------------------------
>
>                 Key: CLOUDSTACK-7982
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-7982
>             Project: CloudStack
>          Issue Type: Improvement
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>            Reporter: Wei Zhou
>            Assignee: Marc-Aurèle Brothier
>             Fix For: Future
>
>
> Storage live migration currently supports XenServer, VMware, and Hyper-V, but not KVM.
> We need to add the implementation for KVM.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
