[ 
https://issues.apache.org/jira/browse/CLOUDSTACK-6203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061646#comment-16061646
 ] 

Marcus Sorensen commented on CLOUDSTACK-6203:
---------------------------------------------

vm.migrate.pauseafter of "1" (in milliseconds) would mean it should almost 
immediately pause, although I haven't actually run with a number that low to 
verify. The documentation for that field says:

# Set an upper limit in milliseconds for how long live migration should wait, 
at which point VM is paused and migration will finish quickly.

This is a poor man's fix. As you probably know, ideally one should instead use:

# enable autoconvergence of VM to transfer busy VMs (if your libvirt supports 
it)
# vm.migrate.autoconverge=false

Though you need at least libvirt version 1.2.3 and Qemu 1.6 for autoconvergence 
to apply. These are not very current at this point so most people should be 
able to upgrade.

If you have to use pauseafter: according to the code, the pauseafter tunable set 
to 1 should simply call dm.suspend on the source VM after 100 ms (the minimum 
interval for the system watching the VM migrate). You should see an info log 
"Pausing VM"; if that doesn't trigger, then there may be some underlying issue 
at hand, like an incompatible or unreachable source/destination, or something 
else that might be causing the Qemu process to crash.
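
For illustration only, the pauseafter behaviour described above can be sketched 
roughly as below. This is not the CloudStack agent code: the function name 
watch_migration, the is_running and suspend callbacks, and the treatment of 0 
as "disabled" are all assumptions made for the sketch. Only the 100 ms polling 
interval and the "Pausing VM" message come from the description above.

```python
import time

MONITOR_INTERVAL_MS = 100  # minimum watch interval mentioned in the comment


def watch_migration(is_running, suspend, pause_after_ms,
                    interval_ms=MONITOR_INTERVAL_MS, sleep=time.sleep):
    """Poll a running migration and suspend the source VM once
    pause_after_ms has elapsed.

    is_running and suspend are hypothetical callbacks standing in for the
    migration thread check and the libvirt domain suspend call.
    Returns the elapsed time (ms) at which the VM was paused, or None.
    """
    elapsed_ms = 0
    paused_at = None
    while is_running():
        sleep(interval_ms / 1000.0)
        elapsed_ms += interval_ms
        # Assumption: a value of 0 or less means "never pause".
        if paused_at is None and 0 < pause_after_ms < elapsed_ms:
            print("Pausing VM")  # stand-in for the agent's info log line
            suspend()
            paused_at = elapsed_ms
    return paused_at
```

Because the loop only checks once per interval, a pauseafter of 1 cannot fire 
before the first 100 ms tick, which matches the timing described above.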

If you are using or do switch to autoconvergence, I wouldn't set 
vm.migrate.downtime, as autoconverge should adjust automatically to your 
migration capability. It is still possible that a migration can never complete, 
if your host just doesn't have the bandwidth to handle the transfer of a busy 
server (I've seen this on all platforms, not just KVM), but generally that's a 
rare occurrence and autoconvergence can throttle the VM's performance down 
enough, though it may still take a long time to complete. In the end, a good 
portion of migration success lies in the design of the system; very large and 
busy VMs are going to need high speed interconnects for migration that aren't 
busy doing other things like primary storage.
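
To make the bandwidth point concrete, here is a deliberately simplified model 
of pre-copy migration: each round re-sends whatever memory was dirtied during 
the previous round. The function name precopy_rounds, its parameters, and the 
64 MB cut-over threshold are assumptions for illustration; real QEMU behaviour 
(and autoconverge throttling) is more involved.

```python
def precopy_rounds(memory_mb, bandwidth_mb_s, dirty_rate_mb_s,
                   threshold_mb=64, max_rounds=30):
    """Estimate how many copy rounds a pre-copy migration needs.

    Simplified model: every round transfers the currently outstanding
    memory at bandwidth_mb_s, while the guest dirties memory at
    dirty_rate_mb_s. Returns the round in which the outstanding memory
    drops to threshold_mb or less, or None if that never happens within
    max_rounds (the migration would stall).
    """
    remaining_mb = float(memory_mb)
    for round_no in range(1, max_rounds + 1):
        seconds = remaining_mb / bandwidth_mb_s    # time to send this round
        remaining_mb = dirty_rate_mb_s * seconds   # memory dirtied meanwhile
        if remaining_mb <= threshold_mb:
            return round_no
    return None


# A 16 GB guest over a link moving about 1000 MB/s:
print(precopy_rounds(16384, 1000, 100))   # dirty rate well below bandwidth
print(precopy_rounds(16384, 1000, 1200))  # dirty rate above bandwidth
```

In this model the outstanding memory shrinks each round only when the dirty 
rate is below the link bandwidth, which is why a busy VM on a shared or slow 
interconnect may never finish without being throttled or paused.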

Hope that gives you at least something to look at.

> KVM live migration improvement
> ------------------------------
>
>                 Key: CLOUDSTACK-6203
>                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-6203
>             Project: CloudStack
>          Issue Type: Improvement
>      Security Level: Public(Anyone can view this level - this is the 
> default.) 
>          Components: KVM
>            Reporter: Marcus Sorensen
>            Assignee: Marcus Sorensen
>             Fix For: 4.4.0
>
>
> Run the KVM live migration in a thread so we can monitor it. This will allow 
> us to see how long migrations are taking and do things like pause the vm if 
> migration is stalling (per user defined time limit) to quickly complete 
> migration, or set the domain's max downtime during cut-over between machines 
> (higher values make migration of busy vms easier, lower values may make 
> migration stall).  In the future we can add the autoconvergence flag that 
> stalls VMs for a few ticks to allow memory copy to catch up, but it will be 
> awhile before libvirt that's shipped in a distro supports it, so these 
> tunables may be useful now.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
