Well, unfortunately, the serial-number problem I had seen before doesn’t seem 
to be the cause here. Both the working and the non-working (for live migration) 
VMs have a <serial> element for each applicable <disk> element (per the XML 
below).
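
For anyone who wants to repeat the comparison on their own dumps, here is a rough sketch of the check in Java. It assumes the domain XML has been saved to a file first (for example with `virsh dumpxml`); the class and method names are made up for illustration and are not part of CloudStack or libvirt.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of CloudStack or libvirt.
class DiskSerialCheck {

    // Returns the target dev (e.g. "sda") of every <disk device='disk'>
    // element in the domain XML that has no <serial> child.
    static List<String> disksMissingSerial(String domainXml) {
        List<String> missing = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(domainXml)));
            NodeList disks = doc.getElementsByTagName("disk");
            for (int i = 0; i < disks.getLength(); i++) {
                Element disk = (Element) disks.item(i);
                if (!"disk".equals(disk.getAttribute("device"))) {
                    continue; // skip cdrom and other non-disk devices
                }
                // Only descendants of this <disk> are searched, so the
                // <serial type='pty'> console device is not counted.
                if (disk.getElementsByTagName("serial").getLength() > 0) {
                    continue; // this disk carries a serial number
                }
                NodeList targets = disk.getElementsByTagName("target");
                missing.add(targets.getLength() > 0
                        ? ((Element) targets.item(0)).getAttribute("dev")
                        : "(no target)");
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse domain XML", e);
        }
        return missing;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java DiskSerialCheck.java <domain-dump.xml>");
            return;
        }
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println("Disks without <serial>: " + disksMissingSerial(xml));
    }
}
```

Run against either of the two dumps below, a check like this should report no disks without a serial, which is why the serial number looks like a dead end for this case.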

Anyone else have any ideas here?

On 1/29/18, 4:41 PM, "David Mabry" <dma...@ena.com.INVALID> wrote:

    Mike,
    
    Thanks for the reply.  As requested:
    
    Will not migrate
    ============================
    <domain type='kvm' id='63'>
      <name>i-532-1392-NSVLTN</name>
      <uuid>f7dbf00b-2e15-4991-a407-cf27a3d65d1e</uuid>
      <description>Other PV Virtio-SCSI (64-bit)</description>
      <memory unit='KiB'>4194304</memory>
      <currentMemory unit='KiB'>4194304</currentMemory>
      <vcpu placement='static'>2</vcpu>
      <cputune>
        <shares>2000</shares>
      </cputune>
      <resource>
        <partition>/machine</partition>
      </resource>
      <sysinfo type='smbios'>
        <system>
          <entry name='manufacturer'>Apache Software Foundation</entry>
          <entry name='product'>CloudStack KVM Hypervisor</entry>
          <entry name='uuid'>f7dbf00b-2e15-4991-a407-cf27a3d65d1e</entry>
        </system>
      </sysinfo>
      <os>
        <type arch='x86_64' machine='pc-i440fx-rhel7.0.0'>hvm</type>
        <boot dev='cdrom'/>
        <boot dev='hd'/>
        <smbios mode='sysinfo'/>
      </os>
      <features>
        <acpi/>
        <apic/>
        <pae/>
      </features>
      <cpu mode='custom' match='exact' check='partial'>
        <model fallback='allow'>Haswell-noTSX</model>
        <feature policy='require' name='erms'/>
      </cpu>
      <clock offset='utc'>
        <timer name='kvmclock'/>
      </clock>
      <on_poweroff>destroy</on_poweroff>
      <on_reboot>restart</on_reboot>
      <on_crash>destroy</on_crash>
      <devices>
        <emulator>/usr/libexec/qemu-kvm</emulator>
        <disk type='network' device='disk'>
          <driver name='qemu' type='raw' cache='none' discard='unmap'/>
          <auth username='c01z01'>
            <secret type='ceph' uuid='10a32867-3386-369e-a391-c8e05e0fa8ed'/>
          </auth>
          <source protocol='rbd' name='c01z01/223e08b0-929c-4c47-833d-1f1de48610f3'>
            <host name='cephmonc01.nsvltn.ena.net' port='6789'/>
          </source>
          <backingStore/>
          <target dev='sda' bus='scsi'/>
          <iotune>
            <read_bytes_sec>524288000</read_bytes_sec>
            <write_bytes_sec>524288000</write_bytes_sec>
            <read_iops_sec>500</read_iops_sec>
            <write_iops_sec>500</write_iops_sec>
          </iotune>
          <serial>223e08b0929c4c47833d</serial>
          <alias name='scsi0-0-0-0'/>
          <address type='drive' controller='0' bus='0' target='0' unit='0'/>
        </disk>
        <disk type='network' device='disk'>
          <driver name='qemu' type='raw' cache='none' discard='unmap'/>
          <auth username='c01z01'>
            <secret type='ceph' uuid='10a32867-3386-369e-a391-c8e05e0fa8ed'/>
          </auth>
          <source protocol='rbd' name='c01z01/97e5a299-1efd-40ed-85f4-a5c365a56414'>
            <host name='cephmonc01.nsvltn.ena.net' port='6789'/>
          </source>
          <backingStore/>
          <target dev='sdb' bus='scsi'/>
          <iotune>
            <read_bytes_sec>524288000</read_bytes_sec>
            <write_bytes_sec>524288000</write_bytes_sec>
            <read_iops_sec>500</read_iops_sec>
            <write_iops_sec>500</write_iops_sec>
          </iotune>
          <serial>97e5a2991efd40ed85f4</serial>
          <alias name='scsi0-0-0-1'/>
          <address type='drive' controller='0' bus='0' target='0' unit='1'/>
        </disk>
        <disk type='file' device='cdrom'>
          <driver name='qemu' type='raw' cache='none'/>
          <backingStore/>
          <target dev='hdc' bus='ide'/>
          <readonly/>
          <alias name='ide0-1-0'/>
          <address type='drive' controller='0' bus='1' target='0' unit='0'/>
        </disk>
        <controller type='scsi' index='0'>
          <alias name='scsi0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
        </controller>
        <controller type='usb' index='0' model='piix3-uhci'>
          <alias name='usb'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
        </controller>
        <controller type='pci' index='0' model='pci-root'>
          <alias name='pci.0'/>
        </controller>
        <controller type='ide' index='0'>
          <alias name='ide'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
        </controller>
        <interface type='bridge'>
          <mac address='02:00:08:da:00:1d'/>
          <source bridge='brvx-5312'/>
          <bandwidth>
            <inbound average='128000' peak='128000'/>
            <outbound average='128000' peak='128000'/>
          </bandwidth>
          <target dev='vnet32'/>
          <model type='virtio'/>
          <alias name='net0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
        </interface>
        <serial type='pty'>
          <source path='/dev/pts/27'/>
          <target port='0'/>
          <alias name='serial0'/>
        </serial>
        <console type='pty' tty='/dev/pts/27'>
          <source path='/dev/pts/27'/>
          <target type='serial' port='0'/>
          <alias name='serial0'/>
        </console>
        <input type='tablet' bus='usb'>
          <alias name='input0'/>
        </input>
        <input type='mouse' bus='ps2'/>
        <input type='keyboard' bus='ps2'/>
        <graphics type='vnc' port='5927' autoport='yes' listen='10.10.10.10'>
          <listen type='address' address='10.10.10.10'/>
        </graphics>
        <video>
          <model type='cirrus' vram='16384' heads='1' primary='yes'/>
          <alias name='video0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
        </video>
        <memballoon model='none'/>
      </devices>
    </domain>
    
    Will migrate
    ============================
    <domain type='kvm' id='24'>
      <name>i-532-1298-NSVLTN</name>
      <uuid>d6ec74b8-4f6a-405c-834e-ece42151b802</uuid>
      <description>Windows PV</description>
      <memory unit='KiB'>4194304</memory>
      <currentMemory unit='KiB'>4194304</currentMemory>
      <vcpu placement='static'>1</vcpu>
      <cputune>
        <shares>1000</shares>
      </cputune>
      <resource>
        <partition>/machine</partition>
      </resource>
      <sysinfo type='smbios'>
        <system>
          <entry name='manufacturer'>Apache Software Foundation</entry>
          <entry name='product'>CloudStack KVM Hypervisor</entry>
          <entry name='uuid'>d6ec74b8-4f6a-405c-834e-ece42151b802</entry>
        </system>
      </sysinfo>
      <os>
        <type arch='x86_64' machine='pc-i440fx-rhel7.0.0'>hvm</type>
        <boot dev='cdrom'/>
        <boot dev='hd'/>
        <smbios mode='sysinfo'/>
      </os>
      <features>
        <acpi/>
        <apic/>
        <pae/>
      </features>
      <cpu mode='custom' match='exact' check='full'>
        <model fallback='forbid'>Haswell</model>
        <feature policy='disable' name='hle'/>
        <feature policy='disable' name='rtm'/>
        <feature policy='require' name='erms'/>
        <feature policy='require' name='hypervisor'/>
        <feature policy='require' name='xsaveopt'/>
      </cpu>
      <clock offset='localtime'>
        <timer name='rtc' tickpolicy='catchup'/>
      </clock>
      <on_poweroff>destroy</on_poweroff>
      <on_reboot>restart</on_reboot>
      <on_crash>destroy</on_crash>
      <devices>
        <emulator>/usr/libexec/qemu-kvm</emulator>
        <disk type='network' device='disk'>
          <driver name='qemu' type='raw' cache='none'/>
          <auth username='c01z01'>
            <secret type='ceph' uuid='10a32867-3386-369e-a391-c8e05e0fa8ed'/>
          </auth>
          <source protocol='rbd' name='c01z01/f0b58e22-d05a-4825-8a4a-df0003a261b0'>
            <host name='cephmonc01.nsvltn.ena.net' port='6789'/>
          </source>
          <backingStore/>
          <target dev='vda' bus='virtio'/>
          <iotune>
            <read_bytes_sec>524288000</read_bytes_sec>
            <write_bytes_sec>524288000</write_bytes_sec>
            <read_iops_sec>500</read_iops_sec>
            <write_iops_sec>500</write_iops_sec>
          </iotune>
          <serial>f0b58e22d05a48258a4a</serial>
          <alias name='virtio-disk0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
        </disk>
        <disk type='network' device='disk'>
          <driver name='qemu' type='raw' cache='none'/>
          <auth username='c01z01'>
            <secret type='ceph' uuid='10a32867-3386-369e-a391-c8e05e0fa8ed'/>
          </auth>
          <source protocol='rbd' name='c01z01/cd0c2822-3912-4730-ac55-2c1b4d99ab1d'>
            <host name='cephmonc01.nsvltn.ena.net' port='6789'/>
          </source>
          <backingStore/>
          <target dev='vdb' bus='virtio'/>
          <iotune>
            <read_bytes_sec>524288000</read_bytes_sec>
            <write_bytes_sec>524288000</write_bytes_sec>
            <read_iops_sec>500</read_iops_sec>
            <write_iops_sec>500</write_iops_sec>
          </iotune>
          <serial>cd0c282239124730ac55</serial>
          <alias name='virtio-disk1'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
        </disk>
        <disk type='file' device='cdrom'>
          <driver name='qemu' type='raw' cache='none'/>
          <backingStore/>
          <target dev='hdc' bus='ide'/>
          <readonly/>
          <alias name='ide0-1-0'/>
          <address type='drive' controller='0' bus='1' target='0' unit='0'/>
        </disk>
        <controller type='ide' index='0'>
          <alias name='ide'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/>
        </controller>
        <controller type='usb' index='0' model='piix3-uhci'>
          <alias name='usb'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x2'/>
        </controller>
        <controller type='pci' index='0' model='pci-root'>
          <alias name='pci.0'/>
        </controller>
        <interface type='bridge'>
          <mac address='02:00:68:b4:00:0d'/>
          <source bridge='brvx-5312'/>
          <bandwidth>
            <inbound average='128000' peak='128000'/>
            <outbound average='128000' peak='128000'/>
          </bandwidth>
          <target dev='vnet19'/>
          <model type='virtio'/>
          <alias name='net0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
        </interface>
        <serial type='pty'>
          <source path='/dev/pts/21'/>
          <target port='0'/>
          <alias name='serial0'/>
        </serial>
        <console type='pty' tty='/dev/pts/21'>
          <source path='/dev/pts/21'/>
          <target type='serial' port='0'/>
          <alias name='serial0'/>
        </console>
        <input type='tablet' bus='usb'>
          <alias name='input0'/>
        </input>
        <input type='mouse' bus='ps2'>
          <alias name='input1'/>
        </input>
        <input type='keyboard' bus='ps2'>
          <alias name='input2'/>
        </input>
        <graphics type='vnc' port='5919' autoport='yes' listen='10.10.10.10'>
          <listen type='address' address='10.10.10.10'/>
        </graphics>
        <video>
          <model type='cirrus' vram='16384' heads='1' primary='yes'/>
          <alias name='video0'/>
          <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
        </video>
        <memballoon model='none'/>
      </devices>
      <seclabel type='none' model='none'/>
      <seclabel type='dynamic' model='dac' relabel='yes'>
        <label>+107:+107</label>
        <imagelabel>+107:+107</imagelabel>
      </seclabel>
    </domain>
    
    
    David Mabry
    Manager of Systems Engineering
    
    On 1/29/18, 5:30 PM, "Tutkowski, Mike" <mike.tutkow...@netapp.com> wrote:
    
        Hi David,
        
        So, I don’t know if what I am going to say here will be of any use to
        you, but maybe. :)
        
        A customer once mentioned to me that he had trouble with live VM
        migration on KVM for VMs that were created on an older version of
        CloudStack. Live VM migration worked fine for these VMs on the older
        version of CloudStack (I think it was version 4.5) and stopped working
        when he upgraded to 4.8. New VMs (VMs created on the newer version of
        CloudStack) live migrated fine on 4.8, but old VMs had to be stopped
        and restarted for live VM migration to work. I believe the older
        version of CloudStack was not placing the serial number of the VM in
        the VM’s XML descriptor file, but newer versions of CloudStack were
        expecting this field.
        
        Can you dump the XML of one or both of your VMs that don’t live migrate
        and see if they have the serial number field in their XML? Then, I’d
        recommend dumping the XML of the VM that works and checking whether it
        does, in fact, have the serial number field in its XML.
        
        I hope this is of some help.
        
        Talk to you later,
        Mike
        
        On 1/29/18, 11:48 AM, "David Mabry" <dma...@ena.com.INVALID> wrote:
        
            Good day CloudStack Devs,
            
            I've run across a real head scratcher.  I have two VMs (initially
            three, but more on that later) on a single host that I cannot live
            migrate to any other host in the same cluster.  We discovered this
            after attempting to roll out patches going from CentOS 7.2 to
            CentOS 7.4.  Initially, we thought the new version of libvirtd or
            qemu-kvm on the other hosts in the cluster was preventing these VMs
            from migrating, but we are able to live migrate other VMs to and
            from this host without issue.  We can even create new VMs on this
            specific host and live migrate them after creation with no issue.
            We've put the migration source agent, the migration destination
            agent and the management server in debug and don't seem to get
            anything useful other than "Unsupported command".  Luckily, we did
            have one VM that was shut down and restarted; this is the third VM
            mentioned above.  Since that VM was restarted, it has had no issues
            live migrating to any other host in the cluster.
            
            I'm at a loss as to what to try next, and I'm hoping that someone
            out there has had a similar issue and can shed some light on what
            to do.  Obviously, I can contact the customer and have them shut
            down their VMs, but that would potentially just defer this problem
            to another day.  Even if shutting down the VMs is ultimately the
            solution, I'd still like to understand what caused this issue in
            the first place, in the hope of preventing it in the future.
            
            Here's some information about my setup:
            CloudStack 4.8 Advanced Networking
            CentOS 7.2 and 7.4 Hosts
            Ceph RBD Primary Storage
            NFS Secondary Storage
            Instance in Question for Debug: i-532-1392-NSVLTN
            
            I have attached relevant debug logs to this email if anyone wishes
            to take a look.  I think the most interesting error message that I
            have received is the following:
            
            468390:2018-01-27 08:59:35,172 DEBUG [c.c.a.t.Request] (Work-Job-Executor-6:ctx-188ea30f job-181792/job-181802 ctx-8e7f45ad) (logid:f0888362) Seq 22-942378222027276319: Received:  { Ans: , MgmtId: 14038012703634, via: 22(csh02c01z01.nsvltn.ena.net), Ver: v1, Flags: 110, { UnsupportedAnswer } }
            468391:2018-01-27 08:59:35,172 WARN  [c.c.a.m.AgentManagerImpl] (Work-Job-Executor-6:ctx-188ea30f job-181792/job-181802 ctx-8e7f45ad) (logid:f0888362) Unsupported Command: Unsupported command issued: com.cloud.agent.api.PrepareForMigrationCommand.  Are you sure you got the right type of server?
            468392:2018-01-27 08:59:35,179 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-6:ctx-188ea30f job-181792/job-181802 ctx-8e7f45ad) (logid:f0888362) Invocation exception, caused by: com.cloud.exception.AgentUnavailableException: Resource [Host:22] is unreachable: Host 22: Unable to prepare for migration due to Unsupported command issued: com.cloud.agent.api.PrepareForMigrationCommand.  Are you sure you got the right type of server?
            468393:2018-01-27 08:59:35,179 INFO  [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-6:ctx-188ea30f job-181792/job-181802 ctx-8e7f45ad) (logid:f0888362) Rethrow exception com.cloud.exception.AgentUnavailableException: Resource [Host:22] is unreachable: Host 22: Unable to prepare for migration due to Unsupported command issued: com.cloud.agent.api.PrepareForMigrationCommand.  Are you sure you got the right type of server?
            
            I've tracked this "Unsupported command" message down in the CS 4.8
            code to cloudstack/api/src/com/cloud/agent/api/Answer.java, which
            is the generic answer class.  I believe the error is actually
            raised in
            cloudstack/engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java.
            Specifically:
                    Answer pfma = null;
                    try {
                        pfma = _agentMgr.send(dstHostId, pfmc);
                        if (pfma == null || !pfma.getResult()) {
                            final String details = pfma != null ? pfma.getDetails() : "null answer returned";
                            final String msg = "Unable to prepare for migration due to " + details;
                            pfma = null;
                            throw new AgentUnavailableException(msg, dstHostId);
                        }
            
            The pfma returned must either indicate an error or never arrive,
            in which case it is still null.  That answer should be coming from
            the destination agent, but for the life of me I can't figure out
            what the root cause of this error is beyond "Unsupported command
            issued".  What command is unsupported?  My guess is that something
            could be wrong with the dxml that is generated and passed to the
            destination host, but I have not yet been able to catch that dxml
            in debug.
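
To make the two branches of that check easier to tell apart, here is a small stand-alone sketch of it. The Answer and AgentUnavailableException classes below are simplified, hypothetical stand-ins rather than the real CloudStack types, and the exception's message prefix is only modeled on the log line quoted above. The sketch shows that the text after "due to" is either the literal "null answer returned" or whatever details the returned answer carried.

```java
// Simplified, hypothetical stand-ins for the CloudStack types in the excerpt;
// they model only what this sketch needs.
class MigrationPrepareSketch {

    static class Answer {
        private final boolean result;
        private final String details;

        Answer(boolean result, String details) {
            this.result = result;
            this.details = details;
        }

        boolean getResult() { return result; }
        String getDetails() { return details; }
    }

    static class AgentUnavailableException extends Exception {
        AgentUnavailableException(String msg, long hostId) {
            // Assumption: prefix modeled on the quoted log line, not taken
            // from the real exception class.
            super("Resource [Host:" + hostId + "] is unreachable: Host " + hostId + ": " + msg);
        }
    }

    // Same condition as the quoted excerpt: a missing answer and a failed
    // answer both end in an AgentUnavailableException.
    static void checkPrepareAnswer(Answer pfma, long dstHostId) throws AgentUnavailableException {
        if (pfma == null || !pfma.getResult()) {
            final String details = pfma != null ? pfma.getDetails() : "null answer returned";
            throw new AgentUnavailableException("Unable to prepare for migration due to " + details, dstHostId);
        }
    }

    // Convenience wrapper: returns "ok" or the exception message.
    static String describe(Answer pfma, long dstHostId) {
        try {
            checkPrepareAnswer(pfma, dstHostId);
            return "ok";
        } catch (AgentUnavailableException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Case 1: no answer came back at all.
        System.out.println(describe(null, 22));
        // Case 2: an answer came back, but with a false result and details.
        Answer unsupported = new Answer(false,
                "Unsupported command issued: com.cloud.agent.api.PrepareForMigrationCommand.");
        System.out.println(describe(unsupported, 22));
    }
}
```

Read this way, the logged message ends in "Unsupported command issued: ..." rather than "null answer returned", and the DEBUG line shows an UnsupportedAnswer being received. That would suggest an answer did come back with a false result, though it does not say why the command was treated as unsupported.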
            
            Any help or guidance is greatly appreciated.
            
            Thanks,
            David Mabry
            
            
        
        
    
    
