Mike and Wei,

Good news!  I was able to manually live migrate these VMs following the steps 
outlined below:

1.) virsh dumpxml 38 --migratable > 38.xml
2.) Change the VNC information in 38.xml to match the destination host IP and 
an available VNC port
3.) virsh migrate --verbose --live 38 --xml 38.xml qemu+tcp://destination.host.net/system
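Step 2 is the only hand edit. Below is a minimal Java sketch of it. It assumes the VNC settings sit in the usual libvirt `<graphics type='vnc' ...>` element, optionally with a `<listen type='address'>` child, and that you want to pin the port you picked by hand. The class name `VncRewrite` and its arguments are hypothetical; they are not part of CloudStack or libvirt.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: rewrites the VNC listen address/port in a dumped domain XML.
public final class VncRewrite {

    // Returns a copy of the domain XML in which every VNC <graphics> device uses
    // newListen:newPort. Other graphics types (e.g. SPICE) are left untouched.
    public static String rewriteVnc(String domainXml, String newListen, int newPort) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(domainXml)));

            NodeList graphics = doc.getElementsByTagName("graphics");
            for (int i = 0; i < graphics.getLength(); i++) {
                Element g = (Element) graphics.item(i);
                if (!"vnc".equals(g.getAttribute("type"))) {
                    continue;
                }
                // Use the hand-picked port; autoport='no' (an assumption) keeps libvirt
                // from choosing a different one on the destination.
                g.setAttribute("port", Integer.toString(newPort));
                g.setAttribute("autoport", "no");
                if (g.hasAttribute("listen")) {
                    g.setAttribute("listen", newListen);
                }
                // The address may also be recorded in a <listen type='address'> child.
                NodeList listens = g.getElementsByTagName("listen");
                for (int j = 0; j < listens.getLength(); j++) {
                    Element l = (Element) listens.item(j);
                    if ("address".equals(l.getAttribute("type"))) {
                        l.setAttribute("address", newListen);
                    }
                }
            }

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not rewrite domain XML", e);
        }
    }

    // Usage: java VncRewrite 38.xml <destination-ip> <vnc-port> > 38-dest.xml
    public static void main(String[] args) throws IOException {
        String xml = Files.readString(Path.of(args[0]));
        System.out.print(rewriteVnc(xml, args[1], Integer.parseInt(args[2])));
    }
}
```

The edited file would then be passed to step 3 via `--xml`. Parsing the XML rather than using a text substitution avoids accidentally touching other elements that happen to contain the old IP.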

To my surprise, Cloudstack was able to discover and properly handle the fact 
that this VM was live migrated to a new host without issue.  Very cool.

Wei, I suspect you were correct when you said this was an issue with the 
cloudstack agent code.  After digging a little deeper, I found that the agent 
never attempts to talk to libvirt at all after prepping the dxml to send to the 
destination host.  I'm going to try to reproduce this in my lab, attach a 
remote debugger and see if I can get to the bottom of it.
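If the UnsupportedAnswer is really the agent's reaction to an exception, as Wei suggested, the failure path would look roughly like the Java sketch below. It assumes, without checking the 4.8 source, that the KVM agent turns any exception thrown while handling a command into an unsupported-command answer. `Answer`, `agentExecute` and `prepareForMigration` are simplified, hypothetical stand-ins, not CloudStack's classes. The sketch also matches the log quoted further down: the management server did receive an answer (`{ UnsupportedAnswer }`), so pfma was not null; its result was false, and only the generic message survived.

```java
import java.util.function.Function;

// Hypothetical model of how an agent-side exception surfaces as "Unsupported command".
public final class MigrationAnswerSketch {

    // Stand-in for com.cloud.agent.api.Answer: just a result flag and details.
    public static final class Answer {
        private final boolean result;
        private final String details;

        public Answer(boolean result, String details) {
            this.result = result;
            this.details = details;
        }

        public boolean getResult() { return result; }
        public String getDetails() { return details; }
    }

    // Failed answer carrying the same text seen in the management server log.
    public static Answer unsupported(String command) {
        return new Answer(false, "Unsupported command issued: " + command
                + ".  Are you sure you got the right type of server?");
    }

    // Agent side (assumed): any exception while handling the command is swallowed and
    // replaced by an "unsupported" answer, so its real cause never leaves the host.
    public static Answer agentExecute(String command, Function<String, Answer> handler) {
        try {
            return handler.apply(command);
        } catch (RuntimeException e) {
            return unsupported(command);
        }
    }

    // Management side: same test as the pfma check in VirtualMachineManagerImpl,
    // but returning the message instead of throwing AgentUnavailableException.
    public static String prepareForMigration(Answer pfma) {
        if (pfma == null || !pfma.getResult()) {
            String details = pfma != null ? pfma.getDetails() : "null answer returned";
            return "Unable to prepare for migration due to " + details;
        }
        return "prepared";
    }

    public static void main(String[] args) {
        String cmd = "com.cloud.agent.api.PrepareForMigrationCommand";
        // Hypothetical failure inside the destination agent's network setup.
        Answer failed = agentExecute(cmd, c -> {
            throw new IllegalStateException("hypothetical vxlan bridge error");
        });
        System.out.println(prepareForMigration(failed));
        System.out.println(prepareForMigration(agentExecute(cmd, c -> new Answer(true, null))));
    }
}
```

Running `main` prints the same "Unable to prepare for migration due to Unsupported command issued: ..." text as the log, and the exception message never appears, which is why a remote debugger (or extra logging in the agent's catch block) is needed to see the real cause.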

Thanks again for the help guys!  I really appreciate it.

Thanks,
David Mabry

On 1/30/18, 9:55 AM, "David Mabry" <dma...@ena.com.INVALID> wrote:

    Ah, understood.  I'll take a closer look at the logs and make sure that I
    didn't accidentally miss those lines when I pulled together the logs for this
    email chain.
    
    Thanks,
    David Mabry
    On 1/30/18, 8:34 AM, "Wei ZHOU" <ustcweiz...@gmail.com> wrote:
    
        Hi David,
        
        I have run into the UnsupportedAnswer once before, when I made some
        changes in the kvm plugin.
        
        Normally there should be some network configuration entries in
        agent.log, but I do not see any.
        
        -Wei
        
        
        2018-01-30 15:00 GMT+01:00 David Mabry <dma...@ena.com.invalid>:
        
        > Hi Wei,
        >
        > I detached the ISO and received the same error.  Just out of curiosity,
        > what leads you to believe it is something in the vxlan code?  I guess at
        > this point, attaching a remote debugger to the agent in question might be
        > the best way to get to the bottom of what is going on.
        >
        > Thanks in advance for the help.  I really, really appreciate it.
        >
        > Thanks,
        > David Mabry
        >
        > On 1/30/18, 3:30 AM, "Wei ZHOU" <ustcweiz...@gmail.com> wrote:
        >
        >     The answer is most likely caused by an exception in the cloudstack
        >     agent.  I tried to migrate a VM in our testing env, and it works.
        >
        >     There are some differences between our env and yours:
        >     (1) vlan VS vxlan
        >     (2) no ISO VS attached ISO
        >     (3) but both of us use ceph and centos7.
        >
        >     I suspect it is caused by the vxlan code.
        >     Even so, could you detach the ISO and try again?
        >
        >     -Wei
        >
        >
        >
        >     2018-01-29 19:48 GMT+01:00 David Mabry <dma...@ena.com.invalid>:
        >
        >     > Good day Cloudstack Devs,
        >     >
        >     > I've run across a real head scratcher.  I have two VMs (initially
        >     > three, but more on that later) on a single host that I cannot live
        >     > migrate to any other host in the same cluster.  We discovered this
        >     > after attempting to roll out patches going from CentOS 7.2 to
        >     > CentOS 7.4.  Initially, we thought it had something to do with the
        >     > new version of libvirtd or qemu-kvm on the other hosts in the
        >     > cluster preventing these VMs from migrating, but we are able to
        >     > live migrate other VMs to and from this host without issue.  We
        >     > can even create new VMs on this specific host and live migrate
        >     > them after creation with no issue.  We've put the migration
        >     > source agent, the migration destination agent and the management
        >     > server in debug mode and don't seem to get anything useful other
        >     > than "Unsupported command".  Luckily, we did have one VM that was
        >     > shut down and restarted; this is the third VM mentioned above.
        >     > Since that VM was restarted, it has had no issues live migrating
        >     > to any other host in the cluster.
        >     >
        >     > I'm at a loss as to what to try next, and I'm hoping that someone
        >     > out there has had a similar issue and could shed some light on
        >     > what to do.  Obviously, I can contact the customer and have them
        >     > shut down their VMs, but that may just push this problem off to
        >     > another day.  Even if shutting down the VMs is ultimately the
        >     > solution, I'd still like to understand what caused this issue in
        >     > the first place, in the hope of preventing it in the future.
        >     >
        >     > Here's some information about my setup:
        >     > Cloudstack 4.8 Advanced Networking
        >     > CentOS 7.2 and 7.4 Hosts
        >     > Ceph RBD Primary Storage
        >     > NFS Secondary Storage
        >     > Instance in Question for Debug: i-532-1392-NSVLTN
        >     >
        >     > I have attached relevant debug logs to this email if anyone wishes
        >     > to take a look.  I think the most interesting error message that I
        >     > have received is the following:
        >     >
        >     > 468390:2018-01-27 08:59:35,172 DEBUG [c.c.a.t.Request] (Work-Job-Executor-6:ctx-188ea30f job-181792/job-181802 ctx-8e7f45ad) (logid:f0888362) Seq 22-942378222027276319: Received:  { Ans: , MgmtId: 14038012703634, via: 22(csh02c01z01.nsvltn.ena.net), Ver: v1, Flags: 110, { UnsupportedAnswer } }
        >     > 468391:2018-01-27 08:59:35,172 WARN  [c.c.a.m.AgentManagerImpl] (Work-Job-Executor-6:ctx-188ea30f job-181792/job-181802 ctx-8e7f45ad) (logid:f0888362) Unsupported Command: Unsupported command issued: com.cloud.agent.api.PrepareForMigrationCommand.  Are you sure you got the right type of server?
        >     > 468392:2018-01-27 08:59:35,179 ERROR [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-6:ctx-188ea30f job-181792/job-181802 ctx-8e7f45ad) (logid:f0888362) Invocation exception, caused by: com.cloud.exception.AgentUnavailableException: Resource [Host:22] is unreachable: Host 22: Unable to prepare for migration due to Unsupported command issued: com.cloud.agent.api.PrepareForMigrationCommand.  Are you sure you got the right type of server?
        >     > 468393:2018-01-27 08:59:35,179 INFO  [c.c.v.VmWorkJobHandlerProxy] (Work-Job-Executor-6:ctx-188ea30f job-181792/job-181802 ctx-8e7f45ad) (logid:f0888362) Rethrow exception com.cloud.exception.AgentUnavailableException: Resource [Host:22] is unreachable: Host 22: Unable to prepare for migration due to Unsupported command issued: com.cloud.agent.api.PrepareForMigrationCommand.  Are you sure you got the right type of server?
        >     >
        >     > I've tracked this "Unsupported command" down in the CS 4.8 code to
        >     > cloudstack/api/src/com/cloud/agent/api/Answer.java, which is the
        >     > generic answer class.  I believe the error is really being raised in
        >     > cloudstack/engine/orchestration/src/com/cloud/vm/VirtualMachineManagerImpl.java.
        >     > Specifically:
        >     >         Answer pfma = null;
        >     >         try {
        >     >             pfma = _agentMgr.send(dstHostId, pfmc);
        >     >             if (pfma == null || !pfma.getResult()) {
        >     >                 final String details = pfma != null ? pfma.getDetails() : "null answer returned";
        >     >                 final String msg = "Unable to prepare for migration due to " + details;
        >     >                 pfma = null;
        >     >                 throw new AgentUnavailableException(msg, dstHostId);
        >     >             }
        >     >
        >     > Either the returned pfma is an error answer, or no answer is ever
        >     > returned and pfma is still null.  That answer appears to come from
        >     > the destination agent, but for the life of me I can't figure out
        >     > the root cause of this error beyond "Unsupported command issued".
        >     > What command is unsupported?  My guess is that something could be
        >     > wrong with the dxml that is generated and passed to the
        >     > destination host, but I have as yet been unable to catch that dxml
        >     > in debug.
        >     >
        >     > Any help or guidance is greatly appreciated.
        >     >
        >     > Thanks,
        >     > David Mabry
        >     >
        >     >
        >
        >
        >
        
    
    
