Hi,

This mail is to share a fight we had at INRIA upgrading our CloudStack/KVM farm
from 4.2 to 4.4.2, following this documentation:

http://cloudstack-release-notes.readthedocs.org/en/latest/upgrade/upgrade-4.2.html

It's now solved, but I would like to share it, as I think:

- it could help other people like us who have already migrated from CloudStack
3.x to 4.x
- there is one bug marked as fixed that should not be:
https://issues.apache.org/jira/browse/CLOUDSTACK-7399
- a little documentation is missing (how to test whether we have the right
qemu-kvm version for the systemVM templates)

Here are the (long) details

Technical information:
----------------------

- Upgrade from CloudStack 4.2.1 to 4.4.2
- CentOS 6/KVM for agents
- official CloudStack RPMs
- 1 zone with BasicNetworking

We are using CloudStack here in two environments:

- qualification, with MS and agents created on 4.2.1
- production, with MS and agents originally created on a 3.x version, a long
time ago, before Apache :D


Qualification troubles and solution:
------------------------------------

- systemVMs did not start after running cloudstack-sysvmadm
- the solution was to upgrade the KVM agents from CentOS 6.3 to 6.6
- we think (not sure) that we had trouble with an old qemu-kvm version, and a
good test to document may be to check which CentOS machine types qemu-kvm
supports, by running this command:
---
 /usr/libexec/qemu-kvm -M ?
---
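
A minimal sketch of how this check could be scripted on an agent. The
machine_supported helper and the machine type name rhel6.5.0 are only
assumptions for illustration; the machine type the systemVM template really
needs has to be looked up, it is not stated in this mail.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the machine type given as $1 appears in the
# first column of a "qemu-kvm -M ?" listing read from standard input.
# machine_supported is a hypothetical helper name, not a CloudStack tool.
machine_supported() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Usage on an agent (rhel6.5.0 is an example value, not a known requirement):
#   /usr/libexec/qemu-kvm -M ? | machine_supported rhel6.5.0 && echo supported
```

If the function fails for the machine type you need, the qemu-kvm on that
agent is probably too old and upgrading CentOS, as we did, may be the fix.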


Production troubles and solution:
---------------------------------

- cloudstack-sysvmadm took hours (2 or 3) to shut down, upgrade and restart
the systemVMs
- starting/stopping existing instances worked
- but we were unable to create new instances; error on the MS:
---
com.cloud.exception.AgentUnavailableException: Resource [Host:xx] is unreachable: Host xx: Unable to start instance due to Unable to get answer that is of class com.cloud.agent.api.StartAnswer
---
- when destroyed manually, the systemVMs would not restart
- debug logs on the agents showed the same message as in this bug:
https://issues.apache.org/jira/browse/CLOUDSTACK-7399
which is officially resolved in 4.4.1 (our version is 4.4.2!!!)
---
WARN  [cloud.agent.Agent] (agentRequest-Handler-2:null) Caught:
java.lang.NullPointerException
        at com.cloud.network.Networks$BroadcastDomainType.getSchemeValue(Networks.java:159)
...
DEBUG [cloud.agent.Agent] (agentRequest-Handler-2:null) Seq 25-6233544834234187813:  { Ans: , MgmtId: 345044038925, via: 25, Ver: v1, Flags: 10, [{"com.cloud.agent.api.Answer":{"result":false,"details":"java.lang.NullPointerException\n\tat com.cloud.network.Networks$BroadcastDomainType.getSchemeValue(Networks.java:159)\n\tat com.cloud.network.Networks$BroadcastDomainType.getValue(Networks.java:213)\n\tat com.cloud.hypervisor.
...
---
- we had to find our basic network in the MySQL table networks, whose
broadcast_uri was NULL
- and change it to the "new" style vlan://untagged:
---
update networks set broadcast_uri="vlan://untagged" where id="our basic network id";
---
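
To find the row to change, something like the sketch below may help. It only
prints a SELECT statement; the column names other than id and broadcast_uri
(name, traffic_type, removed), the helper name, and the "cloud" database and
user in the usage line are assumptions to check against your own setup.

```shell
#!/bin/sh
# Prints a query listing the networks that still have no broadcast_uri.
# Assumption: the networks table also has name, traffic_type and removed
# columns; only id and broadcast_uri are confirmed by the mail above.
null_broadcast_uri_query() {
    cat <<'EOF'
SELECT id, name, traffic_type, broadcast_uri
FROM networks
WHERE broadcast_uri IS NULL AND removed IS NULL;
EOF
}

# Usage on the MS database host (database "cloud" and user "cloud" are
# assumptions; the password is prompted for):
#   null_broadcast_uri_query | mysql -u cloud -p cloud
```

The listing may contain more than one network; we only changed the row of our
basic network, and taking a database dump before any update seems wise.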

Hope this helps,

-- 
Laurent Steff

DSI/SESI
INRIA
http://www.inria.fr/
