Prasanna Santhanam created CLOUDSTACK-4499:
----------------------------------------------
Summary: Xen6.1/Xen6.2 hosts initially transition to 'Alert' and
then to 'Up' after addHost
Key: CLOUDSTACK-4499
URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4499
Project: CloudStack
Issue Type: Bug
Security Level: Public (Anyone can view this level - this is the default.)
Reporter: Prasanna Santhanam
Priority: Critical
The same bug was reported in CLOUDSTACK-3839 which has been determined to be a
UI issue but the problem is with the addHost API and Xenserver hosts 6.1/6.2.
When a freshly provisioned 6.1/6.2 host is added to cloudstack it initially
goes into 'Alert' state thereby failing the storage pool creation subsequently.
StoragePool addition requires the hosts to be in Up state. But a minute or two
later the Xenserver host automatically moves back to Up state after which
storage pool addition can occur
When the host is added to cloudstack:
2013-08-26 06:46:22,965 DEBUG [agent.manager.DirectAgentAttache]
(DirectAgent-1:null) Seq 1-1916403719: Executing request
2013-08-26 06:46:23,261 DEBUG [xen.resource.CitrixResourceBase]
(DirectAgent-1:null) Can't find a vif on dom0 for link local, creating a new one
2013-08-26 06:46:23,274 DEBUG [xen.resource.CitrixResourceBase]
(DirectAgent-1:null) Lowest available Vif device number: 0 for VM: Control
domain on host: apache-81-3
2013-08-26 06:46:23,600 WARN [xen.resource.CitrixResourceBase]
(DirectAgent-1:null) Unable to create local link network
The server failed to handle your request, due to an internal error. The given
message may give details useful for debugging the problem.
at com.xensource.xenapi.Types.checkResponse(Types.java:1694)
at com.xensource.xenapi.Connection.dispatch(Connection.java:368)
at
com.cloud.hypervisor.xen.resource.XenServerConnectionPool$XenServerConnection.dispatch(XenServerConnectionPool.java:909)
at com.xensource.xenapi.VIF.plug(VIF.java:846)
Host marked Alert:
2013-08-26 06:46:23,616 DEBUG [agent.manager.AgentManagerImpl]
(catalina-exec-14:null) Sending Disconnect to listener:
com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
2013-08-26 06:46:23,616 DEBUG [cloud.network.NetworkUsageManagerImpl]
(catalina-exec-14:null) Disconnected called on 1 with status Alert
2013-08-26 06:46:23,616 DEBUG [agent.manager.AgentManagerImpl]
(catalina-exec-14:null) Sending Disconnect to listener:
com.cloud.consoleproxy.ConsoleProxyListener
2013-08-26 06:46:23,619 DEBUG [cloud.host.Status] (catalina-exec-14:null)
Transition:[Resource state = Enabled, Agent event = AgentDisconnected, Host id
= 1, name = apache-81-3]
2013-08-26 06:46:23,625 DEBUG [cloud.host.Status] (catalina-exec-14:null) Agent
status update: [id = 1; name = apache-81-3; old status = Connecting; event =
AgentDisconnected; new status = Alert; old update count = 1; new update count =
2]
Automatic retry by CloudStack a few moments later:
2013-08-26 06:46:52,752 DEBUG [host.dao.HostDaoImpl] (ClusteredAgentManager
Timer:null) Completed acquiring hosts for clusters not owned by any management
server
2013-08-26 06:46:52,756 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(ClusteredAgentManager Timer:null) Found 2 unmanaged direct hosts, processing
connect for them...
2013-08-26 06:46:52,757 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(ClusteredAgentManager Timer:null) Loading directly connected host
1(apache-81-3)
Meanwhile Storage Pool addition fails:
2013-08-26 06:46:52,766 DEBUG [cloud.api.ApiServlet] (catalina-exec-17:null)
===END=== 10.208.8.5 -- GET
username=root&apiKey=sLUjl2n1c33JJpY4ZzMtObUgp9Ah2xW5inofCclZQQ1V6xp5k3CmAsGhemLwQxVvQY67EiopM_fbWBoJRKCIww&podid=fdf30680-d5c7-425e-bd7f-fb8c52e7cb96&hypervisor=XenServer&clusterid=10ff9a88-54bc-405b-9b4b-c26acf9caca8&zoneid=0431023e-ff78-43b6-a32d-262f36a78cbb&command=addHost&url=http%3A%2F%2Fapache-81-2&signature=oJqOno9flCtPMSSHH%2FZD%2BJl5QXw%3D&response=json
2013-08-26 06:46:52,773 DEBUG [cloud.api.ApiServlet] (catalina-exec-16:null)
===START=== 10.208.8.5 -- GET
apiKey=sLUjl2n1c33JJpY4ZzMtObUgp9Ah2xW5inofCclZQQ1V6xp5k3CmAsGhemLwQxVvQY67EiopM_fbWBoJRKCIww&name=z0p0c0ps0&url=nfs%3A%2F%2Fnfs.fmt.vmops.com%3A%2Fexport%2Fautomation%2Facs%2Fprimary&podid=fdf30680-d5c7-425e-bd7f-fb8c52e7cb96&clusterid=10ff9a88-54bc-405b-9b4b-c26acf9caca8&zoneid=0431023e-ff78-43b6-a32d-262f36a78cbb&command=createStoragePool&signature=70CuFbmf58PU4X0QbF1tKbBGT60%3D&response=json
2013-08-26 06:46:52,787 DEBUG [cloud.network.NetworkModelImpl]
(ClusteredAgentManager Timer:null) Failed to retrive the default label for
storage traffic:zone: 1 hypervisor: XenServer due to:Unable to find the default
physical network with traffic=Storage in the specified zone id
2013-08-26 06:46:52,839 DEBUG
[datastore.lifecycle.CloudStackPrimaryDataStoreLifeCycleImpl]
(catalina-exec-16:null) createPool Params @ scheme - nfs storageHost -
nfs.fmt.vmops.com hostPath - /export/automation/acs/primary port - -1
2013-08-26 06:46:52,889 DEBUG [agent.manager.ClusteredAgentManagerImpl]
(ClusteredAgentManager Timer:null) Loading directly connected host
2(apache-81-2)
2013-08-26 06:46:52,892 DEBUG [agent.manager.AgentManagerImpl]
(AgentTaskPool-2:null) Simulating start for resource
192267e4-d97d-4e37-b899-82b157e3166f id 1
2013-08-26 06:46:52,900 DEBUG [cloud.storage.StorageManagerImpl]
(catalina-exec-16:null) Failed to add data store
com.cloud.utils.exception.CloudRuntimeException: No host up to associate a
storage pool with in cluster 1
at
org.apache.cloudstack.storage.datastore.lifecycle.CloudStackPrimaryDataStoreLifeCycleImpl.attachCluster(CloudStackPrimaryDataStoreLifeCycleImpl.java:371)
at
com.cloud.storage.StorageManagerImpl.createPool(StorageManagerImpl.java:749)
at
com.cloud.storage.StorageManagerImpl.createPool(StorageManagerImpl.java:177)
Host comes back to Up state:
2013-08-26 06:46:53,898 DEBUG [agent.manager.DirectAgentAttache]
(DirectAgent-4:null) Seq 1-1519321095: Executing request
2013-08-26 06:46:54,279 DEBUG [xen.resource.CitrixResourceBase]
(DirectAgent-4:null) already have a vif on dom0 for link local network
2013-08-26 06:46:54,703 DEBUG [agent.transport.Request] (AgentTaskPool-2:null)
Seq -1--1: Startup request from directly connected host: { Cmd , MgmtId: -1,
via: -1, Ver: v1, Flags: 11,
[{"cpus":4,"speed":2261,"memory":16190149248,"dom0MinMemory":588251136,"poolSync":false,"_clusterVMStates":{},"caps":"xen-3.0-x86_64
, xen-3.0-x86_32p , hvm-3.0-x86_32 , hvm-3.0-x86_32p ,
hvm-3.0-x86_64","pool":"9dfa936a-30f7-b5d1-4d4e-b5a75facb313","hypervisorType":"XenServer","hostDetails":{"product_version":"6.1.0","private.network.device":"Pool-wide
network associated with
eth0","com.cloud.network.Networks.RouterPrivateIpStrategy":"DcGlobal","product_brand":"XenServer","product_version_text_short":"6.1","can_bridge_firewall":"false"},"hypervisorVersion":"6.1.0","type":"Routing","dataCenter":"1","pod":"1","cluster":"1","guid":"192267e4-d97d-4e37-b899-82b157e3166f","name":"apache-81-3","version":"4.2.0","iqn":"iqn.2013-08.com.vmops.fmt:691cf4d5","publicIpAddress":"10.208.8.69","publicNetmask":"255.255.255.192","publicMacAddress":"b8:ac:6f:94:f1:67","privateIpAddress":"10.208.8.69","privateMacAddress":"b8:ac:6f:94:f1:67","privateNetmask":"255.255.255.192","storageIpAddress":"10.208.8.69","storageNetmask":"255.255.255.192","storageMacAddress":"b8:ac:6f:94:f1:67","wait":0},{"totalSize":0,"poolInfo":{"uuid":"ae248ed7-dd2a-cfc8-53f7-f43661c596c8","host":"10.208.8.69","localPath":"lvm","hostPath":"lvm","poolType":"LVM","capacityBytes":491505319936,"availableBytes":491501125632},"resourceType":"STORAGE_POOL","hostDetails":{},"type":"Storage","dataCenter":"1","guid":"192267e4-d97d-4e37-b899-82b157e3166f","wait":0}]
}
2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl]
(AgentTaskPool-2:null) Dispatching resource state event
CREATE_HOST_VO_FOR_DIRECT_CONNECT to
BaremetalPxeManagerImpl_EnhancerByCloudStack_d79b93ae
2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl]
(AgentTaskPool-2:null) Dispatching resource state event
CREATE_HOST_VO_FOR_DIRECT_CONNECT to
PremiumSecondaryStorageManagerImpl_EnhancerByCloudStack_b51c5fde
2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl]
(AgentTaskPool-2:null) Dispatching resource state event
CREATE_HOST_VO_FOR_DIRECT_CONNECT to
LxcServerDiscoverer_EnhancerByCloudStack_449ab3e6
2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl]
(AgentTaskPool-2:null) Dispatching resource state event
CREATE_HOST_VO_FOR_DIRECT_CONNECT to NiciraNvp
2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl]
(AgentTaskPool-2:null) Dispatching resource state event
CREATE_HOST_VO_FOR_DIRECT_CONNECT to OvmDiscoverer_EnhancerByCloudStack_ec7ad054
2013-08-26 06:46:54,710 DEBUG [cloud.resource.ResourceManagerImpl]
(AgentTaskPool-2:null) Dispatching resource state event
CREATE_HOST_VO_FOR_DIRECT_CONNECT to
XcpServerDiscoverer_EnhancerByCloudStack_831158ad
2013-08-26 06:46:54,710 INFO [xen.discoverer.XcpServerDiscoverer]
(AgentTaskPool-2:null) Host: apache-81-3 connected with hypervisor type:
XenServer. Checking CIDR...
2013-08-26 06:46:54,735 DEBUG [cloud.resource.ResourceState]
(AgentTaskPool-2:null) Resource state update: [id = 1; name = apache-81-3; old
state = Enabled; event = InternalCreated; new state = Enabled]
2013-08-26 06:46:54,736 DEBUG [cloud.host.Status] (AgentTaskPool-2:null)
Transition:[Resource state = Enabled, Agent event = AgentConnected, Host id =
1, name = apache-81-3]
2013-08-26 06:46:54,742 DEBUG [cloud.host.Status] (AgentTaskPool-2:null) Agent
status update: [id = 1; name = apache-81-3; old status = Connecting; event =
AgentConnected; new status = Connecting; old update count = 3; new update count
= 4]
2013-08-26 06:46:54,946 DEBUG [cloud.host.Status] (AgentTaskPool-1:null) Agent
status update: [id = 1; name = apache-81-3; old status = Connecting; event =
Ready; new status = Up; old update count = 4; new update count = 5]
There appear to be two problems:
1. addHost API returns successfully even if the host has transitioned to
'Alert' state
2. Something wrong with Xenserver 6.2/6.1 hosts that makes this happen
This is on the latest off the 4.2 branch (8689f7f).
Attaching management server log and Database Dump.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira