Prasanna Santhanam created CLOUDSTACK-4499:
----------------------------------------------

             Summary: Xen6.1/Xen6.2 hosts initially transition to 'Alert' and 
then to 'Up' after addHost
                 Key: CLOUDSTACK-4499
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-4499
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
            Reporter: Prasanna Santhanam
            Priority: Critical


The same bug was reported in CLOUDSTACK-3839 which has been determined to be a
UI issue but the problem is with the addHost API and Xenserver hosts 6.1/6.2. 

When a freshly provisioned 6.1/6.2 host is added to cloudstack it initially
goes into 'Alert' state thereby failing the storage pool creation subsequently.
StoragePool addition requires the hosts to be in Up state. But a minute or two
later the Xenserver host automatically moves back to Up state after which
storage pool addition can occur

When the host is added to cloudstack:
2013-08-26 06:46:22,965 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-1:null) Seq 1-1916403719: Executing request
2013-08-26 06:46:23,261 DEBUG [xen.resource.CitrixResourceBase] 
(DirectAgent-1:null) Can't find a vif on dom0 for link local, creating a new one
2013-08-26 06:46:23,274 DEBUG [xen.resource.CitrixResourceBase] 
(DirectAgent-1:null) Lowest available Vif device number: 0 for VM: Control 
domain on host: apache-81-3
2013-08-26 06:46:23,600 WARN  [xen.resource.CitrixResourceBase] 
(DirectAgent-1:null) Unable to create local link network
The server failed to handle your request, due to an internal error.  The given 
message may give details useful for debugging the problem.
    at com.xensource.xenapi.Types.checkResponse(Types.java:1694)
    at com.xensource.xenapi.Connection.dispatch(Connection.java:368)
    at 
com.cloud.hypervisor.xen.resource.XenServerConnectionPool$XenServerConnection.dispatch(XenServerConnectionPool.java:909)
    at com.xensource.xenapi.VIF.plug(VIF.java:846)


Host marked Alert:
2013-08-26 06:46:23,616 DEBUG [agent.manager.AgentManagerImpl] 
(catalina-exec-14:null) Sending Disconnect to listener: 
com.cloud.network.NetworkUsageManagerImpl$DirectNetworkStatsListener
2013-08-26 06:46:23,616 DEBUG [cloud.network.NetworkUsageManagerImpl] 
(catalina-exec-14:null) Disconnected called on 1 with status Alert
2013-08-26 06:46:23,616 DEBUG [agent.manager.AgentManagerImpl] 
(catalina-exec-14:null) Sending Disconnect to listener: 
com.cloud.consoleproxy.ConsoleProxyListener
2013-08-26 06:46:23,619 DEBUG [cloud.host.Status] (catalina-exec-14:null) 
Transition:[Resource state = Enabled, Agent event = AgentDisconnected, Host id 
= 1, name = apache-81-3]
2013-08-26 06:46:23,625 DEBUG [cloud.host.Status] (catalina-exec-14:null) Agent 
status update: [id = 1; name = apache-81-3; old status = Connecting; event = 
AgentDisconnected; new status = Alert; old update count = 1; new update count = 
2]

Automatic retry by CloudStack a few moments later:
2013-08-26 06:46:52,752 DEBUG [host.dao.HostDaoImpl] (ClusteredAgentManager 
Timer:null) Completed acquiring hosts for clusters not owned by any management 
server
2013-08-26 06:46:52,756 DEBUG [agent.manager.ClusteredAgentManagerImpl] 
(ClusteredAgentManager Timer:null) Found 2 unmanaged direct hosts, processing 
connect for them...
2013-08-26 06:46:52,757 DEBUG [agent.manager.ClusteredAgentManagerImpl] 
(ClusteredAgentManager Timer:null) Loading directly connected host 
1(apache-81-3)

Meanwhile Storage Pool addition fails:
2013-08-26 06:46:52,766 DEBUG [cloud.api.ApiServlet] (catalina-exec-17:null) 
===END===  10.208.8.5 -- GET  
username=root&apiKey=sLUjl2n1c33JJpY4ZzMtObUgp9Ah2xW5inofCclZQQ1V6xp5k3CmAsGhemLwQxVvQY67EiopM_fbWBoJRKCIww&podid=fdf30680-d5c7-425e-bd7f-fb8c52e7cb96&hypervisor=XenServer&clusterid=10ff9a88-54bc-405b-9b4b-c26acf9caca8&zoneid=0431023e-ff78-43b6-a32d-262f36a78cbb&command=addHost&url=http%3A%2F%2Fapache-81-2&signature=oJqOno9flCtPMSSHH%2FZD%2BJl5QXw%3D&response=json
2013-08-26 06:46:52,773 DEBUG [cloud.api.ApiServlet] (catalina-exec-16:null) 
===START===  10.208.8.5 -- GET  
apiKey=sLUjl2n1c33JJpY4ZzMtObUgp9Ah2xW5inofCclZQQ1V6xp5k3CmAsGhemLwQxVvQY67EiopM_fbWBoJRKCIww&name=z0p0c0ps0&url=nfs%3A%2F%2Fnfs.fmt.vmops.com%3A%2Fexport%2Fautomation%2Facs%2Fprimary&podid=fdf30680-d5c7-425e-bd7f-fb8c52e7cb96&clusterid=10ff9a88-54bc-405b-9b4b-c26acf9caca8&zoneid=0431023e-ff78-43b6-a32d-262f36a78cbb&command=createStoragePool&signature=70CuFbmf58PU4X0QbF1tKbBGT60%3D&response=json
2013-08-26 06:46:52,787 DEBUG [cloud.network.NetworkModelImpl] 
(ClusteredAgentManager Timer:null) Failed to retrive the default label for 
storage traffic:zone: 1 hypervisor: XenServer due to:Unable to find the default 
physical network with traffic=Storage in the specified zone id
2013-08-26 06:46:52,839 DEBUG 
[datastore.lifecycle.CloudStackPrimaryDataStoreLifeCycleImpl] 
(catalina-exec-16:null) createPool Params @ scheme - nfs storageHost - 
nfs.fmt.vmops.com hostPath - /export/automation/acs/primary port - -1
2013-08-26 06:46:52,889 DEBUG [agent.manager.ClusteredAgentManagerImpl] 
(ClusteredAgentManager Timer:null) Loading directly connected host 
2(apache-81-2)
2013-08-26 06:46:52,892 DEBUG [agent.manager.AgentManagerImpl] 
(AgentTaskPool-2:null) Simulating start for resource 
192267e4-d97d-4e37-b899-82b157e3166f id 1
2013-08-26 06:46:52,900 DEBUG [cloud.storage.StorageManagerImpl] 
(catalina-exec-16:null) Failed to add data store
com.cloud.utils.exception.CloudRuntimeException: No host up to associate a 
storage pool with in cluster 1
    at 
org.apache.cloudstack.storage.datastore.lifecycle.CloudStackPrimaryDataStoreLifeCycleImpl.attachCluster(CloudStackPrimaryDataStoreLifeCycleImpl.java:371)
    at 
com.cloud.storage.StorageManagerImpl.createPool(StorageManagerImpl.java:749)
    at 
com.cloud.storage.StorageManagerImpl.createPool(StorageManagerImpl.java:177)


Host comes back to Up state:
2013-08-26 06:46:53,898 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-4:null) Seq 1-1519321095: Executing request
2013-08-26 06:46:54,279 DEBUG [xen.resource.CitrixResourceBase] 
(DirectAgent-4:null) already have a vif on dom0 for link local network
2013-08-26 06:46:54,703 DEBUG [agent.transport.Request] (AgentTaskPool-2:null) 
Seq -1--1: Startup request from directly connected host:  { Cmd , MgmtId: -1, 
via: -1, Ver: v1, Flags: 11, 
[{"cpus":4,"speed":2261,"memory":16190149248,"dom0MinMemory":588251136,"poolSync":false,"_clusterVMStates":{},"caps":"xen-3.0-x86_64
 , xen-3.0-x86_32p , hvm-3.0-x86_32 , hvm-3.0-x86_32p , 
hvm-3.0-x86_64","pool":"9dfa936a-30f7-b5d1-4d4e-b5a75facb313","hypervisorType":"XenServer","hostDetails":{"product_version":"6.1.0","private.network.device":"Pool-wide
 network associated with 
eth0","com.cloud.network.Networks.RouterPrivateIpStrategy":"DcGlobal","product_brand":"XenServer","product_version_text_short":"6.1","can_bridge_firewall":"false"},"hypervisorVersion":"6.1.0","type":"Routing","dataCenter":"1","pod":"1","cluster":"1","guid":"192267e4-d97d-4e37-b899-82b157e3166f","name":"apache-81-3","version":"4.2.0","iqn":"iqn.2013-08.com.vmops.fmt:691cf4d5","publicIpAddress":"10.208.8.69","publicNetmask":"255.255.255.192","publicMacAddress":"b8:ac:6f:94:f1:67","privateIpAddress":"10.208.8.69","privateMacAddress":"b8:ac:6f:94:f1:67","privateNetmask":"255.255.255.192","storageIpAddress":"10.208.8.69","storageNetmask":"255.255.255.192","storageMacAddress":"b8:ac:6f:94:f1:67","wait":0},{"totalSize":0,"poolInfo":{"uuid":"ae248ed7-dd2a-cfc8-53f7-f43661c596c8","host":"10.208.8.69","localPath":"lvm","hostPath":"lvm","poolType":"LVM","capacityBytes":491505319936,"availableBytes":491501125632},"resourceType":"STORAGE_POOL","hostDetails":{},"type":"Storage","dataCenter":"1","guid":"192267e4-d97d-4e37-b899-82b157e3166f","wait":0}]
 }
2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl] 
(AgentTaskPool-2:null) Dispatching resource state event 
CREATE_HOST_VO_FOR_DIRECT_CONNECT to 
BaremetalPxeManagerImpl_EnhancerByCloudStack_d79b93ae
2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl] 
(AgentTaskPool-2:null) Dispatching resource state event 
CREATE_HOST_VO_FOR_DIRECT_CONNECT to 
PremiumSecondaryStorageManagerImpl_EnhancerByCloudStack_b51c5fde
2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl] 
(AgentTaskPool-2:null) Dispatching resource state event 
CREATE_HOST_VO_FOR_DIRECT_CONNECT to 
LxcServerDiscoverer_EnhancerByCloudStack_449ab3e6
2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl] 
(AgentTaskPool-2:null) Dispatching resource state event 
CREATE_HOST_VO_FOR_DIRECT_CONNECT to NiciraNvp
2013-08-26 06:46:54,709 DEBUG [cloud.resource.ResourceManagerImpl] 
(AgentTaskPool-2:null) Dispatching resource state event 
CREATE_HOST_VO_FOR_DIRECT_CONNECT to OvmDiscoverer_EnhancerByCloudStack_ec7ad054
2013-08-26 06:46:54,710 DEBUG [cloud.resource.ResourceManagerImpl] 
(AgentTaskPool-2:null) Dispatching resource state event 
CREATE_HOST_VO_FOR_DIRECT_CONNECT to 
XcpServerDiscoverer_EnhancerByCloudStack_831158ad
2013-08-26 06:46:54,710 INFO  [xen.discoverer.XcpServerDiscoverer] 
(AgentTaskPool-2:null) Host: apache-81-3 connected with hypervisor type: 
XenServer. Checking CIDR...
2013-08-26 06:46:54,735 DEBUG [cloud.resource.ResourceState] 
(AgentTaskPool-2:null) Resource state update: [id = 1; name = apache-81-3; old 
state = Enabled; event = InternalCreated; new state = Enabled]
2013-08-26 06:46:54,736 DEBUG [cloud.host.Status] (AgentTaskPool-2:null) 
Transition:[Resource state = Enabled, Agent event = AgentConnected, Host id = 
1, name = apache-81-3]
2013-08-26 06:46:54,742 DEBUG [cloud.host.Status] (AgentTaskPool-2:null) Agent 
status update: [id = 1; name = apache-81-3; old status = Connecting; event = 
AgentConnected; new status = Connecting; old update count = 3; new update count 
= 4]
2013-08-26 06:46:54,946 DEBUG [cloud.host.Status] (AgentTaskPool-1:null) Agent 
status update: [id = 1; name = apache-81-3; old status = Connecting; event = 
Ready; new status = Up; old update count = 4; new update count = 5]



There appear to be two problems:
1. addHost API returns successfully even if the host has transitioned to 
'Alert' state
2. Something wrong with Xenserver 6.2/6.1 hosts that makes this happen


This is on the latest off the 4.2 branch (8689f7f).

Attaching management server log and Database Dump.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to