Sangeetha Hariharan created CLOUDSTACK-2428:
-----------------------------------------------

             Summary: HA - When the master host is disconnected , the host 
status contines to remain in "Up" state because of 
com.cloud.utils.exception.CloudRuntimeException: Unable to reset master of 
slave 
                 Key: CLOUDSTACK-2428
                 URL: https://issues.apache.org/jira/browse/CLOUDSTACK-2428
             Project: CloudStack
          Issue Type: Bug
      Security Level: Public (Anyone can view this level - this is the default.)
          Components: Management Server
    Affects Versions: 4.2.0
         Environment: Build from pvaln
            Reporter: Sangeetha Hariharan
             Fix For: 4.2.0


1. Advance zone  with 1 cluster with 2 hosts. Create  Shared network with 
private vlan.
2. Deploy few HA enabled Vms in this network. 
3. pull network cable for one of the host.

When cloudstack detects that the host is disconnected , it is not able to out 
the host in disconnected state and start HA for Vms that are HA enabeld,

I see the following exception in the management server logs:


2013-05-09 17:15:55,576 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-267:null) Seq 1-1435828229: Executing request
2013-05-09 17:15:55,602 DEBUG [xen.resource.XenServerConnectionPool] 
(DirectAgent-267:null) Catch Exception: com.xensource.xenapi.Types$HostOffline 
Host is offline 10.223.81.62 due to You attempted an operation which involves a 
host which could not be contacted.
2013-05-09 17:15:55,603 DEBUG [xen.resource.XenServerConnectionPool] 
(DirectAgent-267:null) Trying to reset master of slave 10.223.81.62 to 
10.223.81.61
2013-05-09 17:16:02,319 WARN  [xen.resource.CitrixResourceBase] 
(DirectAgent-265:null)  can not ping xenserver 
520d4994-8b1f-4dda-b51d-2ee63750abf6
2013-05-09 17:16:02,319 WARN  [agent.manager.DirectAgentAttache] 
(DirectAgent-265:null) Unable to get current status on 1
2013-05-09 17:16:02,321 INFO  [agent.manager.AgentManagerImpl] 
(AgentTaskPool-11:null) Investigating why host 1 has disconnected with event 
AgentDisconnected
2013-05-09 17:16:02,321 DEBUG [agent.manager.AgentManagerImpl] 
(AgentTaskPool-11:null) checking if agent (1) is alive
2013-05-09 17:16:02,323 DEBUG [agent.transport.Request] (AgentTaskPool-11:null) 
Seq 1-1435828294: Sending  { Cmd , MgmtId: 7647994577963, via: 1, Ver: v1, 
Flags: 100011, [{"CheckHealthCommand":{"wait":50}}] }
2013-05-09 17:16:02,323 DEBUG [agent.transport.Request] (AgentTaskPool-11:null) 
Seq 1-1435828294: Executing:  { Cmd , MgmtId: 7647994577963, via: 1, Ver: v1, 
Flags: 100011, [{"CheckHealthCommand":{"wait":50}}] }
2013-05-09 17:16:02,323 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-271:null) Seq 1-1435828294: Executing request
2013-05-09 17:16:04,035 DEBUG [agent.manager.AgentAttache] 
(AgentTaskPool-10:null) Seq 6-474349576: Waiting some more time because this is 
the current command
2013-05-09 17:16:04,040 DEBUG [xen.resource.XenServerConnectionPool] 
(DirectAgent-268:null) localLogout has problem Failed to read server's 
response: connect timed out
2013-05-09 17:16:04,040 WARN  [agent.manager.DirectAgentAttache] 
(DirectAgent-268:null) Seq 1-1435828292: Exception Caught while executing 
command
com.cloud.utils.exception.CloudRuntimeException: Unable to reset master of 
slave 10.223.81.62 to 10.223.81.61 due to org.apache.xmlrpc.XmlRpcException: 
Failed to read server's response: connect timed out
        at 
com.cloud.hypervisor.xen.resource.XenServerConnectionPool.PoolEmergencyResetMaster(XenServerConnectionPool.java:443)
        at 
com.cloud.hypervisor.xen.resource.XenServerConnectionPool.connect(XenServerConnectionPool.java:661)
        at 
com.cloud.hypervisor.xen.resource.CitrixResourceBase.getConnection(CitrixResourceBase.java:5639)
        at 
com.cloud.hypervisor.xen.resource.CitrixResourceBase.execute(CitrixResourceBase.java:1682)
        at 
com.cloud.hypervisor.xen.resource.CitrixResourceBase.executeRequest(CitrixResourceBase.java:524)
        at 
com.cloud.hypervisor.xen.resource.XenServer56Resource.executeRequest(XenServer56Resource.java:73)
        at 
com.cloud.hypervisor.xen.resource.XenServer610Resource.executeRequest(XenServer610Resource.java:102)
        at 
com.cloud.agent.manager.DirectAgentAttache$Task.run(DirectAgentAttache.java:186)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:165)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)
2013-05-09 17:16:04,041 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-268:null) Seq 1-1435828292: Response Received:
2013-05-09 17:16:04,041 DEBUG [agent.transport.Request] (DirectAgent-268:null) 
Seq 1-1435828292: Processing:  { Ans: , MgmtId: 7647994577963, via: 1, Ver: v1, 
Flags: 10, 
[{"Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException:
 Unable to reset master of slave 10.223.81.62 to 10.223.81.61 due to 
org.apache.xmlrpc.XmlRpcException: Failed to read server's response: connect 
timed out","wait":0}}] }
2013-05-09 17:16:04,041 DEBUG [agent.transport.Request] (AgentTaskPool-5:null) 
Seq 1-1435828292: Received:  { Ans: , MgmtId: 7647994577963, via: 1, Ver: v1, 
Flags: 10, { Answer } }
2013-05-09 17:16:04,041 DEBUG [cloud.ha.AbstractInvestigatorImpl] 
(AgentTaskPool-5:null) host (10.223.81.50) cannot be pinged, returning null ('I 
don't know')
2013-05-09 17:16:04,041 DEBUG [cloud.ha.UserVmDomRInvestigator] 
(AgentTaskPool-5:null) sending ping from (5) to agent's host ip address 
(10.223.81.50)
2013-05-09 17:16:04,043 DEBUG [agent.transport.Request] (AgentTaskPool-5:null) 
Seq 5-2082341067: Sending  { Cmd , MgmtId: 7647994577963, via: 5, Ver: v1, 
Flags: 100011, 
[{"PingTestCommand":{"_computingHostIp":"10.223.81.50","wait":20}}] }
2013-05-09 17:16:04,043 DEBUG [agent.transport.Request] (AgentTaskPool-5:null) 
Seq 5-2082341067: Executing:  { Cmd , MgmtId: 7647994577963, via: 5, Ver: v1, 
Flags: 100011, 
[{"PingTestCommand":{"_computingHostIp":"10.223.81.50","wait":20}}] }
2013-05-09 17:16:04,043 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-272:null) Seq 5-2082341067: Executing request
2013-05-09 17:16:04,053 DEBUG [xen.resource.XenServerConnectionPool] 
(DirectAgent-91:null) localLogout has problem Failed to read server's response: 
connect timed out
2013-05-09 17:16:04,053 WARN  [agent.manager.DirectAgentAttache] 
(DirectAgent-91:null) Seq 1-1435828293: Exception Caught while executing command
com.cloud.utils.exception.CloudRuntimeException: Unable to reset master of 
slave 10.223.81.62 to 10.223.81.61 due to org.apache.xmlrpc.XmlRpcException: 
Failed to read server's response: connect timed out
        at 
com.cloud.hypervisor.xen.resource.XenServerConnectionPool.PoolEmergencyResetMaster(XenServerConnectionPool.java:443)
        at 
com.cloud.hypervisor.xen.resource.XenServerConnectionPool.connect(XenServerConnectionPool.java:661)
        at 
com.cloud.hypervisor.xen.resource.CitrixResourceBase.getConnection(CitrixResourceBase.java:5639)
        at 
com.cloud.hypervisor.xen.resource.CitrixResourceBase.execute(CitrixResourceBase.java:1682)
        at 
com.cloud.hypervisor.xen.resource.CitrixResourceBase.executeRequest(CitrixResourceBase.java:524)
        at 
com.cloud.hypervisor.xen.resource.XenServer56Resource.executeRequest(XenServer56Resource.java:73)
        at 
com.cloud.hypervisor.xen.resource.XenServer610Resource.executeRequest(XenServer610Resource.java:102)
        at 
com.cloud.agent.manager.DirectAgentAttache$Task.run(DirectAgentAttache.java:186)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:165)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)
2013-05-09 17:16:04,054 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-91:null) Seq 1-1435828293: Response Received:
2013-05-09 17:16:04,054 DEBUG [agent.transport.Request] (DirectAgent-91:null) 
Seq 1-1435828293: Processing:  { Ans: , MgmtId: 7647994577963, via: 1, Ver: v1, 
Flags: 10, 
[{"Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException:
 Unable to reset master of slave 10.223.81.62 to 10.223.81.61 due to 
org.apache.xmlrpc.XmlRpcException: Failed to read server's response: connect 
timed out","wait":0}}] }
2013-05-09 17:16:04,055 DEBUG [agent.transport.Request] (AgentTaskPool-7:null) 
Seq 1-1435828293: Received:  { Ans: , MgmtId: 7647994577963, via: 1, Ver: v1, 
Flags: 10, { Answer } }
2013-05-09 17:16:04,055 DEBUG [cloud.ha.AbstractInvestigatorImpl] 
(AgentTaskPool-7:null) host (10.223.81.52) cannot be pinged, returning null ('I 
don't know')
2013-05-09 17:16:04,055 DEBUG [cloud.ha.UserVmDomRInvestigator] 
(AgentTaskPool-7:null) sending ping from (5) to agent's host ip address 
(10.223.81.52)
2013-05-09 17:16:04,057 DEBUG [agent.transport.Request] (AgentTaskPool-7:null) 
Seq 5-2082341068: Sending  { Cmd , MgmtId: 7647994577963, via: 5, Ver: v1, 
Flags: 100011, 
[{"PingTestCommand":{"_computingHostIp":"10.223.81.52","wait":20}}] }
2013-05-09 17:16:04,057 DEBUG [agent.manager.AgentAttache] 
(AgentTaskPool-14:null) Seq 3-1752367195: Waiting some more time because this 
is the current command
2013-05-09 17:16:04,057 DEBUG [agent.transport.Request] (AgentTaskPool-7:null) 
Seq 5-2082341068: Executing:  { Cmd , MgmtId: 7647994577963, via: 5, Ver: v1, 
Flags: 100011, 
[{"PingTestCommand":{"_computingHostIp":"10.223.81.52","wait":20}}] }
2013-05-09 17:16:04,057 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-91:null) Seq 5-2082341068: Executing request
2013-05-09 17:16:05,175 DEBUG [storage.secondary.SecondaryStorageManagerImpl] 
(secstorage-1:null) Zone 1 is ready to launch secondary storage VM
2013-05-09 17:16:05,614 DEBUG [xen.resource.XenServerConnectionPool] 
(DirectAgent-267:null) localLogout has problem Failed to read server's 
response: connect timed out
2013-05-09 17:16:05,614 WARN  [agent.manager.DirectAgentAttache] 
(DirectAgent-267:null) Seq 1-1435828229: Exception Caught while executing 
command
com.cloud.utils.exception.CloudRuntimeException: Unable to reset master of 
slave 10.223.81.62 to 10.223.81.61 due to org.apache.xmlrpc.XmlRpcException: 
Failed to read server's response: connect timed out
        at 
com.cloud.hypervisor.xen.resource.XenServerConnectionPool.PoolEmergencyResetMaster(XenServerConnectionPool.java:443)
        at 
com.cloud.hypervisor.xen.resource.XenServerConnectionPool.connect(XenServerConnectionPool.java:661)
        at 
com.cloud.hypervisor.xen.resource.CitrixResourceBase.getConnection(CitrixResourceBase.java:5639)
        at 
com.cloud.hypervisor.xen.resource.CitrixResourceBase.execute(CitrixResourceBase.java:7725)
        at 
com.cloud.hypervisor.xen.resource.CitrixResourceBase.executeRequest(CitrixResourceBase.java:570)
        at 
com.cloud.hypervisor.xen.resource.XenServer56Resource.executeRequest(XenServer56Resource.java:73)
        at 
com.cloud.hypervisor.xen.resource.XenServer610Resource.executeRequest(XenServer610Resource.java:102)
        at 
com.cloud.agent.manager.DirectAgentAttache$Task.run(DirectAgentAttache.java:186)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
        at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)
2013-05-09 17:16:05,615 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-267:null) Seq 1-1435828229: Response Received:
2013-05-09 17:16:05,615 DEBUG [agent.transport.Request] (DirectAgent-267:null) 
Seq 1-1435828229: Processing:  { Ans: , MgmtId: 7647994577963, via: 1, Ver: v1, 
Flags: 10, 
[{"Answer":{"result":false,"details":"com.cloud.utils.exception.CloudRuntimeException:
 Unable to reset master of slave 10.223.81.62 to 10.223.81.61 due to 
org.apache.xmlrpc.XmlRpcException: Failed to read server's response: connect 
timed out","wait":0}}] }
2013-05-09 17:16:05,704 DEB



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to