I remember when network partition happens, pool slave may enter emergency mode 
and show offline as it could not reach its master for a long time.
Could you check hv1's console (graphical console, not ssh console), and check 
if its nics are shown correctly?

Regards
Mice

-----Original Message-----
From: Nik Martin [mailto:nik.mar...@nfinausa.com]
Sent: 2012-8-10 (ζ˜ŸζœŸδΊ”) 23:04
To: cloudstack-users@incubator.apache.org
Subject: Xen Host failure in pool
 
We have a Xenserver 6.2 based pool of three hosts running under 
CloudStack Acton release (code base is about two weeks old).  We left 
last night and everything was fine, and I have about 2 VMs running on 
each host, not doing anything. This morning, I came in, and three VMs 
have stopped, and I logged into XenCenter to see what the pool looked 
like, and the Pol master hd changed from host HV3 to HV2, and HV1 was 
offline.  I logged in to HV1's console, and looked at the 
/var/log/messages, and it was complaining about the pool master address 
being wrong. I went into CloudStack UI and deleted and re-added the 
host, and it failed immediately, and I got this in the log when I did:


2012-08-10 09:56:39,566 DEBUG [cloud.api.ApiServlet] 
(catalina-exec-24:null) Invalid paramemter in URL found. param: hosttags=
2012-08-10 09:56:39,573 INFO  [cloud.resource.ResourceManagerImpl] 
(catalina-exec-24:null) Trying to add a new host at http://172.16.5.3 in 
data center 2
2012-08-10 09:56:39,629 DEBUG [xen.resource.XenServerConnectionPool] 
(catalina-exec-24:null) Slave logon to 172.16.5.3
2012-08-10 09:56:39,632 DEBUG [xen.resource.XenServerConnectionPool] 
(catalina-exec-24:null) Failed to slave local login to 172.16.5.3 due to 
The master says the host is not known to it. Perhaps the Host was 
deleted from the master's database? Perhaps the slave is pointing to the 
wrong master?
2012-08-10 09:56:39,638 DEBUG [xen.discoverer.XcpServerDiscoverer] 
(catalina-exec-24:null) other exceptions: java.lang.RuntimeException: 
can not get master ip
java.lang.RuntimeException: can not get master ip
        at 
com.cloud.hypervisor.xen.resource.XenServerConnectionPool.getMasterIp(XenServerConnectionPool.java:343)
        at 
com.cloud.hypervisor.xen.discoverer.XcpServerDiscoverer.find(XcpServerDiscoverer.java:179)
        at 
com.cloud.resource.ResourceManagerImpl.discoverHostsFull(ResourceManagerImpl.java:644)
        at 
com.cloud.resource.ResourceManagerImpl.discoverHosts(ResourceManagerImpl.java:514)
        at com.cloud.api.commands.AddHostCmd.execute(AddHostCmd.java:136)
        at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:132)
        at com.cloud.api.ApiServer.queueCommand(ApiServer.java:509)
        at com.cloud.api.ApiServer.handleRequest(ApiServer.java:416)
        at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:300)
        at com.cloud.api.ApiServlet.doGet(ApiServlet.java:59)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
        at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
        at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
        at 
org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
        at 
org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:721)
        at 
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2268)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)
2012-08-10 09:56:39,638 WARN  [cloud.resource.ResourceManagerImpl] 
(catalina-exec-24:null) Unable to find the server resources at 
http://172.16.5.3
2012-08-10 09:56:39,642 WARN  [api.commands.AddHostCmd] 
(catalina-exec-24:null) Exception:
com.cloud.exception.DiscoveryException: Unable to add the host
        at 
com.cloud.resource.ResourceManagerImpl.discoverHostsFull(ResourceManagerImpl.java:694)
        at 
com.cloud.resource.ResourceManagerImpl.discoverHosts(ResourceManagerImpl.java:514)
        at com.cloud.api.commands.AddHostCmd.execute(AddHostCmd.java:136)
        at com.cloud.api.ApiDispatcher.dispatch(ApiDispatcher.java:132)
        at com.cloud.api.ApiServer.queueCommand(ApiServer.java:509)
        at com.cloud.api.ApiServer.handleRequest(ApiServer.java:416)
        at com.cloud.api.ApiServlet.processRequest(ApiServlet.java:300)
        at com.cloud.api.ApiServlet.doGet(ApiServlet.java:59)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
        at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
        at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
        at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
        at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
        at 
org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
        at 
org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:721)
        at 
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2268)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:679)
2012-08-10 09:56:39,642 WARN  [cloud.api.ApiDispatcher] 
(catalina-exec-24:null) class com.cloud.api.ServerApiException : Unable 
to add the host
2012-08-10 09:56:39,723 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-305:null) Ping from 17
2012-08-10 09:56:43,822 DEBUG 
[storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Zone 
2 is ready to launch secondary storage VM
2012-08-10 09:56:43,916 DEBUG 
[cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Zone 
2 is ready to launch console proxy
2012-08-10 09:56:44,102 DEBUG 
[network.router.VirtualNetworkApplianceManagerImpl] 
(RouterStatusMonitor-1:null) Found 2 routers.
2012-08-10 09:56:44,614 DEBUG [agent.manager.AgentManagerImpl] 
(AgentManager-Handler-12:null) Ping from 22
2012-08-10 09:56:48,864 DEBUG [agent.manager.AgentManagerImpl] 
(AgentManager-Handler-10:null) Ping from 18
2012-08-10 09:56:49,511 DEBUG [cloud.server.StatsCollector] 
(StatsCollector-1:null) VmStatsCollector is running...
2012-08-10 09:56:49,525 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-305:null) Seq 16-92408948: Executing request
2012-08-10 09:56:49,763 DEBUG [xen.resource.CitrixResourceBase] 
(DirectAgent-305:null) Vm cpu utilization 0.01
2012-08-10 09:56:49,763 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-305:null) Seq 16-92408948: Response Received:
2012-08-10 09:56:49,763 DEBUG [cloud.vm.VirtualMachineManagerImpl] 
(DirectAgent-305:null) Cleanup succeeded. Details null
2012-08-10 09:56:49,763 DEBUG [agent.transport.Request] 
(StatsCollector-1:null) Seq 16-92408948: Received:  { Ans: , MgmtId: 
130577622632, via: 16, Ver: v1, Flags: 10, { GetVmStatsAnswer } }
2012-08-10 09:56:49,763 DEBUG [cloud.vm.VirtualMachineManagerImpl] 
(StatsCollector-1:null) Cleanup succeeded. Details null
2012-08-10 09:56:54,411 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-497:null) Ping from 17
2012-08-10 09:56:54,550 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-338:null) Ping from 16
2012-08-10 09:56:59,614 DEBUG [agent.manager.AgentManagerImpl] 
(AgentManager-Handler-8:null) Ping from 22
2012-08-10 09:57:03,864 DEBUG [agent.manager.AgentManagerImpl] 
(AgentManager-Handler-9:null) Ping from 18
2012-08-10 09:57:09,551 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-71:null) Ping from 16
2012-08-10 09:57:09,669 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-338:null) Ping from 17
2012-08-10 09:57:13,821 DEBUG 
[storage.secondary.SecondaryStorageManagerImpl] (secstorage-1:null) Zone 
2 is ready to launch secondary storage VM
2012-08-10 09:57:13,918 DEBUG 
[cloud.consoleproxy.ConsoleProxyManagerImpl] (consoleproxy-1:null) Zone 
2 is ready to launch console proxy
2012-08-10 09:57:14,102 DEBUG 
[network.router.VirtualNetworkApplianceManagerImpl] 
(RouterStatusMonitor-1:null) Found 2 routers.
2012-08-10 09:57:14,614 DEBUG [agent.manager.AgentManagerImpl] 
(AgentManager-Handler-11:null) Ping from 22
2012-08-10 09:57:15,645 DEBUG [cloud.server.StatsCollector] 
(StatsCollector-3:null) HostStatsCollector is running...
2012-08-10 09:57:15,656 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-71:null) Seq 16-92408949: Executing request
2012-08-10 09:57:15,878 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-71:null) Seq 16-92408949: Response Received:
2012-08-10 09:57:15,878 DEBUG [cloud.vm.VirtualMachineManagerImpl] 
(DirectAgent-71:null) Cleanup succeeded. Details null
2012-08-10 09:57:15,878 DEBUG [agent.transport.Request] 
(StatsCollector-3:null) Seq 16-92408949: Received:  { Ans: , MgmtId: 
130577622632, via: 16, Ver: v1, Flags: 10, { GetHostStatsAnswer } }
2012-08-10 09:57:15,879 DEBUG [cloud.vm.VirtualMachineManagerImpl] 
(StatsCollector-3:null) Cleanup succeeded. Details null
2012-08-10 09:57:15,884 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-338:null) Seq 17-665190891: Executing request
2012-08-10 09:57:16,312 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-338:null) Seq 17-665190891: Response Received:
2012-08-10 09:57:16,312 DEBUG [cloud.vm.VirtualMachineManagerImpl] 
(DirectAgent-338:null) Cleanup succeeded. Details null
2012-08-10 09:57:16,312 DEBUG [agent.transport.Request] 
(StatsCollector-3:null) Seq 17-665190891: Received:  { Ans: , MgmtId: 
130577622632, via: 17, Ver: v1, Flags: 10, { GetHostStatsAnswer } }
2012-08-10 09:57:16,313 DEBUG [cloud.vm.VirtualMachineManagerImpl] 
(StatsCollector-3:null) Cleanup succeeded. Details null
2012-08-10 09:57:18,864 DEBUG [agent.manager.AgentManagerImpl] 
(AgentManager-Handler-15:null) Ping from 18
2012-08-10 09:57:24,407 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-71:null) Ping from 17
2012-08-10 09:57:24,566 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-338:null) Ping from 16
2012-08-10 09:57:29,615 DEBUG [agent.manager.AgentManagerImpl] 
(AgentManager-Handler-1:null) Ping from 22
2012-08-10 09:57:30,047 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-294:null) Seq 16-92405762: Executing request
2012-08-10 09:57:30,308 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-294:null) Seq 16-92405762: Response Received:
2012-08-10 09:57:30,308 DEBUG [agent.transport.Request] 
(DirectAgent-294:null) Seq 16-92405762: Processing:  { Ans: , MgmtId: 
130577622632, via: 16, Ver: v1, Flags: 10, 
[{"ClusterSyncAnswer":{"_clusterId":1,"_newStates":{},"_isExecuted":false,"result":true,"wait":0}}]
 
}
2012-08-10 09:57:31,060 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-357:null) Seq 17-665190402: Executing request
2012-08-10 09:57:31,250 DEBUG [agent.manager.DirectAgentAttache] 
(DirectAgent-357:null) Seq 17-665190402: Response Received:
2012-08-10 09:57:31,250 DEBUG [agent.transport.Request] 
(DirectAgent-357:null) Seq 17-665190402: Processing:  { Ans: , MgmtId: 
130577622632, via: 17, Ver: v1, Flags: 10, 
[{"Answer":{"result":true,"wait":0}}] }

This is a very serious error, and I don't know how to fix it.  Can 
anyone suggest what might be the problem and hos I might fix it?


-- 
Regards,

Nik

Nik Martin
VP Business Development
Nfina Technologies, Inc.
+1.251.243.0043 x1003
Relentless Reliability


Reply via email to