Suraj Naik created AMBARI-25613:
-----------------------------------
Summary: Concurrent Host Modification exception while sending INSTALL/START Host request
Key: AMBARI-25613
URL: https://issues.apache.org/jira/browse/AMBARI-25613
Project: Ambari
Issue Type: Bug
Components: ambari-server
Affects Versions: 2.7.6
Reporter: Suraj Naik
java.lang.RuntimeException: START Host request submission failed: java.lang.RuntimeException: Update Host request submission failed: java.util.ConcurrentModificationException
    at org.apache.ambari.server.topology.AmbariContext.startHost(AmbariContext.java:497)
    at org.apache.ambari.server.topology.ClusterTopologyImpl.startHost(ClusterTopologyImpl.java:268)
    at org.apache.ambari.server.topology.tasks.StartHostTask.runTask(StartHostTask.java:51)
    at org.apache.ambari.server.topology.tasks.TopologyHostTask.run(TopologyHostTask.java:55)
    at org.apache.ambari.server.topology.HostOfferResponse$1.run(HostOfferResponse.java:85)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Update Host request submission failed: java.util.ConcurrentModificationException
    at org.apache.ambari.server.controller.internal.HostComponentResourceProvider$4.invoke(HostComponentResourceProvider.java:865)
    at org.apache.ambari.server.controller.internal.HostComponentResourceProvider$4.invoke(HostComponentResourceProvider.java:852)
    at org.apache.ambari.server.controller.internal.AbstractResourceProvider.invokeWithRetry(AbstractResourceProvider.java:465)
    at org.apache.ambari.server.controller.internal.AbstractResourceProvider.modifyResources(AbstractResourceProvider.java:346)
    at org.apache.ambari.server.controller.internal.HostComponentResourceProvider.doUpdateResources(HostComponentResourceProvider.java:852)
    at org.apache.ambari.server.controller.internal.HostComponentResourceProvider.start(HostComponentResourceProvider.java:492)
    at org.apache.ambari.server.topology.AmbariContext.startHost(AmbariContext.java:494)
    at org.apache.ambari.server.topology.ClusterTopologyImpl.startHost(ClusterTopologyImpl.java:268)
    at org.apache.ambari.server.topology.tasks.StartHostTask.runTask(StartHostTask.java:51)
    at org.apache.ambari.server.topology.tasks.TopologyHostTask.run(TopologyHostTask.java:55)
    at org.apache.ambari.server.topology.HostOfferResponse$1.run(HostOfferResponse.java:85)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.ConcurrentModificationException: NA
    at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
    at java.util.HashMap$EntryIterator.next(HashMap.java:1479)
    at java.util.HashMap$EntryIterator.next(HashMap.java:1477)
    at java.util.HashMap.putMapEntries(HashMap.java:512)
    at java.util.HashMap.<init>(HashMap.java:490)
    at org.apache.ambari.server.topology.HostRequest.getPhysicalTaskMapping(HostRequest.java:458)
    at org.apache.ambari.server.topology.LogicalRequest.getStageSummaries(LogicalRequest.java:286)
    at org.apache.ambari.server.topology.TopologyManager.getPendingHostComponents(TopologyManager.java:823)
    at org.apache.ambari.server.utils.StageUtils.getClusterHostInfo(StageUtils.java:306)
    at org.apache.ambari.server.controller.AmbariManagementControllerImpl.doStageCreation(AmbariManagementControllerImpl.java:2788)
    at org.apache.ambari.server.controller.AmbariManagementControllerImpl.addStages(AmbariManagementControllerImpl.java:3513)
    at org.apache.ambari.server.controller.internal.HostComponentResourceProvider.updateHostComponents(HostComponentResourceProvider.java:707)
    at org.apache.ambari.server.controller.internal.HostComponentResourceProvider$4.invoke(HostComponentResourceProvider.java:857)
    at org.apache.ambari.server.controller.internal.HostComponentResourceProvider$4.invoke(HostComponentResourceProvider.java:852)
    at org.apache.ambari.server.controller.internal.AbstractResourceProvider.invokeWithRetry(AbstractResourceProvider.java:465)
    at org.apache.ambari.server.controller.internal.AbstractResourceProvider.modifyResources(AbstractResourceProvider.java:346)
    at org.apache.ambari.server.controller.internal.HostComponentResourceProvider.doUpdateResources(HostComponentResourceProvider.java:852)
    at org.apache.ambari.server.controller.internal.HostComponentResourceProvider.start(HostComponentResourceProvider.java:492)
    at org.apache.ambari.server.topology.AmbariContext.startHost(AmbariContext.java:494)
    at org.apache.ambari.server.topology.ClusterTopologyImpl.startHost(ClusterTopologyImpl.java:268)
    at org.apache.ambari.server.topology.tasks.StartHostTask.runTask(StartHostTask.java:51)
    at org.apache.ambari.server.topology.tasks.TopologyHostTask.run(TopologyHostTask.java:55)
    at org.apache.ambari.server.topology.HostOfferResponse$1.run(HostOfferResponse.java:85)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
My teammate [~ramkrishna] analyzed this one by adding logs and latches. The
finding: although installation and registration run in parallel, each thread
tries to get the entire cluster's view of the current physical tasks. So while
one thread is registering a host, another thread can call
getPhysicalTaskMapping() on the same host request. That call copies a HashMap
that is being modified at the same time, which leads to the
ConcurrentModificationException (CME).
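
The race described above can be sketched in isolation. The class below is a hypothetical stand-in, not the actual HostRequest code: the field name physicalTasks, the method registerPhysicalTask and the Long key/value types are assumptions made for illustration. It shows one possible mitigation, backing the mapping with a ConcurrentHashMap so that the defensive copy in getPhysicalTaskMapping() iterates with a weakly consistent iterator, which never throws a CME. Whether this is the right fix for Ambari is not something the trace alone can confirm.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class TaskMappingSketch {
    // Hypothetical stand-in for the logical-to-physical task map of one host
    // request. With a plain HashMap here, the copy below could throw
    // ConcurrentModificationException while another thread calls put().
    private final Map<Long, Long> physicalTasks = new ConcurrentHashMap<>();

    // Called by the thread that registers physical tasks for the host.
    public void registerPhysicalTask(long logicalTaskId, long physicalTaskId) {
        physicalTasks.put(logicalTaskId, physicalTaskId);
    }

    // Called by any thread that needs a view of the mapping. The HashMap copy
    // constructor iterates the source map; ConcurrentHashMap iterators are
    // weakly consistent, so concurrent puts do not cause a CME.
    public Map<Long, Long> getPhysicalTaskMapping() {
        return new HashMap<>(physicalTasks);
    }

    // Runs one writer thread against a reader that copies in a loop and
    // returns the final number of entries.
    public static int runDemo() {
        TaskMappingSketch sketch = new TaskMappingSketch();
        Thread writer = new Thread(() -> {
            for (long i = 0; i < 100_000; i++) {
                sketch.registerPhysicalTask(i, i + 1);
            }
        });
        writer.start();
        while (writer.isAlive()) {
            sketch.getPhysicalTaskMapping(); // concurrent snapshot
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sketch.getPhysicalTaskMapping().size();
    }

    public static void main(String[] args) {
        System.out.println("entries after concurrent copy: " + runDemo());
    }
}
```

A snapshot taken this way may miss entries that are added while the copy is in progress. If callers need a fully consistent view, guarding both the write and the copy with the same lock would be the alternative.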