[
https://issues.apache.org/jira/browse/AMBARI-19929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880654#comment-15880654
]
Sebastian Toader commented on AMBARI-19929:
-------------------------------------------
PRB: The TopologyRequest/TopologyLogicalRequest/TopologyHostRequest records are
not written to the database within a single transaction, so a partial write can
leave them inconsistent. These records are needed during an upscale request;
when they are inconsistent, the upscale does not complete and the added host is
not registered properly. As a side effect, hosts go into the heartbeat lost
state. The same can happen if Ambari server is restarted in the middle of
cluster provisioning.
To fix this, the TopologyRequest/TopologyLogicalRequest/TopologyHostRequest
records need to be written to the database within a transaction.
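To illustrate the idea of the fix, here is a minimal sketch that uses plain JDBC
rather than Ambari's actual persistence layer, which is not shown in this
message. The class name TopologyTx, the helper methods, the
topology_host_request table name and every column name are assumptions made for
the example; only the topology_request and topology_logical_request table names
come from the issue description.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper, not Ambari code: groups related inserts into one transaction.
final class TopologyTx {

    /** Work that must run inside a single database transaction. */
    @FunctionalInterface
    interface SqlWork {
        void run(Connection conn) throws SQLException;
    }

    /** Runs the work atomically: commit if it succeeds, roll back if it throws. */
    static void inTransaction(Connection conn, SqlWork work) throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // start an explicit transaction
        try {
            work.run(conn);
            conn.commit();
        } catch (SQLException | RuntimeException e) {
            try {
                conn.rollback(); // discard any rows written so far
            } catch (SQLException rollbackFailure) {
                e.addSuppressed(rollbackFailure);
            }
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit); // restore the caller's setting
        }
    }

    /** Executes one parameterized statement. */
    static void execute(Connection conn, String sql, Object... params) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < params.length; i++) {
                ps.setObject(i + 1, params[i]);
            }
            ps.executeUpdate();
        }
    }

    /** Writes the three related rows as one unit (column names are assumed). */
    static void persistTopologyRequest(Connection conn, long requestId, String hostName)
            throws SQLException {
        inTransaction(conn, c -> {
            execute(c, "INSERT INTO topology_request (id) VALUES (?)", requestId);
            execute(c, "INSERT INTO topology_logical_request (id, request_id) VALUES (?, ?)",
                    requestId, requestId);
            execute(c, "INSERT INTO topology_host_request (logical_request_id, host_name)"
                    + " VALUES (?, ?)", requestId, hostName);
        });
    }
}
```

With this shape, a failure or server stop between the first and the last insert
leaves none of the three rows committed, so a topology_request entry can no
longer exist without its topology_logical_request counterpart.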
> TopologyRequest/TopologyLogicalRequest/TopologyHostRequest database
> inconsistency
> ---------------------------------------------------------------------------------
>
> Key: AMBARI-19929
> URL: https://issues.apache.org/jira/browse/AMBARI-19929
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.2.1
> Reporter: amarnathreddy
> Assignee: Sebastian Toader
> Priority: Critical
>
> If there is any inconsistency between the topology_logical_request and
> topology_request tables, all heartbeat requests fail with the exception below:
> 10 Jan 2017 10:43:12,004 WARN [qtp-ambari-agent-137] ServletHandler:563 -
> /agent/v1/register/agent540.xxxx.com
> java.lang.NullPointerException
> Some entries in topology_request have no referencing entry in the
> topology_logical_request table.
> Because of that, all healthy agents are marked as heartbeat lost.
> This should not happen; instead, the server should log the exception and
> continue processing heartbeats.
> Full stack trace:
> java.lang.NullPointerException
> at org.apache.ambari.server.topology.PersistedStateImpl.getAllRequests(PersistedStateImpl.java:157)
> at org.apache.ambari.server.topology.TopologyManager.ensureInitialized(TopologyManager.java:131)
> at org.apache.ambari.server.topology.TopologyManager.onHostRegistered(TopologyManager.java:315)
> at org.apache.ambari.server.state.host.HostImpl$HostRegistrationReceived.transition(HostImpl.java:301)
> at org.apache.ambari.server.state.host.HostImpl$HostRegistrationReceived.transition(HostImpl.java:266)
> at org.apache.ambari.server.state.fsm.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:354)
> at org.apache.ambari.server.state.fsm.StateMachineFactory.doTransition(StateMachineFactory.java:294)
> at org.apache.ambari.server.state.fsm.StateMachineFactory.access$300(StateMachineFactory.java:39)
> at org.apache.ambari.server.state.fsm.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:440)
> at org.apache.ambari.server.state.host.HostImpl.handleEvent(HostImpl.java:570)
> at org.apache.ambari.server.agent.HeartBeatHandler.handleRegistration(HeartBeatHandler.java:966)
> at org.apache.ambari.server.agent.rest.AgentResource.register(AgentResource.java:95)
> at sun.reflect.GeneratedMethodAccessor161.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
> at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
> at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
> at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
> at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
> at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
> at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
> at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
> at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
> at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
> at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
> at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
> BUSINESS IMPACT: Production cluster cannot be managed through Ambari
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)