[ 
https://issues.apache.org/jira/browse/AMBARI-19929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880654#comment-15880654
 ] 

Sebastian Toader commented on AMBARI-19929:
-------------------------------------------

Problem: The TopologyRequest/TopologyLogicalRequest/TopologyHostRequest records 
are not written to the database within a single transaction, so a failure 
partway through can leave the tables inconsistent. These records are needed 
during an upscale request; when they are inconsistent, the request does not 
complete properly and the added host is not registered properly. As a side 
effect, hosts go into the heartbeat lost state. The same can happen if the 
Ambari server is restarted in the middle of cluster provisioning.

To fix this, the TopologyRequest/TopologyLogicalRequest/TopologyHostRequest 
records need to be written to the database within a transaction.
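
The all-or-nothing behaviour described above can be sketched roughly as 
follows. This is only an illustration in plain Java, not the actual Ambari 
change: the class TopologyTables, its fields, the method persistAtomically 
and the failMidway flag are all hypothetical, and in-memory lists stand in 
for the three database tables.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for the three topology tables. */
class TopologyTables {
    final List<Long> topologyRequests = new ArrayList<>();
    final List<Long> logicalRequests = new ArrayList<>();
    final List<String> hostRequests = new ArrayList<>();

    /**
     * Writes all three record types as one unit: either every record is
     * stored or none is. failMidway simulates a crash between the writes.
     */
    void persistAtomically(long requestId, List<String> hosts, boolean failMidway) {
        // Remember the table sizes so a failure can be rolled back.
        int requestMark = topologyRequests.size();
        int logicalMark = logicalRequests.size();
        int hostMark = hostRequests.size();
        try {
            topologyRequests.add(requestId);
            if (failMidway) {
                throw new IllegalStateException("simulated failure between writes");
            }
            logicalRequests.add(requestId);
            hostRequests.addAll(hosts);
        } catch (RuntimeException e) {
            // Roll back: drop everything added since the marks were taken.
            topologyRequests.subList(requestMark, topologyRequests.size()).clear();
            logicalRequests.subList(logicalMark, logicalRequests.size()).clear();
            hostRequests.subList(hostMark, hostRequests.size()).clear();
            throw e;
        }
    }
}
```

Without the rollback in the catch block, the simulated failure would leave a 
topology request with no matching logical request, which is the inconsistency 
reported in this issue.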

> TopologyRequest/TopologyLogicalRequest/TopologyHostRequest database 
> inconsistency
> ---------------------------------------------------------------------------------
>
>                 Key: AMBARI-19929
>                 URL: https://issues.apache.org/jira/browse/AMBARI-19929
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.2.1
>            Reporter: amarnathreddy
>            Assignee: Sebastian Toader
>            Priority: Critical
>
> If there is any inconsistency between the topology_logical_request and 
> topology_request tables, then all heartbeat requests fail with the exception 
> below:
> 10 Jan 2017 10:43:12,004  WARN [qtp-ambari-agent-137] ServletHandler:563 - 
> /agent/v1/register/agent540.xxxx.com
> java.lang.NullPointerException
> Some of the entries in topology_request do not have a corresponding entry in 
> the topology_logical_request table.
> Because of that, all healthy agents are marked as heartbeat lost.
> This should not happen; instead, the server should log the exception and 
> continue processing the heartbeats.
> Full stack trace:
> java.lang.NullPointerException
>         at org.apache.ambari.server.topology.PersistedStateImpl.getAllRequests(PersistedStateImpl.java:157)
>         at org.apache.ambari.server.topology.TopologyManager.ensureInitialized(TopologyManager.java:131)
>         at org.apache.ambari.server.topology.TopologyManager.onHostRegistered(TopologyManager.java:315)
>         at org.apache.ambari.server.state.host.HostImpl$HostRegistrationReceived.transition(HostImpl.java:301)
>         at org.apache.ambari.server.state.host.HostImpl$HostRegistrationReceived.transition(HostImpl.java:266)
>         at org.apache.ambari.server.state.fsm.StateMachineFactory$SingleInternalArc.doTransition(StateMachineFactory.java:354)
>         at org.apache.ambari.server.state.fsm.StateMachineFactory.doTransition(StateMachineFactory.java:294)
>         at org.apache.ambari.server.state.fsm.StateMachineFactory.access$300(StateMachineFactory.java:39)
>         at org.apache.ambari.server.state.fsm.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:440)
>         at org.apache.ambari.server.state.host.HostImpl.handleEvent(HostImpl.java:570)
>         at org.apache.ambari.server.agent.HeartBeatHandler.handleRegistration(HeartBeatHandler.java:966)
>         at org.apache.ambari.server.agent.rest.AgentResource.register(AgentResource.java:95)
>         at sun.reflect.GeneratedMethodAccessor161.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:497)
>         at com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
>         at com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
>         at com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
>         at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:302)
>         at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>         at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
>         at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
>         at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1542)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1473)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1419)
>         at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1409)
> BUSINESS IMPACT: Production cluster cannot be managed through Ambari
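
The reporter's suggestion in the quoted description (log the problem and keep 
processing heartbeats rather than fail with a NullPointerException) could look 
roughly like the sketch below. It is a plain-Java illustration only and does 
not reproduce PersistedStateImpl: the class RequestLoader, the method 
loadConsistentRequests and the map-based representation of the two tables are 
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical loader that tolerates missing logical request rows. */
class RequestLoader {
    private static final Logger LOG = Logger.getLogger(RequestLoader.class.getName());

    /**
     * @param logicalByRequestId maps a topology request id to its logical
     *        request id; a null value models a missing topology_logical_request row
     * @return ids of the topology requests that have a logical request
     */
    static List<Long> loadConsistentRequests(Map<Long, Long> logicalByRequestId) {
        List<Long> usable = new ArrayList<>();
        for (Map.Entry<Long, Long> entry : logicalByRequestId.entrySet()) {
            if (entry.getValue() == null) {
                // Log and skip instead of dereferencing a null logical request.
                LOG.warning("Skipping topology request " + entry.getKey()
                        + ": no matching logical request");
                continue;
            }
            usable.add(entry.getKey());
        }
        return usable;
    }
}
```

Skipping the inconsistent entry only keeps host registration working; it does 
not repair the data, so writing the records within a transaction is still 
needed to prevent the inconsistency in the first place.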



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
