[ https://issues.apache.org/jira/browse/AMBARI-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ximo Guanter updated AMBARI-4930:
---------------------------------
Attachment: AMBARI-4930-1.patch
It seems that cluster initialization cannot run at the same time as the
heartbeat monitoring process. Simply moving the line that starts the heartbeat
monitoring process to after cluster initialization seems to solve the problem
(see AMBARI-4930-1.patch).
This should not be the final fix for this issue, since the heartbeat
monitoring process and cluster initialization ought to be able to run at the
same time, but the attached patch provides a workaround that unblocks it.
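For context, here is a minimal, self-contained sketch of the kind of reordering the patch makes. The Clusters/HeartbeatMonitor classes and their loadClustersAndHosts()/start() methods below are simplified stand-ins, not the actual AmbariServer code:
{code}
// Illustrative sketch only: stand-in classes mimicking the startup ordering.
// The real ClustersImpl/HeartbeatMonitor classes are far more involved.
public class StartupOrderSketch {

    static class Clusters {
        void loadClustersAndHosts() {
            // In Ambari this would read cluster/host/service state via JPA.
            System.out.println("Initializing clusters from the database...");
        }
    }

    static class HeartbeatMonitor {
        private final Thread worker = new Thread(
            () -> System.out.println("Heartbeat monitor running"),
            "heartbeat-monitor");

        void start() {
            // The real monitor also touches cluster state through the same
            // persistence layer, which is where the two paths can collide.
            worker.start();
        }
    }

    public static void main(String[] args) {
        Clusters clusters = new Clusters();
        HeartbeatMonitor monitor = new HeartbeatMonitor();

        // Before the patch (roughly): monitor.start() ran before cluster
        // initialization, so both hit the persistence layer concurrently.
        // After the patch: initialize the clusters first, then start the monitor.
        clusters.loadClustersAndHosts();
        monitor.start();
    }
}
{code}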
My uneducated guess is that the root problem might be in the locking mechanism
of ClusterImpl.java, but I don't have a deep enough understanding of the
different classes and locks to pinpoint the root cause.
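If it does turn out to be a locking problem around lazy service loading, the usual pattern is to guard the load and the reads with a single ReadWriteLock. The sketch below is a generic illustration using java.util.concurrent, not the actual ClusterImpl code:
{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Generic sketch: lazily-loaded service state guarded by a ReadWriteLock so
// that a one-time loader and concurrent readers (e.g. a monitoring thread)
// cannot interleave. Class and method names here are illustrative only.
public class LazyServiceRegistry {

    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> serviceStates = new HashMap<>();
    private volatile boolean loaded = false;

    /** Called once during initialization; takes the write lock. */
    public void loadServices() {
        lock.writeLock().lock();
        try {
            if (!loaded) {
                // In Ambari this would build ServiceImpl objects from JPA.
                serviceStates.put("HDFS", "INSTALLED");
                loaded = true;
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Called by concurrent readers such as a heartbeat/monitoring thread. */
    public String getServiceState(String serviceName) {
        if (!loaded) {
            loadServices(); // make sure loading has finished before reading
        }
        lock.readLock().lock();
        try {
            return serviceStates.get(serviceName);
        } finally {
            lock.readLock().unlock();
        }
    }
}
{code}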
> Ambari initialization problems after upgrade to 1.4.1
> -----------------------------------------------------
>
> Key: AMBARI-4930
> URL: https://issues.apache.org/jira/browse/AMBARI-4930
> Project: Ambari
> Issue Type: Bug
> Affects Versions: 1.4.1
> Reporter: Ximo Guanter
> Attachments: AMBARI-4930-1.patch
>
>
> Starting the Ambari Server sometimes fails with the following error:
> {code}
> 04:44:56,972 INFO [main] Configuration:511 - Web App DIR test /usr/lib/ambari-server/web
> 04:44:56,975 INFO [main] CertificateManager:70 - Initialization of root certificate
> 04:44:56,975 INFO [main] CertificateManager:72 - Certificate exists:true
> 04:44:57,003 INFO [main] AmbariServer:338 - ********* Initializing Clusters **********
> 04:44:57,285 WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-compute02.hi.inet
> 04:44:57,295 WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-compute03.hi.inet
> 04:44:57,296 WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-compute06.hi.inet
> 04:44:57,296 WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-compute04.hi.inet
> 04:44:57,297 WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-data99.hi.inet
> 04:44:57,318 ERROR [main] AmbariServer:461 - Failed to run the Ambari Server
> Local Exception Stack:
> Exception [EclipseLink-2004] (Eclipse Persistence Services - 2.4.0.v20120608-r11652): org.eclipse.persistence.exceptions.ConcurrencyException
> Exception Description: A signal was attempted before wait() on ConcurrencyManager. This normally means that an attempt was made to commit or rollback a transaction before it was started, or to rollback a transaction twice.
> at org.eclipse.persistence.exceptions.ConcurrencyException.signalAttemptedBeforeWait(ConcurrencyException.java:84)
> at org.eclipse.persistence.internal.helper.ConcurrencyManager.releaseReadLock(ConcurrencyManager.java:489)
> at org.eclipse.persistence.internal.identitymaps.CacheKey.releaseReadLock(CacheKey.java:392)
> at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.cloneAndRegisterObject(UnitOfWorkImpl.java:1022)
> at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.cloneAndRegisterObject(UnitOfWorkImpl.java:933)
> at org.eclipse.persistence.internal.sessions.UnitOfWorkIdentityMapAccessor.getAndCloneCacheKeyFromParent(UnitOfWorkIdentityMapAccessor.java:193)
> at org.eclipse.persistence.internal.sessions.UnitOfWorkIdentityMapAccessor.getFromIdentityMap(UnitOfWorkIdentityMapAccessor.java:121)
> at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.registerExistingObject(UnitOfWorkImpl.java:3906)
> at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.registerExistingObject(UnitOfWorkImpl.java:3861)
> at org.eclipse.persistence.mappings.CollectionMapping.buildElementUnitOfWorkClone(CollectionMapping.java:296)
> at org.eclipse.persistence.mappings.CollectionMapping.buildElementClone(CollectionMapping.java:309)
> at org.eclipse.persistence.internal.queries.ContainerPolicy.addNextValueFromIteratorInto(ContainerPolicy.java:214)
> at org.eclipse.persistence.mappings.CollectionMapping.buildCloneForPartObject(CollectionMapping.java:222)
> at org.eclipse.persistence.internal.indirection.UnitOfWorkQueryValueHolder.buildCloneFor(UnitOfWorkQueryValueHolder.java:56)
> at org.eclipse.persistence.internal.indirection.UnitOfWorkValueHolder.instantiateImpl(UnitOfWorkValueHolder.java:161)
> at org.eclipse.persistence.internal.indirection.UnitOfWorkValueHolder.instantiate(UnitOfWorkValueHolder.java:222)
> at org.eclipse.persistence.internal.indirection.DatabaseValueHolder.getValue(DatabaseValueHolder.java:88)
> at org.eclipse.persistence.indirection.IndirectList.buildDelegate(IndirectList.java:244)
> at org.eclipse.persistence.indirection.IndirectList.getDelegate(IndirectList.java:415)
> at org.eclipse.persistence.indirection.IndirectList.isEmpty(IndirectList.java:490)
> at org.apache.ambari.server.state.ServiceImpl.<init>(ServiceImpl.java:125)
> at org.apache.ambari.server.state.ServiceImpl$$EnhancerByGuice$$807a405e.<init>(<generated>)
> at org.apache.ambari.server.state.ServiceImpl$$EnhancerByGuice$$807a405e$$FastClassByGuice$$1c1221ad.newInstance(<generated>)
> at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
> at com.google.inject.internal.ProxyFactory$ProxyConstructor.newInstance(ProxyFactory.java:260)
> at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
> at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
> at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
> at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
> at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
> at com.google.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:632)
> at $Proxy12.createExisting(Unknown Source)
> at org.apache.ambari.server.state.cluster.ClusterImpl.loadServices(ClusterImpl.java:218)
> at org.apache.ambari.server.state.cluster.ClusterImpl.debugDump(ClusterImpl.java:808)
> at org.apache.ambari.server.state.cluster.ClustersImpl.debugDump(ClustersImpl.java:566)
> at org.apache.ambari.server.controller.AmbariServer.run(AmbariServer.java:341)
> at org.apache.ambari.server.controller.AmbariServer.main(AmbariServer.java:458)
> {code}
> The issue seems to be related to the amount of data in the {{ambarirca}}
> database: it reproduces 80-90% of the time we try to start the ambari-server
> in an environment where that DB is 1GB+, and it basically never reproduces
> in environments with a small DB.
> Running the {{VACUUM FULL}} command does not help mitigate the problem.
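> As an illustration only, the size correlation can be checked directly. The
> JDBC URL and credentials below are placeholders, and it is assumed that
> {{ambarirca}} lives in PostgreSQL (the PostgreSQL JDBC driver must be on the
> classpath):
> {code}
> import java.sql.Connection;
> import java.sql.DriverManager;
> import java.sql.ResultSet;
> import java.sql.Statement;
>
> // Sketch: print the on-disk size of the ambarirca database so the
> // "fails when the DB is 1GB+" observation can be checked on a given host.
> // Connection URL, user and password are placeholders.
> public class RcaDbSize {
>     public static void main(String[] args) throws Exception {
>         String url = "jdbc:postgresql://localhost:5432/ambarirca";
>         try (Connection conn = DriverManager.getConnection(url, "ambari", "bigdata");
>              Statement stmt = conn.createStatement();
>              ResultSet rs = stmt.executeQuery(
>                  "SELECT pg_size_pretty(pg_database_size('ambarirca'))")) {
>             if (rs.next()) {
>                 System.out.println("ambarirca size: " + rs.getString(1));
>             }
>         }
>     }
> }
> {code}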