Hi Siddharth,

I have opened https://issues.apache.org/jira/browse/AMBARI-4930 to track this issue. I cannot provide a thread dump right now because the problem reproduces in our production environment, where we cannot stop or restart ambari-server outside a maintenance window. I will try to reproduce it in another environment, capture the thread dump there, and attach it to the JIRA.
Thanks!
Ximo

From: Siddharth Wagle <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, 28 February 2014 19:05
To: "[email protected]" <[email protected]>
Subject: Re: Initializaton errors on Ambari 1.4.1

Hi Ximo,

Could you please provide a thread dump (jstack command output) when you notice this?

I would be glad to open a Jira to track this issue, but if you would rather open it yourself, that's fine too.

The tables to look at for cleaning up the postgres db are execution_command and host_role_command, which correspond to the requests and tasks. You could just delete the byte array fields in these tables and reclaim disk space by using the VACUUM command.

Best Regards,
Sid

On Thu, Feb 27, 2014 at 10:06 PM, JOAQUIN GUANTER GONZALBEZ <[email protected]> wrote:

Hello,

Since we upgraded to Ambari 1.4.1, we see the following initialization error from time to time when trying to start ambari-server:

04:44:56,972 INFO [main] Configuration:511 - Web App DIR test /usr/lib/ambari-server/web
04:44:56,975 INFO [main] CertificateManager:70 - Initialization of root certificate
04:44:56,975 INFO [main] CertificateManager:72 - Certificate exists:true
04:44:57,003 INFO [main] AmbariServer:338 - ********* Initializing Clusters **********
04:44:57,285 WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-compute02.hi.inet
04:44:57,295 WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-compute03.hi.inet
04:44:57,296 WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-compute06.hi.inet
04:44:57,296 WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-compute04.hi.inet
04:44:57,297 WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-data99.hi.inet
04:44:57,318 ERROR [main] AmbariServer:461 - Failed to run the Ambari Server
Local Exception Stack:
Exception [EclipseLink-2004] (Eclipse Persistence Services - 2.4.0.v20120608-r11652): org.eclipse.persistence.exceptions.ConcurrencyException
Exception Description: A signal was attempted before wait() on ConcurrencyManager. This normally means that an attempt was made to commit or rollback a transaction before it was started, or to rollback a transaction twice.
    at org.eclipse.persistence.exceptions.ConcurrencyException.signalAttemptedBeforeWait(ConcurrencyException.java:84)
    at org.eclipse.persistence.internal.helper.ConcurrencyManager.releaseReadLock(ConcurrencyManager.java:489)
    at org.eclipse.persistence.internal.identitymaps.CacheKey.releaseReadLock(CacheKey.java:392)
    at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.cloneAndRegisterObject(UnitOfWorkImpl.java:1022)
    at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.cloneAndRegisterObject(UnitOfWorkImpl.java:933)
    at org.eclipse.persistence.internal.sessions.UnitOfWorkIdentityMapAccessor.getAndCloneCacheKeyFromParent(UnitOfWorkIdentityMapAccessor.java:193)
    at org.eclipse.persistence.internal.sessions.UnitOfWorkIdentityMapAccessor.getFromIdentityMap(UnitOfWorkIdentityMapAccessor.java:121)
    at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.registerExistingObject(UnitOfWorkImpl.java:3906)
    at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.registerExistingObject(UnitOfWorkImpl.java:3861)
    at org.eclipse.persistence.mappings.CollectionMapping.buildElementUnitOfWorkClone(CollectionMapping.java:296)
    at org.eclipse.persistence.mappings.CollectionMapping.buildElementClone(CollectionMapping.java:309)
    at org.eclipse.persistence.internal.queries.ContainerPolicy.addNextValueFromIteratorInto(ContainerPolicy.java:214)
    at org.eclipse.persistence.mappings.CollectionMapping.buildCloneForPartObject(CollectionMapping.java:222)
    at org.eclipse.persistence.internal.indirection.UnitOfWorkQueryValueHolder.buildCloneFor(UnitOfWorkQueryValueHolder.java:56)
    at org.eclipse.persistence.internal.indirection.UnitOfWorkValueHolder.instantiateImpl(UnitOfWorkValueHolder.java:161)
    at org.eclipse.persistence.internal.indirection.UnitOfWorkValueHolder.instantiate(UnitOfWorkValueHolder.java:222)
    at org.eclipse.persistence.internal.indirection.DatabaseValueHolder.getValue(DatabaseValueHolder.java:88)
    at org.eclipse.persistence.indirection.IndirectList.buildDelegate(IndirectList.java:244)
    at org.eclipse.persistence.indirection.IndirectList.getDelegate(IndirectList.java:415)
    at org.eclipse.persistence.indirection.IndirectList.isEmpty(IndirectList.java:490)
    at org.apache.ambari.server.state.ServiceImpl.<init>(ServiceImpl.java:125)
    at org.apache.ambari.server.state.ServiceImpl$$EnhancerByGuice$$807a405e.<init>(<generated>)
    at org.apache.ambari.server.state.ServiceImpl$$EnhancerByGuice$$807a405e$$FastClassByGuice$$1c1221ad.newInstance(<generated>)
    at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
    at com.google.inject.internal.ProxyFactory$ProxyConstructor.newInstance(ProxyFactory.java:260)
    at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
    at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
    at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
    at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
    at com.google.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:632)
    at $Proxy12.createExisting(Unknown Source)
    at org.apache.ambari.server.state.cluster.ClusterImpl.loadServices(ClusterImpl.java:218)
    at org.apache.ambari.server.state.cluster.ClusterImpl.debugDump(ClusterImpl.java:808)
    at org.apache.ambari.server.state.cluster.ClustersImpl.debugDump(ClustersImpl.java:566)
    at org.apache.ambari.server.controller.AmbariServer.run(AmbariServer.java:341)
    at org.apache.ambari.server.controller.AmbariServer.main(AmbariServer.java:458)

Is this a known issue? It seems to be related to the amount of data in the PostgreSQL DB. In one of our environments, the PostgreSQL DB dump is around 1 GB and we are having serious problems launching ambari-server (around 60-70% of the "ambari-server start" commands cause the above exception).

Thanks,
Ximo.

________________________________

This message is intended exclusively for its addressee. We only send and receive email on the basis of the terms set out at:
http://www.tid.es/ES/PAGINAS/disclaimer.aspx

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
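
For anyone following Sid's cleanup suggestion, here is a minimal sketch of the SQL it implies, generated from Python so the statements can be reviewed before they are run by hand in psql. The table names come from the thread; the column names (command, std_out, std_error) are assumptions and should be checked against the actual schema first. The byte array fields are set to an empty value rather than NULL, in case the columns do not allow NULL.

```python
# Tables are the ones named in the thread. The column names below are
# ASSUMPTIONS, not confirmed by the thread: check them in psql
# (e.g. "\d host_role_command") before running anything.
BLOB_COLUMNS = {
    "execution_command": ["command"],
    "host_role_command": ["std_out", "std_error"],
}


def cleanup_statements(blob_columns):
    """Return SQL that empties the byte array columns, then vacuums each table."""
    statements = []
    for table, columns in blob_columns.items():
        # '' is a valid empty bytea literal in PostgreSQL.
        assignments = ", ".join(f"{col} = ''" for col in columns)
        statements.append(f"UPDATE {table} SET {assignments};")
    for table in blob_columns:
        # VACUUM must run outside a transaction block.
        statements.append(f"VACUUM {table};")
    return statements


if __name__ == "__main__":
    # Print the statements for review instead of executing them.
    for stmt in cleanup_statements(BLOB_COLUMNS):
        print(stmt)
```

Plain VACUUM marks the freed space as reusable by PostgreSQL; VACUUM FULL is the variant that returns the space to the operating system, at the cost of an exclusive lock on the table while it runs.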
