Yeah, that looks like an issue. DeviceInitializationUtils is doing a blocking get() on a Future, which is usually not a good thing, and it was triggered via an EOS (Entity Ownership Service) ownership change notification, so it is blocking an akka dispatcher thread.
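For illustration, here is a minimal sketch of the difference - the class/method names and the plain String payload are made up, not the actual DeviceInitializationUtils API, but the blocking-vs-callback pattern is the point:

import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;

public class NodeInfoExample {

    // Anti-pattern: blocks the calling thread (here it would be an akka
    // dispatcher thread) until the device replies, stalling every other
    // message queued on that dispatcher.
    void initializeBlocking(ListenableFuture<String> nodeInfoFuture) throws Exception {
        String nodeInfo = nodeInfoFuture.get(); // like the get() at DeviceInitializationUtils.java:155
        store(nodeInfo);
    }

    // Non-blocking alternative: register a callback and return immediately,
    // leaving the dispatcher thread free to process further actor messages.
    void initializeAsync(ListenableFuture<String> nodeInfoFuture) {
        Futures.addCallback(nodeInfoFuture, new FutureCallback<String>() {
            @Override
            public void onSuccess(String nodeInfo) {
                store(nodeInfo);
            }

            @Override
            public void onFailure(Throwable cause) {
                // log / clean up the connection instead of hanging the dispatcher
            }
        }, MoreExecutors.directExecutor());
    }

    private void store(String nodeInfo) {
        // hypothetical stand-in for writing the gathered node information
    }
}

The callback variant returns to the dispatcher immediately, so a slow or unresponsive device can't stall every other actor sharing that dispatcher.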
On a side note, there are a lot of threads with
io.netty.util.concurrent.FastThreadLocalThread - not sure if that's normal.

On Mon, May 29, 2017 at 9:30 AM, Michael Vorburger <vorbur...@redhat.com> wrote:

> +openflowplugin-dev & +ovsdb-dev:
>
> Tom,
>
> On Mon, May 29, 2017 at 2:57 PM, Tom Pantelis <tompante...@gmail.com> wrote:
>
> Thanks a lot for replying, really appreciate it!
>
>> It looks like the Dispatcher was for data change notifications. I suspect
>> a listener was hung or responding slowly so the actor's mailbox filled up
>> with change notifications. I would suggest getting a thread dump next time.
>
> Turns out there's no need to wait for next time - just figured out that we
> can obtain thread dumps à posteriori from an HPROF using MAT... see the [4]
> Bug7370_Threads.zip HTML report just attached to Bug 7370.
>
> It shows 604 threads (a lot?), many of which are e.g. parked ForkJoinPool
> threads, and a number of them related to ovsdb and openflowplugin stuff...
> so what are we looking for in this thread dump? I haven't looked at each
> thread's stack yet, but this one vaguely looks like what you may mean by
> "a listener was hung or responding slowly" (causing "the actor's mailbox
> filled up with change notifications") - could it possibly be the reason
> for / have something to do with this OOM:
>
> opendaylight-cluster-data-akka.actor.default-dispatcher-16
>   at sun.misc.Unsafe.park(ZJ)V (Native Method)
>   at java.util.concurrent.locks.LockSupport.park(Ljava/lang/Object;)V (LockSupport.java:175)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt()Z (AbstractQueuedSynchronizer.java:836)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(I)V (AbstractQueuedSynchronizer.java:997)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(I)V (AbstractQueuedSynchronizer.java:1304)
>   at com.google.common.util.concurrent.AbstractFuture$Sync.get()Ljava/lang/Object; (AbstractFuture.java:285)
>   at com.google.common.util.concurrent.AbstractFuture.get()Ljava/lang/Object; (AbstractFuture.java:116)
>   at org.opendaylight.openflowplugin.impl.util.DeviceInitializationUtils.initializeNodeInformation(Lorg/opendaylight/openflowplugin/api/openflow/device/DeviceContext;ZLorg/opendaylight/openflowplugin/openflow/md/core/sal/convertor/ConvertorExecutor;)V (DeviceInitializationUtils.java:155)
>   at org.opendaylight.openflowplugin.impl.device.DeviceContextImpl.onContextInstantiateService(Lorg/opendaylight/openflowplugin/api/openflow/connection/ConnectionContext;)Z (DeviceContextImpl.java:730)
>   at org.opendaylight.openflowplugin.impl.lifecycle.LifecycleServiceImpl.instantiateServiceInstance()V (LifecycleServiceImpl.java:53)
>   at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceRegistrationDelegator.instantiateServiceInstance()V (ClusterSingletonServiceRegistrationDelegator.java:46)
>   at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.takeOwnership()V (ClusterSingletonServiceGroupImpl.java:291)
>   at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.ownershipChanged(Lorg/opendaylight/mdsal/eos/common/api/GenericEntityOwnershipChange;)V (ClusterSingletonServiceGroupImpl.java:237)
>   at org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl.ownershipChanged(Lorg/opendaylight/mdsal/eos/common/api/GenericEntityOwnershipChange;)V (AbstractClusterSingletonServiceProviderImpl.java:145)
>   at org.opendaylight.mdsal.singleton.dom.impl.DOMClusterSingletonServiceProviderImpl.ownershipChanged(Lorg/opendaylight/mdsal/eos/dom/api/DOMEntityOwnershipChange;)V (DOMClusterSingletonServiceProviderImpl.java:23)
>   at org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipListenerActor.onEntityOwnershipChanged(Lorg/opendaylight/mdsal/eos/dom/api/DOMEntityOwnershipChange;)V (EntityOwnershipListenerActor.java:46)
>   at org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipListenerActor.handleReceive(Ljava/lang/Object;)V (EntityOwnershipListenerActor.java:36)
>   at org.opendaylight.controller.cluster.common.actor.AbstractUntypedActor.onReceive(Ljava/lang/Object;)V (AbstractUntypedActor.java:26)
>   at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(Ljava/lang/Object;Lscala/Function1;)Ljava/lang/Object; (UntypedActor.scala:165)
>   at akka.actor.Actor$class.aroundReceive(Lakka/actor/Actor;Lscala/PartialFunction;Ljava/lang/Object;)V (Actor.scala:484)
>   at akka.actor.UntypedActor.aroundReceive(Lscala/PartialFunction;Ljava/lang/Object;)V (UntypedActor.scala:95)
>   at akka.actor.ActorCell.receiveMessage(Ljava/lang/Object;)V (ActorCell.scala:526)
>   at akka.actor.ActorCell.invoke(Lakka/dispatch/Envelope;)V (ActorCell.scala:495)
>   at akka.dispatch.Mailbox.processMailbox(IJ)V (Mailbox.scala:257)
>   at akka.dispatch.Mailbox.run()V (Mailbox.scala:224)
>   at akka.dispatch.Mailbox.exec()Z (Mailbox.scala:234)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec()I (ForkJoinTask.java:260)
>   at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(Lscala/concurrent/forkjoin/ForkJoinTask;)V (ForkJoinPool.java:1339)
>   at scala.concurrent.forkjoin.ForkJoinPool.runWorker(Lscala/concurrent/forkjoin/ForkJoinPool$WorkQueue;)V (ForkJoinPool.java:1979)
>   at scala.concurrent.forkjoin.ForkJoinWorkerThread.run()V (ForkJoinWorkerThread.java:107)
>
>> On Mon, May 29, 2017 at 7:52 AM, Michael Vorburger <vorbur...@redhat.com> wrote:
>>
>>> Hi guys,
>>>
>>> I just ran MAT ([1]) over an HPROF heap dump on OOM in Bug 7370, and it
>>> (MAT) raises a "leak suspect" in akka.dispatch.Dispatcher - see the [3]
>>> java_pid19570_Leak_Suspects.zip just attached to Bug 7370... questions:
>>>
>>> Is this perhaps something you jump at with an "ah that, we know about it
>>> and already fixed that in ..."?
>>>
>>> If not, how do we go about better understanding the root cause of this,
>>> and eventually fixing it?
>>>
>>> My underlying assumption here is that this isn't "normal" and not just
>>> "by design" - if it is, I'd love some education... I'm hoping that the
>>> conclusion here isn't simply that MD-SAL's data store is a dumb in-memory
>>> database which basically just takes a huge amount of GBs to keep (all)
>>> YANG model instances on the heap - or is it?
>>>
>>> Tx,
>>> M.
>>>
>>> [1] https://www.eclipse.org/mat/
>>> [2] https://bugs.opendaylight.org/show_bug.cgi?id=7370
>>> [3] https://bugs.opendaylight.org/attachment.cgi?id=1816
>
> [4] https://bugs.opendaylight.org/attachment.cgi?id=1819
>
>>> --
>>> Michael Vorburger, Red Hat
>>> vorbur...@redhat.com | IRC: vorburger @freenode | ~ = http://vorburger.ch
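PS: to illustrate the "mailbox filled up with change notifications" effect with a toy sketch (plain java.util.concurrent rather than real Akka code, names made up): a single worker thread stands in for the actor's dispatcher, the first task blocks the way the get() above does, and everything submitted afterwards just piles up in the queue until the heap is exhausted.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class MailboxBacklogDemo {
    public static void main(String[] args) throws InterruptedException {
        // Single worker thread stands in for one actor's mailbox processing.
        ExecutorService dispatcher = Executors.newSingleThreadExecutor();

        // The first "notification" blocks for a long time, like the
        // Future.get() in DeviceInitializationUtils...
        dispatcher.submit(() -> {
            try {
                TimeUnit.MINUTES.sleep(10); // stand-in for a get() that never completes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // ...so every subsequent notification just queues up behind it, and
        // with enough of them (plus their payloads) the heap eventually
        // fills: the OOM.
        for (int i = 0; i < 1_000_000; i++) {
            final int n = i;
            dispatcher.submit(() -> System.out.println("change notification " + n));
        }

        dispatcher.shutdown();
    }
}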