yeah that looks like an issue. DeviceInitializationUtils is doing a
blocking get() on a Future, which is usually not a good thing. And it
occurred via an EOS ownership change, so it's blocking an akka Dispatcher
thread.
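To illustrate the pattern described above (a hypothetical sketch, using the JDK's CompletableFuture for self-containedness; the actual OpenFlowPlugin code blocks on a Guava ListenableFuture, where Futures.addCallback would play the role that thenAccept plays here):

```java
import java.util.concurrent.CompletableFuture;

public class NonBlockingSketch {

    // Anti-pattern: get() parks the calling thread until the future
    // completes. On an akka dispatcher thread this stalls the actor,
    // so its mailbox fills up with unprocessed notifications.
    static String blockingStyle(CompletableFuture<String> f) throws Exception {
        return f.get();
    }

    // Non-blocking alternative: register a callback and return
    // immediately, leaving the dispatcher thread free for the next message.
    static void callbackStyle(CompletableFuture<String> f) {
        f.thenAccept(result -> System.out.println("initialized: " + result));
    }

    public static void main(String[] args) {
        CompletableFuture<String> f = new CompletableFuture<>();
        callbackStyle(f);     // returns at once; nothing printed yet
        f.complete("node-1"); // completion triggers the callback
    }
}
```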

On a side note, there are a lot of threads of type
io.netty.util.concurrent.FastThreadLocalThread - not sure if that's
normal.



On Mon, May 29, 2017 at 9:30 AM, Michael Vorburger <vorbur...@redhat.com>
wrote:

> +openflowplugin-dev & +ovsdb-dev:
>
> Tom,
>
> On Mon, May 29, 2017 at 2:57 PM, Tom Pantelis <tompante...@gmail.com>
> wrote:
>
> Thanks a lot for replying, really appreciate it!
>
> It looks like the Dispatcher was for data change notifications. I suspect
>> a listener was hung or responding slowly so the actor's mailbox filled up
>> with change notifications. I would suggest getting a thread dump next time.
>>
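(Side note for anyone reproducing this: besides jstack <pid> or kill -3, a
thread dump can also be captured in-process via the standard JMX
ThreadMXBean - a minimal generic sketch, not ODL code:)

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSketch {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // true, true => also report locked monitors and ownable
        // synchronizers, useful when hunting a blocked dispatcher thread.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            System.out.print(info); // name, state, and stack of each thread
        }
    }
}
```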
>
> Turns out there's no need to wait for next time - I just figured out that
> we can obtain thread dumps a posteriori from an HPROF using MAT... see the
> [4] Bug7370_Threads.zip HTML report just attached to Bug 7370.
>
> It shows 604 threads (a lot?), many of which are e.g. parked ForkJoinPool
> workers, and a number of them related to ovsdb and openflowplugin stuff...
> so what are we looking for in this thread dump? I haven't looked through
> each thread's stack yet, but this one vaguely looks like what you may mean
> by "a listener was hung or responding slowly" (causing "the actor's
> mailbox filled up with change notifications"). Could it possibly be the
> reason for, or have something to do with, this OOM:
>
> opendaylight-cluster-data-akka.actor.default-dispatcher-16
>   at sun.misc.Unsafe.park(ZJ)V (Native Method)
>   at java.util.concurrent.locks.LockSupport.park(Ljava/lang/Object;)V (LockSupport.java:175)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt()Z (AbstractQueuedSynchronizer.java:836)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(I)V (AbstractQueuedSynchronizer.java:997)
>   at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(I)V (AbstractQueuedSynchronizer.java:1304)
>   at com.google.common.util.concurrent.AbstractFuture$Sync.get()Ljava/lang/Object; (AbstractFuture.java:285)
>   at com.google.common.util.concurrent.AbstractFuture.get()Ljava/lang/Object; (AbstractFuture.java:116)
>   at org.opendaylight.openflowplugin.impl.util.DeviceInitializationUtils.initializeNodeInformation(Lorg/opendaylight/openflowplugin/api/openflow/device/DeviceContext;ZLorg/opendaylight/openflowplugin/openflow/md/core/sal/convertor/ConvertorExecutor;)V (DeviceInitializationUtils.java:155)
>   at org.opendaylight.openflowplugin.impl.device.DeviceContextImpl.onContextInstantiateService(Lorg/opendaylight/openflowplugin/api/openflow/connection/ConnectionContext;)Z (DeviceContextImpl.java:730)
>   at org.opendaylight.openflowplugin.impl.lifecycle.LifecycleServiceImpl.instantiateServiceInstance()V (LifecycleServiceImpl.java:53)
>   at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceRegistrationDelegator.instantiateServiceInstance()V (ClusterSingletonServiceRegistrationDelegator.java:46)
>   at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.takeOwnership()V (ClusterSingletonServiceGroupImpl.java:291)
>   at org.opendaylight.mdsal.singleton.dom.impl.ClusterSingletonServiceGroupImpl.ownershipChanged(Lorg/opendaylight/mdsal/eos/common/api/GenericEntityOwnershipChange;)V (ClusterSingletonServiceGroupImpl.java:237)
>   at org.opendaylight.mdsal.singleton.dom.impl.AbstractClusterSingletonServiceProviderImpl.ownershipChanged(Lorg/opendaylight/mdsal/eos/common/api/GenericEntityOwnershipChange;)V (AbstractClusterSingletonServiceProviderImpl.java:145)
>   at org.opendaylight.mdsal.singleton.dom.impl.DOMClusterSingletonServiceProviderImpl.ownershipChanged(Lorg/opendaylight/mdsal/eos/dom/api/DOMEntityOwnershipChange;)V (DOMClusterSingletonServiceProviderImpl.java:23)
>   at org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipListenerActor.onEntityOwnershipChanged(Lorg/opendaylight/mdsal/eos/dom/api/DOMEntityOwnershipChange;)V (EntityOwnershipListenerActor.java:46)
>   at org.opendaylight.controller.cluster.datastore.entityownership.EntityOwnershipListenerActor.handleReceive(Ljava/lang/Object;)V (EntityOwnershipListenerActor.java:36)
>   at org.opendaylight.controller.cluster.common.actor.AbstractUntypedActor.onReceive(Ljava/lang/Object;)V (AbstractUntypedActor.java:26)
>   at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(Ljava/lang/Object;Lscala/Function1;)Ljava/lang/Object; (UntypedActor.scala:165)
>   at akka.actor.Actor$class.aroundReceive(Lakka/actor/Actor;Lscala/PartialFunction;Ljava/lang/Object;)V (Actor.scala:484)
>   at akka.actor.UntypedActor.aroundReceive(Lscala/PartialFunction;Ljava/lang/Object;)V (UntypedActor.scala:95)
>   at akka.actor.ActorCell.receiveMessage(Ljava/lang/Object;)V (ActorCell.scala:526)
>   at akka.actor.ActorCell.invoke(Lakka/dispatch/Envelope;)V (ActorCell.scala:495)
>   at akka.dispatch.Mailbox.processMailbox(IJ)V (Mailbox.scala:257)
>   at akka.dispatch.Mailbox.run()V (Mailbox.scala:224)
>   at akka.dispatch.Mailbox.exec()Z (Mailbox.scala:234)
>   at scala.concurrent.forkjoin.ForkJoinTask.doExec()I (ForkJoinTask.java:260)
>   at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(Lscala/concurrent/forkjoin/ForkJoinTask;)V (ForkJoinPool.java:1339)
>   at scala.concurrent.forkjoin.ForkJoinPool.runWorker(Lscala/concurrent/forkjoin/ForkJoinPool$WorkQueue;)V (ForkJoinPool.java:1979)
>   at scala.concurrent.forkjoin.ForkJoinWorkerThread.run()V (ForkJoinWorkerThread.java:107)
>
>> On Mon, May 29, 2017 at 7:52 AM, Michael Vorburger <vorbur...@redhat.com>
>> wrote:
>>
>>> Hi guys,
>>>
>>> I just ran MAT [1] over an HPROF heap dump of the OOM in Bug 7370 [2],
>>> and it (MAT) raises a "leak suspect" in akka.dispatch.Dispatcher - see
>>> the [3] java_pid19570_Leak_Suspects.zip just attached to Bug 7370...
>>> questions:
>>>
>>> Is this perhaps something you jump at with an "ah that, we know about it
>>> and already fixed that in ..." ?
>>>
>>> If not, how do we go about better understanding the root cause of this,
>>> so that we can eventually fix it?
>>>
>>> My underlying assumption here is that this isn't "normal" and isn't just
>>> "by design" - if it is, I'd love some education... I'm hoping that the
>>> conclusion here isn't simply that the MD-SAL data store is a dumb
>>> in-memory database which basically just takes a huge number of GBs to
>>> keep (all) YANG model instances on the heap - or is it?
>>>
>>> Tx,
>>> M.
>>>
>>> [1] https://www.eclipse.org/mat/
>>>
>>> [2] https://bugs.opendaylight.org/show_bug.cgi?id=7370
>>>
>>> [3] https://bugs.opendaylight.org/attachment.cgi?id=1816
>>>
>>
> [4] https://bugs.opendaylight.org/attachment.cgi?id=1819
>
>
>> --
>>> Michael Vorburger, Red Hat
>>> vorbur...@redhat.com | IRC: vorburger @freenode | ~ =
>>> http://vorburger.ch
>>>
>>> _______________________________________________
>>> mdsal-dev mailing list
>>> mdsal-...@lists.opendaylight.org
>>> https://lists.opendaylight.org/mailman/listinfo/mdsal-dev
>>>
>>>
>>
>
_______________________________________________
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev
