Hi Wolfgang, Patrik, Roland

I may be seeing a similar problem. I have spent some time on this, and I 
believe I can reproduce it most of the time. I start a 2-node cluster 
(separate JVMs), then periodically restart the second node. Usually within 
5 attempts the ActorSystem on the first node is terminated, although the 
process continues to run. The timing of the restart seems to be key: the 
second node has to be started again right after it is shut down.

I was able to strip back our code to just a plain ActorSystem start + 
Clustering, so I don't believe anything else we are doing is causing the 
problem. I was also able to replicate the problem in an IDE and do some 
debugging. It seems that the ActorSystem is being shut down from 
Cluster.shutdown, which is in turn invoked from 
ClusterCoreSupervisor.postStop.
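
A stripped-back two-node setup of this kind could be configured roughly as 
follows. This is only a sketch, not the exact configuration used: the system 
name ClusterSystem is taken from the thread name in the stack trace, while 
the hostname and ports are assumed values.

```
# Sketch only: hostname and ports are assumptions, not taken from the report.
akka {
  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
  }
  remote {
    netty.tcp {
      hostname = "127.0.0.1"
      # First node; the second node would use a different port, e.g. 2552.
      port = 2551
    }
  }
  cluster {
    # Both nodes join through the first node, which acts as the seed node.
    seed-nodes = ["akka.tcp://ClusterSystem@127.0.0.1:2551"]
  }
}
```

With a setup like this, the second JVM is the one that gets stopped and 
started again, while the first JVM (the seed) is the one whose ActorSystem 
disappears.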

Full stack:

"ClusterSystem-akka.actor.default-dispatcher-16@2725" prio=5 tid=0x1b 
nid=NA runnable
  java.lang.Thread.State: RUNNABLE
  at akka.cluster.Cluster.shutdown(Cluster.scala:355)
  at akka.cluster.ClusterCoreSupervisor.postStop(ClusterDaemon.scala:206)
  at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
  at akka.cluster.ClusterCoreSupervisor.aroundPostStop(ClusterDaemon.scala:187)
  at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
  at akka.actor.dungeon.FaultHandling$class.handleChildTerminated(FaultHandling.scala:292)
  at akka.actor.ActorCell.handleChildTerminated(ActorCell.scala:369)
  at akka.actor.dungeon.DeathWatch$class.watchedActorTerminated(DeathWatch.scala:63)
  at akka.actor.ActorCell.watchedActorTerminated(ActorCell.scala:369)
  at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:455)
  at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
  at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
  at akka.dispatch.Mailbox.run(Mailbox.scala:220)
  at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


Strangely, Cluster.shutdown does log that it's shutting down:

private[cluster] def shutdown(): Unit = {
...
logInfo("Shutting down...")
...

but I never see this in the log!

I can replicate this in 2.3.8 and 2.3.9 but not in 2.3.7, just as Wolfgang 
observed.

Regards

Elliott



On Thursday, 19 February 2015 12:08:26 UTC, Wolfgang Friedl wrote:
>
> Hi Roland!
>
> It's back again! Today, while doing load testing, we observed the 
> disappearance of the actor system again!
>
> 12:36:32.127 INFO  [akka.tcp://
> [email protected]:3552/user/sharding/DispatcherShard/-4042992360321395532/$b]
>  
> <domain specific log>
> 12:36:36.911 WARN  [] MongoDBSystem 
> Actor[akka://OurProgramm/user/MongoDB_default#-1135971146] stopped.
> 12:36:36.911 WARN  [] MongoDBSystem 
> Actor[akka://OurProgramm/user/MongoDB_meta#2057722171] stopped.
>
> For load testing we use Gatling with our own Gatling plugin, which uses 
> akka-remote as well as akka-http.
> Could it be that akka-http is causing this? Maybe because internally so 
> many actors are created by akka-stream? Just a wild guess.
>
> Regards
>
> Wolfgang
>
>
> On Friday, 13 February 2015 12:35:49 UTC+1, Wolfgang Friedl wrote:
>>
>> Hi Roland!
>>
>> That's true, that might be related to our problem. Now the only open 
>> question is what caused the fatal error.
>> I will keep you up to date once we observe the problem again. 
>>
>> Regards
>>
>> Wolfgang
>>
>>
>> On Thursday, 12 February 2015 11:41:49 UTC+1, rkuhn wrote:
>>>
>>> Hi Wolfgang,
>>>
>>> there is a current thread about handling fatal errors 
>>> <https://groups.google.com/d/msg/akka-user/TyWlxjNaBiQ/1mXbJM_Lt9UJ> which 
>>> might be related. In your configuration you have  "jvm-exit-on-fatal-error" 
>>> : "on" so in principle the JVM should stop, but what if your code 
>>> encounters a fatal error and the stopping somehow does not happen? That 
>>> would explain the observed behavior, though there still remains a mystery 
>>> to be solved.
>>>
>>> Regards,
>>>
>>> Roland
>>>
>>> On 9 Feb 2015, at 14:04, Wolfgang Friedl <[email protected]> wrote:
>>>
>>> Here is the configuration!
>>>
>>> I removed those parts of the configuration which are customer 
>>> related. This is the configuration when I start our application locally. In 
>>> the past we only observed the problems in our customers' environment.
>>>
>>> {
>>>     "akka" : {
>>>         "actor" : {
>>>             "creation-timeout" : "20s",
>>>             "debug" : {
>>>                 "autoreceive" : "off",
>>>                 "event-stream" : "off",
>>>                 "fsm" : "off",
>>>                 "lifecycle" : "off",
>>>                 "receive" : "off",
>>>                 "router-misconfiguration" : "off",
>>>                 "unhandled" : "off"
>>>             },
>>>             "default-dispatcher" : {
>>>                 "attempt-teamwork" : "on",
>>>                 "default-executor" : {
>>>                     "fallback" : "fork-join-executor"
>>>                 },
>>>                 "executor" : "default-executor",
>>>                 "fork-join-executor" : {
>>>                     "parallelism-factor" : 3,
>>>                     "parallelism-max" : 64,
>>>                     "parallelism-min" : 8
>>>                 },
>>>                 "mailbox-requirement" : "",
>>>                 "shutdown-timeout" : "1s",
>>>                 "thread-pool-executor" : {
>>>                     "allow-core-timeout" : "on",
>>>                     "core-pool-size-factor" : 3,
>>>                     "core-pool-size-max" : 64,
>>>                     "core-pool-size-min" : 8,
>>>                     "keep-alive-time" : "60s",
>>>                     "max-pool-size-factor" : 3,
>>>                     "max-pool-size-max" : 64,
>>>                     "max-pool-size-min" : 8,
>>>                     "task-queue-size" : -1,
>>>                     "task-queue-type" : "linked"
>>>                 },
>>>                 "throughput" : 5,
>>>                 "throughput-deadline-time" : "0ms",
>>>                 "type" : "Dispatcher"
>>>             },
>>>             "default-mailbox" : {
>>>                 "mailbox-capacity" : 1000,
>>>                 "mailbox-push-timeout-time" : "10s",
>>>                 "mailbox-type" : "akka.dispatch.UnboundedMailbox",
>>>                 "stash-capacity" : -1
>>>             },
>>>             "deployment" : {
>>>                 "Replaced"
>>>                 "default" : {
>>>                     "cluster" : {
>>>                         "allow-local-routees" : "on",
>>>                         "enabled" : "off",
>>>                         "max-nr-of-instances-per-node" : 1,
>>>                         "routees-path" : "",
>>>                         "use-role" : ""
>>>                     },
>>>                     "dispatcher" : "",
>>>                     "mailbox" : "",
>>>                     "metrics-selector" : "mix",
>>>                     "nr-of-instances" : 1,
>>>                     "remote" : "",
>>>                     "resizer" : {
>>>                         "backoff-rate" : 0.1,
>>>                         "backoff-threshold" : 0.3,
>>>                         "enabled" : "off",
>>>                         "lower-bound" : 1,
>>>                         "messages-per-resize" : 10,
>>>                         "pressure-threshold" : 1,
>>>                         "rampup-rate" : 0.2,
>>>                         "upper-bound" : 10
>>>                     },
>>>                     "routees" : {
>>>                         "paths" : []
>>>                     },
>>>                     "router" : "from-code",
>>>                     "tail-chopping-router" : {
>>>                         "interval" : "10 milliseconds"
>>>                     },
>>>                     "target" : {
>>>                         "nodes" : []
>>>                     },
>>>                     "virtual-nodes-factor" : 10,
>>>                     "within" : "5 seconds"
>>>                 }
>>>             },
>>>             "dsl" : {
>>>                 "default-timeout" : "5s",
>>>                 "inbox-size" : 1000
>>>             },
>>>             "guardian-supervisor-strategy" : 
>>> "akka.actor.DefaultSupervisorStrategy",
>>>             "mailbox" : {
>>>                 "bounded-deque-based" : {
>>>                     "mailbox-type" : 
>>> "akka.dispatch.BoundedDequeBasedMailbox"
>>>                 },
>>>                 "bounded-queue-based" : {
>>>                     "mailbox-type" : "akka.dispatch.BoundedMailbox"
>>>                 },
>>>                 "requirements" : {
>>>                     
>>> "akka.dispatch.BoundedDequeBasedMessageQueueSemantics" : 
>>> "akka.actor.mailbox.bounded-deque-based",
>>>                     "akka.dispatch.BoundedMessageQueueSemantics" : 
>>> "akka.actor.mailbox.bounded-queue-based",
>>>                     "akka.dispatch.DequeBasedMessageQueueSemantics" : 
>>> "akka.actor.mailbox.unbounded-deque-based",
>>>                     "akka.dispatch.MultipleConsumerSemantics" : 
>>> "akka.actor.mailbox.unbounded-queue-based",
>>>                     
>>> "akka.dispatch.UnboundedDequeBasedMessageQueueSemantics" : 
>>> "akka.actor.mailbox.unbounded-deque-based",
>>>                     "akka.dispatch.UnboundedMessageQueueSemantics" : 
>>> "akka.actor.mailbox.unbounded-queue-based"
>>>                 },
>>>                 "unbounded-deque-based" : {
>>>                     "mailbox-type" : 
>>> "akka.dispatch.UnboundedDequeBasedMailbox"
>>>                 },
>>>                 "unbounded-queue-based" : {
>>>                     "mailbox-type" : "akka.dispatch.UnboundedMailbox"
>>>                 }
>>>             },
>>>             "provider" : "akka.cluster.ClusterActorRefProvider",
>>>             "reaper-interval" : "5s",
>>>             "router" : {
>>>                 "type-mapping" : {
>>>                     "adaptive-group" : 
>>> "akka.cluster.routing.AdaptiveLoadBalancingGroup",
>>>                     "adaptive-pool" : 
>>> "akka.cluster.routing.AdaptiveLoadBalancingPool",
>>>                     "balancing-pool" : "akka.routing.BalancingPool",
>>>                     "broadcast-group" : "akka.routing.BroadcastGroup",
>>>                     "broadcast-pool" : "akka.routing.BroadcastPool",
>>>                     "consistent-hashing-group" : 
>>> "akka.routing.ConsistentHashingGroup",
>>>                     "consistent-hashing-pool" : 
>>> "akka.routing.ConsistentHashingPool",
>>>                     "from-code" : "akka.routing.NoRouter",
>>>                     "random-group" : "akka.routing.RandomGroup",
>>>                     "random-pool" : "akka.routing.RandomPool",
>>>                     "round-robin-group" : "akka.routing.RoundRobinGroup",
>>>                     "round-robin-pool" : "akka.routing.RoundRobinPool",
>>>                     "scatter-gather-group" : 
>>> "akka.routing.ScatterGatherFirstCompletedGroup",
>>>                     "scatter-gather-pool" : 
>>> "akka.routing.ScatterGatherFirstCompletedPool",
>>>                     "smallest-mailbox-pool" : 
>>> "akka.routing.SmallestMailboxPool",
>>>                     "tail-chopping-group" : 
>>> "akka.routing.TailChoppingGroup",
>>>                     "tail-chopping-pool" : 
>>> "akka.routing.TailChoppingPool"
>>>                 }
>>>             },
>>>             "serialization-bindings" : {
>>>                 "[B" : "bytes",
>>>                 "akka.actor.ActorSelectionMessage" : "akka-containers",
>>>                 "akka.cluster.ClusterMessage" : "akka-cluster",
>>>                 "akka.contrib.pattern.DistributedPubSubMessage" : 
>>> "akka-pubsub",
>>>                 "akka.persistence.serialization.Message" : 
>>> "akka-persistence-message",
>>>                 "akka.persistence.serialization.Snapshot" : 
>>> "akka-persistence-snapshot",
>>>                 "akka.remote.DaemonMsgCreate" : "daemon-create",
>>>                 "REPLACED"
>>>                 "com.google.protobuf.GeneratedMessage" : "proto",
>>>                 "java.io.Serializable" : "java"
>>>             },
>>>             "serialize-creators" : "off",
>>>             "serialize-messages" : "off",
>>>             "serializers" : {
>>>                 "akka-cluster" : 
>>> "akka.cluster.protobuf.ClusterMessageSerializer",
>>>                 "akka-containers" : 
>>> "akka.remote.serialization.MessageContainerSerializer",
>>>                 "akka-persistence-message" : 
>>> "akka.persistence.serialization.MessageSerializer",
>>>                 "akka-persistence-snapshot" : 
>>> "akka.persistence.serialization.SnapshotSerializer",
>>>                 "akka-pubsub" : 
>>> "akka.contrib.pattern.protobuf.DistributedPubSubMessageSerializer",
>>>                 "bytes" : "akka.serialization.ByteArraySerializer",
>>>                  "REPLACED"
>>>             },
>>>             "typed" : {
>>>                 "timeout" : "5s"
>>>             },
>>>             "unstarted-push-timeout" : "10s"
>>>         },
>>>         "agent" : {
>>>             "alter-off-dispatcher" : {
>>>                 "executor" : "thread-pool-executor",
>>>                 "type" : "PinnedDispatcher"
>>>             },
>>>             "send-off-dispatcher" : {
>>>                 "executor" : "thread-pool-executor",
>>>                 "type" : "PinnedDispatcher"
>>>             }
>>>         },
>>>         "cluster" : {
>>>             "auto-down" : "off",
>>>             "auto-down-unreachable-after" : "30s",
>>>             "failure-detector" : {
>>>                 "acceptable-heartbeat-pause" : "30 s",
>>>                 "expected-response-after" : "5 s",
>>>                 "heartbeat-interval" : "1 s",
>>>                 "implementation-class" : 
>>> "akka.remote.PhiAccrualFailureDetector",
>>>                 "max-sample-size" : 1000,
>>>                 "min-std-deviation" : "10 s",
>>>                 "monitored-by-nr-of-members" : 5,
>>>                 "threshold" : 12
>>>             },
>>>             "gossip-different-view-probability" : 0.8,
>>>             "gossip-interval" : "1s",
>>>             "gossip-time-to-live" : "2s",
>>>             "jmx" : {
>>>                 "enabled" : "on"
>>>             },
>>>             "leader-actions-interval" : "1s",
>>>             "log-info" : "on",
>>>             "metrics" : {
>>>                 "collect-interval" : "3s",
>>>                 "collector-class" : "akka.cluster.SigarMetricsCollector",
>>>                 "enabled" : "on",
>>>                 "gossip-interval" : "3s",
>>>                 "moving-average-half-life" : "12s"
>>>             },
>>>             "min-nr-of-members" : 1,
>>>             "periodic-tasks-initial-delay" : "1s",
>>>             "publish-stats-interval" : "off",
>>>             "reduce-gossip-different-view-probability" : 400,
>>>             "retry-unsuccessful-join-after" : "10s",
>>>             "role" : {},
>>>             "roles" : [],
>>>             "scheduler" : {
>>>                 "tick-duration" : "33ms",
>>>                 "ticks-per-wheel" : 512
>>>             },
>>>             "seed-node-timeout" : "5s",
>>>             "seed-nodes" : [
>>>                 "akka.tcp://[email protected]:3552"
>>>             ],
>>>             "unreachable-nodes-reaper-interval" : "1s",
>>>             "use-dispatcher" : "akka.dispatchers.cluster-dispatcher"
>>>         },
>>>         "contrib" : {
>>>             "cluster" : {
>>>                 "client" : {
>>>                     "mailbox" : {
>>>                         "mailbox-type" : 
>>> "akka.dispatch.UnboundedDequeBasedMailbox",
>>>                         "stash-capacity" : 1000
>>>                     }
>>>                 },
>>>                 "pub-sub" : {
>>>                     "gossip-interval" : "1s",
>>>                     "max-delta-elements" : 3000,
>>>                     "name" : "distributedPubSubMediator",
>>>                     "removed-time-to-live" : "120s",
>>>                     "role" : "",
>>>                     "routing-logic" : "random"
>>>                 },
>>>                 "receptionist" : {
>>>                     "name" : "receptionist",
>>>                     "number-of-contacts" : 3,
>>>                     "response-tunnel-receive-timeout" : "30s",
>>>                     "role" : ""
>>>                 },
>>>                 "sharding" : {
>>>                     "buffer-size" : 100000,
>>>                     "coordinator-failure-backoff" : "10 s",
>>>                     "guardian-name" : "sharding",
>>>                     "handoff-timeout" : "60 s",
>>>                     "least-shard-allocation-strategy" : {
>>>                         "max-simultaneous-rebalance" : 3,
>>>                         "rebalance-threshold" : 10
>>>                     },
>>>                     "rebalance-interval" : "10 days",
>>>                     "retry-interval" : "2 s",
>>>                     "role" : "",
>>>                     "snapshot-interval" : "3600 s"
>>>                 }
>>>             }
>>>         },
>>>         "daemonic" : "off",
>>>         "dispatchers" : {
>>>             "cluster-dispatcher" : {
>>>                 "executor" : "fork-join-executor",
>>>                 "fork-join-executor" : {
>>>                     "parallelism-max" : 4,
>>>                     "parallelism-min" : 2
>>>                 },
>>>                 "type" : "Dispatcher"
>>>             },
>>>             "comserv-dispatcher" : {
>>>                 "executor" : "thread-pool-executor",
>>>                 "type" : "PinnedDispatcher"
>>>             },
>>>             "fieldstreams-endpoint-dispatcher" : {
>>>                 "executor" : "fork-join-executor",
>>>                 "throughput" : 5,
>>>                 "type" : "Dispatcher"
>>>             },
>>>             "fieldstreams-journals-dispatcher" : {
>>>                 "executor" : "fork-join-executor",
>>>                 "fork-join-executor" : {
>>>                     "parallelism-max" : 2,
>>>                     "parallelism-min" : 2
>>>                 },
>>>                 "throughput" : 5,
>>>                 "type" : "Dispatcher"
>>>             },
>>>             "fieldstreams-meta-stream-dispatcher" : {
>>>                 "executor" : "fork-join-executor",
>>>                 "fork-join-executor" : {
>>>
>>> ...
>>
>>
