Hi Elliott,

thanks for these additional details! We’ll look into this as soon as we can.

Regards,

Roland

> On 20 Feb 2015, at 16:10, Elliott Miller <[email protected]> wrote:
> 
> Hi Wolfgang, Partik, Roland
> 
> I may be seeing a similar problem. I have spent some time on it, and I believe I 
> can reproduce it most of the time. I start a 2-node cluster (separate 
> JVMs), then periodically restart the second node. Usually within 5 attempts 
> the ActorSystem on the first node is terminated, although the process continues 
> to run. The timing of the restart seems to be key: the node has to be restarted 
> right after it is shut down.
> 
> I was able to strip back our code to just a plain ActorSystem start + 
> Clustering, so I don't believe there is anything else we are doing that is 
> causing the problem. I was also able to replicate the problem in an IDE and 
> do some debugging: it seems that the ActorSystem is being shut down from 
> Cluster.shutdown, which is in turn invoked from 
> ClusterCoreSupervisor.postStop.
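
A minimal sketch of the stripped-down setup described above (a plain ActorSystem plus the Cluster extension), assuming Akka 2.3.x with akka-cluster on the classpath. The object name ReproNode, the loopback address and the ports 2551/2552 are assumptions for illustration only; the system name ClusterSystem is taken from the thread name in the stack trace below.

```scala
import akka.actor.ActorSystem
import akka.cluster.Cluster
import com.typesafe.config.{Config, ConfigFactory}

// Hypothetical reproduction harness; not the original poster's code.
object ReproNode {

  // Builds a cluster-enabled config for one node.
  // Host and seed-node port are assumed values for a local two-JVM test.
  def configFor(port: Int): Config = ConfigFactory.parseString(
    s"""
       |akka.actor.provider = "akka.cluster.ClusterActorRefProvider"
       |akka.remote.netty.tcp.hostname = "127.0.0.1"
       |akka.remote.netty.tcp.port = $port
       |akka.cluster.seed-nodes = ["akka.tcp://ClusterSystem@127.0.0.1:2551"]
       |""".stripMargin).withFallback(ConfigFactory.load())

  def main(args: Array[String]): Unit = {
    // First argument selects the port; defaults to the seed node's port.
    val port = args.headOption.map(_.toInt).getOrElse(2551)
    val system = ActorSystem("ClusterSystem", configFor(port))

    // Accessing the extension starts clustering and joins the seed nodes.
    Cluster(system)

    // Print a timestamp when the ActorSystem terminates, so an unexpected
    // termination is visible even if nothing shows up in the Akka log.
    system.registerOnTermination {
      println(s"[${new java.util.Date}] ActorSystem on port $port terminated")
    }

    // Block the main thread until the ActorSystem has terminated (2.3.x API).
    system.awaitTermination()
  }
}
```

One way to use it would be to start one JVM with argument 2551 and a second with 2552, then stop the second and restart it immediately, watching the first node's output for the termination message.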
> 
> Full stack:
> 
> "ClusterSystem-akka.actor.default-dispatcher-16@2725" prio=5 tid=0x1b nid=NA runnable
>   java.lang.Thread.State: RUNNABLE
>         at akka.cluster.Cluster.shutdown(Cluster.scala:355)
>         at akka.cluster.ClusterCoreSupervisor.postStop(ClusterDaemon.scala:206)
>         at akka.actor.Actor$class.aroundPostStop(Actor.scala:475)
>         at akka.cluster.ClusterCoreSupervisor.aroundPostStop(ClusterDaemon.scala:187)
>         at akka.actor.dungeon.FaultHandling$class.akka$actor$dungeon$FaultHandling$$finishTerminate(FaultHandling.scala:210)
>         at akka.actor.dungeon.FaultHandling$class.handleChildTerminated(FaultHandling.scala:292)
>         at akka.actor.ActorCell.handleChildTerminated(ActorCell.scala:369)
>         at akka.actor.dungeon.DeathWatch$class.watchedActorTerminated(DeathWatch.scala:63)
>         at akka.actor.ActorCell.watchedActorTerminated(ActorCell.scala:369)
>         at akka.actor.ActorCell.invokeAll$1(ActorCell.scala:455)
>         at akka.actor.ActorCell.systemInvoke(ActorCell.scala:478)
>         at akka.dispatch.Mailbox.processAllSystemMessages(Mailbox.scala:279)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:220)
>         at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> 
> 
> Strangely, Cluster.shutdown does log that it's shutting down:
> 
> private[cluster] def shutdown(): Unit = {
>       ...
>       logInfo("Shutting down...")
>       ...
> 
> but I never see this in the log!
> 
> I can replicate this in 2.3.9/2.3.8 but not 2.3.7, just as Wolfgang had 
> observed.
> 
> Regards
> 
> Elliott
> 
> 
> 
> On Thursday, 19 February 2015 12:08:26 UTC, Wolfgang Friedl wrote:
> Hi Roland!
> 
> It's back again! Today, while doing load testing, we observed the disappearance 
> of the actor system again!
> 
> 12:36:32.127 INFO  [akka.tcp://[email protected]:3552/user/sharding/DispatcherShard/-4042992360321395532/$b] <domain specific log>
> 12:36:36.911 WARN  [] MongoDBSystem Actor[akka://OurProgramm/user/MongoDB_default#-1135971146] stopped.
> 12:36:36.911 WARN  [] MongoDBSystem Actor[akka://OurProgramm/user/MongoDB_meta#2057722171] stopped.
> 
> For load testing we use Gatling with our own Gatling plugin, which uses 
> akka-remote as well as akka-http.
> Could it be that akka-http is causing this? Maybe because internally so many 
> actors are created by akka-stream? Just a wild guess.
> 
> Regards
> 
> Wolfgang
> 
> 
> On Friday, 13 February 2015 12:35:49 UTC+1, Wolfgang Friedl wrote:
> Hi Roland!
> 
> That's true, that might be related to our problem. Now the only open question 
> is what caused the fatal error.
> I will keep you up to date once we observe the problem again.
> 
> Regards
> 
> Wolfgang
> 
> 
> On Thursday, 12 February 2015 11:41:49 UTC+1, rkuhn wrote:
> Hi Wolfgang,
> 
> There is a current thread about handling fatal errors 
> <https://groups.google.com/d/msg/akka-user/TyWlxjNaBiQ/1mXbJM_Lt9UJ> which 
> might be related. In your configuration you have "jvm-exit-on-fatal-error" : 
> "on", so in principle the JVM should stop, but what if your code encounters a 
> fatal error and the stopping somehow does not happen? That would explain the 
> observed behavior, though there still remains a mystery to be solved.
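
To make the distinction between fatal and non-fatal errors concrete, here is a minimal sketch using only the Scala standard library. It assumes that the relevant classification is the one given by scala.util.control.NonFatal; Akka's internal handling of "jvm-exit-on-fatal-error" may differ in detail. FatalErrorSketch and describe are hypothetical names used for illustration only.

```scala
import scala.util.control.NonFatal

// Hypothetical helper; not part of Akka.
object FatalErrorSketch {

  // Classifies a Throwable using the standard library's NonFatal extractor.
  // Anything NonFatal does not match is treated as "fatal" here.
  def describe(t: Throwable): String = t match {
    case NonFatal(_) => "non-fatal"
    case _           => "fatal"
  }

  def main(args: Array[String]): Unit = {
    // The errors are only constructed, never thrown, so this is safe to run.
    val samples = Seq[Throwable](
      new IllegalStateException("boom"),     // ordinary exception
      new OutOfMemoryError("simulated"),     // a VirtualMachineError
      new NoClassDefFoundError("simulated")  // a LinkageError
    )
    samples.foreach { t =>
      println(s"${t.getClass.getSimpleName}: ${describe(t)}")
    }
  }
}
```

If a throwable in the "fatal" group escapes from application code on a dispatcher thread, it would be the kind of error the setting above is meant to react to, which is why finding its cause matters for this thread.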
> 
> Regards,
> 
> Roland
> 
> On 9 Feb 2015, at 14:04, Wolfgang Friedl <[email protected]> wrote:
> 
> Here is the configuration!
> 
> I removed those parts of the configuration which are customer-related. This is 
> the configuration when I start our application locally. In the past we only 
> observed the problems in our customers' environment.
> 
> {
>     "akka" : {
>         "actor" : {
>             "creation-timeout" : "20s",
>             "debug" : {
>                 "autoreceive" : "off",
>                 "event-stream" : "off",
>                 "fsm" : "off",
>                 "lifecycle" : "off",
>                 "receive" : "off",
>                 "router-misconfiguration" : "off",
>                 "unhandled" : "off"
>             },
>             "default-dispatcher" : {
>                 "attempt-teamwork" : "on",
>                 "default-executor" : {
>                     "fallback" : "fork-join-executor"
>                 },
>                 "executor" : "default-executor",
>                 "fork-join-executor" : {
>                     "parallelism-factor" : 3,
>                     "parallelism-max" : 64,
>                     "parallelism-min" : 8
>                 },
>                 "mailbox-requirement" : "",
>                 "shutdown-timeout" : "1s",
>                 "thread-pool-executor" : {
>                     "allow-core-timeout" : "on",
>                     "core-pool-size-factor" : 3,
>                     "core-pool-size-max" : 64,
>                     "core-pool-size-min" : 8,
>                     "keep-alive-time" : "60s",
>                     "max-pool-size-factor" : 3,
>                     "max-pool-size-max" : 64,
>                     "max-pool-size-min" : 8,
>                     "task-queue-size" : -1,
>                     "task-queue-type" : "linked"
>                 },
>                 "throughput" : 5,
>                 "throughput-deadline-time" : "0ms",
>                 "type" : "Dispatcher"
>             },
>             "default-mailbox" : {
>                 "mailbox-capacity" : 1000,
>                 "mailbox-push-timeout-time" : "10s",
>                 "mailbox-type" : "akka.dispatch.UnboundedMailbox",
>                 "stash-capacity" : -1
>             },
>             "deployment" : {
>                 "Replaced"
>                 "default" : {
>                     "cluster" : {
>                         "allow-local-routees" : "on",
>                         "enabled" : "off",
>                         "max-nr-of-instances-per-node" : 1,
>                         "routees-path" : "",
>                         "use-role" : ""
>                     },
>                     "dispatcher" : "",
>                     "mailbox" : "",
>                     "metrics-selector" : "mix",
>                     "nr-of-instances" : 1,
>                     "remote" : "",
>                     "resizer" : {
>                         "backoff-rate" : 0.1,
>                         "backoff-threshold" : 0.3,
>                         "enabled" : "off",
>                         "lower-bound" : 1,
>                         "messages-per-resize" : 10,
>                         "pressure-threshold" : 1,
>                         "rampup-rate" : 0.2,
>                         "upper-bound" : 10
>                     },
>                     "routees" : {
>                         "paths" : []
>                     },
>                     "router" : "from-code",
>                     "tail-chopping-router" : {
>                         "interval" : "10 milliseconds"
>                     },
>                     "target" : {
>                         "nodes" : []
>                     },
>                     "virtual-nodes-factor" : 10,
>                     "within" : "5 seconds"
>                 }
>             },
>             "dsl" : {
>                 "default-timeout" : "5s",
>                 "inbox-size" : 1000
>             },
>             "guardian-supervisor-strategy" : "akka.actor.DefaultSupervisorStrategy",
>             "mailbox" : {
>                 "bounded-deque-based" : {
>                     "mailbox-type" : "akka.dispatch.BoundedDequeBasedMailbox"
>                 },
>                 "bounded-queue-based" : {
>                     "mailbox-type" : "akka.dispatch.BoundedMailbox"
>                 },
>                 "requirements" : {
>                     "akka.dispatch.BoundedDequeBasedMessageQueueSemantics" : "akka.actor.mailbox.bounded-deque-based",
>                     "akka.dispatch.BoundedMessageQueueSemantics" : "akka.actor.mailbox.bounded-queue-based",
>                     "akka.dispatch.DequeBasedMessageQueueSemantics" : "akka.actor.mailbox.unbounded-deque-based",
>                     "akka.dispatch.MultipleConsumerSemantics" : "akka.actor.mailbox.unbounded-queue-based",
>                     "akka.dispatch.UnboundedDequeBasedMessageQueueSemantics" : "akka.actor.mailbox.unbounded-deque-based",
>                     "akka.dispatch.UnboundedMessageQueueSemantics" : "akka.actor.mailbox.unbounded-queue-based"
>                 },
>                 "unbounded-deque-based" : {
>                     "mailbox-type" : "akka.dispatch.UnboundedDequeBasedMailbox"
>                 },
>                 "unbounded-queue-based" : {
>                     "mailbox-type" : "akka.dispatch.UnboundedMailbox"
>                 }
>             },
>             "provider" : "akka.cluster.ClusterActorRefProvider",
>             "reaper-interval" : "5s",
>             "router" : {
>                 "type-mapping" : {
>                     "adaptive-group" : "akka.cluster.routing.AdaptiveLoadBalancingGroup",
>                     "adaptive-pool" : "akka.cluster.routing.AdaptiveLoadBalancingPool",
>                     "balancing-pool" : "akka.routing.BalancingPool",
>                     "broadcast-group" : "akka.routing.BroadcastGroup",
>                     "broadcast-pool" : "akka.routing.BroadcastPool",
>                     "consistent-hashing-group" : "akka.routing.ConsistentHashingGroup",
>                     "consistent-hashing-pool" : "akka.routing.ConsistentHashingPool",
>                     "from-code" : "akka.routing.NoRouter",
>                     "random-group" : "akka.routing.RandomGroup",
>                     "random-pool" : "akka.routing.RandomPool",
>                     "round-robin-group" : "akka.routing.RoundRobinGroup",
>                     "round-robin-pool" : "akka.routing.RoundRobinPool",
>                     "scatter-gather-group" : "akka.routing.ScatterGatherFirstCompletedGroup",
>                     "scatter-gather-pool" : "akka.routing.ScatterGatherFirstCompletedPool",
>                     "smallest-mailbox-pool" : "akka.routing.SmallestMailboxPool",
>                     "tail-chopping-group" : "akka.routing.TailChoppingGroup",
>                     "tail-chopping-pool" : "akka.routing.TailChoppingPool"
>                 }
>             },
>             "serialization-bindings" : {
>                 "[B" : "bytes",
>                 "akka.actor.ActorSelectionMessage" : "akka-containers",
>                 "akka.cluster.ClusterMessage" : "akka-cluster",
>                 "akka.contrib.pattern.DistributedPubSubMessage" : "akka-pubsub",
>                 "akka.persistence.serialization.Message" : "akka-persistence-message",
>                 "akka.persistence.serialization.Snapshot" : "akka-persistence-snapshot",
>                 "akka.remote.DaemonMsgCreate" : "daemon-create",
>                 "REPLACED"
>                 "com.google.protobuf.GeneratedMessage" : "proto",
>                 "java.io.Serializable" : "java"
>             },
>             "serialize-creators" : "off",
>             "serialize-messages" : "off",
>             "serializers" : {
>                 "akka-cluster" : "akka.cluster.protobuf.ClusterMessageSerializer",
>                 "akka-containers" : "akka.remote.serialization.MessageContainerSerializer",
>                 "akka-persistence-message" : "akka.persistence.serialization.MessageSerializer",
>                 "akka-persistence-snapshot" : "akka.persistence.serialization.SnapshotSerializer",
>                 "akka-pubsub" : "akka.contrib.pattern.protobuf.DistributedPubSubMessageSerializer",
>                 "bytes" : "akka.serialization.ByteArraySerializer",
>                  "REPLACED"
>             },
>             "typed" : {
>                 "timeout" : "5s"
>             },
>             "unstarted-push-timeout" : "10s"
>         },
>         "agent" : {
>             "alter-off-dispatcher" : {
>                 "executor" : "thread-pool-executor",
>                 "type" : "PinnedDispatcher"
>             },
>             "send-off-dispatcher" : {
>                 "executor" : "thread-pool-executor",
>                 "type" : "PinnedDispatcher"
>             }
>         },
>         "cluster" : {
>             "auto-down" : "off",
>             "auto-down-unreachable-after" : "30s",
>             "failure-detector" : {
>                 "acceptable-heartbeat-pause" : "30 s",
>                 "expected-response-after" : "5 s",
>                 "heartbeat-interval" : "1 s",
>                 "implementation-class" : "akka.remote.PhiAccrualFailureDetector",
>                 "max-sample-size" : 1000,
>                 "min-std-deviation" : "10 s",
>                 "monitored-by-nr-of-members" : 5,
>                 "threshold" : 12
>             },
>             "gossip-different-view-probability" : 0.8,
>             "gossip-interval" : "1s",
>             "gossip-time-to-live" : "2s",
>             "jmx" : {
>                 "enabled" : "on"
>             },
>             "leader-actions-interval" : "1s",
>             "log-info" : "on",
>             "metrics" : {
>                 "collect-interval" : "3s",
>                 "collector-class" : "akka.cluster.SigarMetricsCollector",
>                 "enabled" : "on",
>                 "gossip-interval" : "3s",
>                 "moving-average-half-life" : "12s"
>             },
>             "min-nr-of-members" : 1,
>             "periodic-tasks-initial-delay" : "1s",
>             "publish-stats-interval" : "off",
>             "reduce-gossip-different-view-probability" : 400,
>             "retry-unsuccessful-join-after" : "10s",
>             "role" : {},
>             "roles" : [],
>             "scheduler" : {
>                 "tick-duration" : "33ms",
>                 "ticks-per-wheel" : 512
>             },
>             "seed-node-timeout" : "5s",
>             "seed-nodes" : [
>                 "akka.tcp://[email protected]:3552"
>             ],
>             "unreachable-nodes-reaper-interval" : "1s",
>             "use-dispatcher" : "akka.dispatchers.cluster-dispatcher"
>         },
>         "contrib" : {
>             "cluster" : {
>                 "client" : {
>                     "mailbox" : {
>                         "mailbox-type" : "akka.dispatch.UnboundedDequeBasedMailbox",
>                         "stash-capacity" : 1000
>                     }
>                 },
>                 "pub-sub" : {
>                     "gossip-interval" : "1s",
>                     "max-delta-elements" : 3000,
>                     "name" : "distributedPubSubMediator",
>                     "removed-time-to-live" : "120s",
>                     "role" : "",
>                     "routing-logic" : "random"
>                 },
>                 "receptionist" : {
>                     "name" : "receptionist",
>                     "number-of-contacts" : 3,
>                     "response-tunnel-receive-timeout" : "30s",
>                     "role" : ""
>                 },
>                 "sharding" : {
>                     "buffer-size" : 100000,
>                     "coordinator-failure-backoff" : "10 s",
>                     "guardian-name" : "sharding",
>                     "handoff-timeout" : "60 s",
>                     "least-shard-allocation-strategy" : {
>                         "max-simultaneous-rebalance" : 3,
>                         "rebalance-threshold" : 10
>                     },
>                     "rebalance-interval" : "10 days",
>                     "retry-interval" : "2 s",
>                     "role" : "",
>                     "snapshot-interval" : "3600 s"
>                 }
>             }
>         },
>         "daemonic" : "off",
>         "dispatchers" : {
>             "cluster-dispatcher" : {
>                 "executor" : "fork-join-executor",
>                 "fork-join-executor" : {
>                     "parallelism-max" : 4,
>                     "parallelism-min" : 2
>                 },
>                 "type" : "Dispatcher"
>             },
>             "comserv-dispatcher" : {
>                 "executor" : "thread-pool-executor",
>                 "type" : "PinnedDispatcher"
>             },
>             "fieldstreams-endpoint-dispatcher" : {
>                 "executor" : "fork-join-executor",
>                 "throughput" : 5,
>                 "type" : "Dispatcher"
>             },
>             "fieldstreams-journals-dispatcher" : {
>                 "executor" : "fork-join-executor",
>                 "fork-join-executor" : {
>                     "parallelism-max" : 2,
>                     "parallelism-min" : 2
>                 },
>                 "throughput" : 5,
>                 "type" : "Dispatcher"
>             },
>             "fieldstreams-meta-stream-dispatcher" : {
>                 "executor" : "fork-join-executor",
>                 "fork-join-executor" : {
> ...
> 
> -- 
> >>>>>>>>>> Read the docs: http://akka.io/docs/ <http://akka.io/docs/>
> >>>>>>>>>> Check the FAQ: 
> >>>>>>>>>> http://doc.akka.io/docs/akka/current/additional/faq.html 
> >>>>>>>>>> <http://doc.akka.io/docs/akka/current/additional/faq.html>
> >>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user 
> >>>>>>>>>> <https://groups.google.com/group/akka-user>
> --- 
> You received this message because you are subscribed to the Google Groups 
> "Akka User List" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected] 
> <mailto:[email protected]>.
> To post to this group, send email to [email protected] 
> <mailto:[email protected]>.
> Visit this group at http://groups.google.com/group/akka-user 
> <http://groups.google.com/group/akka-user>.
> For more options, visit https://groups.google.com/d/optout 
> <https://groups.google.com/d/optout>.



Dr. Roland Kuhn
Akka Tech Lead
Typesafe <http://typesafe.com/> – Reactive apps on the JVM.
twitter: @rolandkuhn <http://twitter.com/#!/rolandkuhn>
