The first step is to use the latest version: preferably 2.4.14, or 2.3.15 if you are stuck on 2.3.x. Updating to 2.4.x should be fairly easy; see the migration guide in the docs.
You need a version with this fix https://github.com/akka/akka/issues/13909 and there are many other bug fixes since 2.3.11.

/Patrik

On Fri, 9 Dec 2016 at 22:58, Justin du coeur <[email protected]> wrote:

Hmm. I'm not sufficiently expert in large-cluster behavior to guess about the problem, but note that you should *never* use auto-down-unreachable-after in production code. (I actually don't even recommend it in test code.) While I don't *think* it causes the problem you're describing, it can cause much more severe "split-brain" issues that can lead to data corruption.

You're going to need to come up with a more nuanced approach to the problem of downing; I recommend reading the documentation sections on Downing <http://doc.akka.io/docs/akka/2.4.14/scala/cluster-usage.html#Downing> and Split Brain <http://doc.akka.io/docs/akka/akka-commercial-addons-1.0/scala/split-brain-resolver.html> -- it's important to get this stuff right to have a stable environment.

On Fri, Dec 9, 2016 at 3:44 PM, Tyler Brummett <[email protected]> wrote:

Hey Akka experts,

I need your help! Currently my company is using Akka as a part of a partial CQRS pattern. We have service adapters that consume source system events in the form of JMS messages, while producing commands to be asynchronously distributed to our command service. Our command service consumes all of these messages asynchronously based on a given group ID, so that no two commands with the same group ID are being processed at the same time.

We have designed an approach that allows us to have each deployable component in its own cluster and use a clusterClient to talk across clusters. Below is another diagram illustrating the service architecture with the Akka configuration reflecting separate clusters.

[diagram] (see attached please)

Errors we are seeing on appbox01:

UI sends commands to command service

11/11/2016 09:48:46,056 INFO [AppClusterSystem-akka.actor.default-dispatcher-29] CommandHandlerActor - received master ack.
11/11/2016 09:48:52,045 INFO [AppClusterSystem-akka.actor.default-dispatcher-35] CommandHandlerActor work timeout. For commandX
11/11/2016 09:48:52,046 ERROR [tomcat-http--33] AppController - X update failed
com.company.appA.package.AkkaWorkFailedException: Timeout for X

Errors we are seeing on servicebox01:

UI sends commands to command service

11/11/2016 09:48:46,715 WARN [CommandClusterSystem-akka.actor.default-dispatcher-2] ClusterStatusListenerActor - Problem has occurred associating local host: servicebox01.company.com and remote host: appbox01.company.com
11/11/2016 09:48:46,716 WARN [CommandClusterSystem-akka.actor.default-dispatcher-2] ClusterStatusListenerActor - Problem has occurred associating local host: servicebox01.company.com and remote host: appbox01.company.com
11/11/2016 09:48:46,716 WARN [CommandClusterSystem-akka.actor.default-dispatcher-2] Remoting - Tried to associate with unreachable remote address [akka.tcp:// [email protected]:12345]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: [The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.]

We are interested in seeing this new implementation through and finding solutions where we can decouple our services and apps from one another as we move towards a micro-service architecture. So if you have suggestions/solutions, we are all ears!

So the main question is: why are our nodes being quarantined? We have restarted nodes and stabilized the environment over and over, but the quarantine problem resurfaces after a few hours. Typically it's in a bad state by the next day.

As part of this post I have provided our typical application.conf file for a given service, which corresponds with our new "separate cluster" implementation (diagram). Hopefully someone out there can help us shed some light on this problem. Please see the application.conf below.

Thanks!
===================== application.conf =====================

# bulkhead workers
my-worker-exec-dispatcher {
  type = Dispatcher
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 2
    parallelism-factor = 2.0
    parallelism-max = 10
  }
  throughput = 1
}

# dedicate resources to the master actor
my-master-dispatcher {
  type = Dispatcher
  executor = "fork-join-executor"
  fork-join-executor {
    parallelism-min = 2
    parallelism-factor = 2.0
    parallelism-max = 10
  }
  throughput = 20
}

akka {
  loggers = ["akka.event.slf4j.Slf4jLogger"]
  loglevel = "INFO"
  stdout-loglevel = "OFF"
  actor.provider = "akka.cluster.ClusterActorRefProvider"

  # Log the complete configuration at INFO level when the actor system is started.
  # This is useful when you are uncertain of what configuration is used.
  log-config-on-start = off

  remote {
    log-remote-lifecycle-events = off

    # If this is "on", Akka will log all outbound messages at DEBUG level,
    # if off then they are not logged
    log-sent-messages = off

    # If this is "on", Akka will log all inbound messages at DEBUG level,
    # if off then they are not logged
    log-received-messages = off

    netty.tcp {
      # hostname is injected programmatically in AppConfiguration.
      port = ${akka.node.port}
      send-buffer-size = 10240000b
      receive-buffer-size = 10240000b
      maximum-frame-size = 5120000b
    }
  }

  contrib {
    cluster {
      pub-sub {
        # How often the DistributedPubSubMediator should send out gossip information
        gossip-interval = 5s
      }
    }
  }

  cluster {
    # seed-nodes is injected programmatically
    # seed-nodes = [${akka.seed.nodes}]

    # 30 minute auto down for a crashed master
    # a long network outage requires restarting the cluster after 30 minutes
    auto-down-unreachable-after = 1800s
    roles = [${akka.cluster.roles}]
  }

  actor {
    bounded-mailbox {
      mailbox-type = "akka.dispatch.BoundedMailbox"
      mailbox-capacity = 3000
      mailbox-push-timeout-time = 100ms
    }

    debug {
      # enable function of LoggingReceive, which is to log any received message at
      # DEBUG level
      receive = off
      # enable DEBUG logging of all AutoReceiveMessages (Kill, PoisonPill et.c.)
      autoreceive = off
      # enable DEBUG logging of actor lifecycle changes
      lifecycle = off
      # enable DEBUG logging of all LoggingFSMs for events, transitions and timers
      fsm = off
      # enable DEBUG logging of subscription changes on the eventStream
      event-stream = off
    }
  }
}

akka.extensions = ["akka.contrib.pattern.ClusterReceptionistExtension"]

akka.contrib.cluster.receptionist {
  name = receptionist
  number-of-contacts = 3
  response-tunnel-receive-timeout = 30s
}

akka.cluster.client {
  heartbeat-interval = 2s
  acceptable-heartbeat-pause = 10s
  buffer = 0
}

--
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ: http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups "Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.
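For anyone finding this thread later, the two replies amount to roughly the following changes to the application.conf posted in the original message. This is only a sketch, not a verified fix for the quarantine problem. The first block follows Justin's advice and sets auto-down-unreachable-after back to "off", which is Akka's default; a real downing strategy still has to be chosen from the Downing and Split Brain docs linked in his reply. The remaining blocks are an assumption about what the 2.3.x to 2.4.x upgrade Patrik recommends would involve: as far as I recall, the contrib cluster tools moved to the akka-cluster-tools module and the extension and config paths were renamed, but check each name against the migration guide before relying on it.

```hocon
akka {
  cluster {
    # Was 1800s in the posted config. "off" disables automatic downing, so
    # unreachable nodes must be downed explicitly (manually or by a strategy
    # chosen from the Downing / Split Brain documentation).
    auto-down-unreachable-after = off
  }
}

# ASSUMPTION: names below are the 2.4.x replacements for the contrib settings
# in the posted config; verify them against the 2.3 -> 2.4 migration guide.
akka.extensions = ["akka.cluster.client.ClusterClientReceptionist"]

akka.cluster.client.receptionist {
  name = receptionist
  number-of-contacts = 3
  response-tunnel-receive-timeout = 30s
}

akka.cluster.pub-sub {
  # How often the DistributedPubSubMediator should send out gossip information
  gossip-interval = 5s
}
```

The values are copied unchanged from the posted config; only the auto-down setting and the config paths differ.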
