The first step is to use the latest version, preferably 2.4.14; if you are
stuck on 2.3.x, use 2.3.15. Updating to 2.4.x should be fairly easy; see the
migration guide in the docs.

You need a version that includes this fix:
https://github.com/akka/akka/issues/13909
There have also been many other bug fixes since 2.3.11.
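
For the move to 2.4.x, note that the contrib cluster tools used in the
configuration quoted below (ClusterClient, the receptionist and
DistributedPubSub) live in the akka-cluster-tools module, and their settings
are no longer under the akka.contrib namespace. The following is only a rough
sketch of how those settings might look afterwards, assuming the key names
from the 2.4 reference.conf; the values are carried over from the original
file, and every key should be checked against the migration guide before use.

```hocon
# Sketch only: 2.4.x equivalents of the akka.contrib.* settings quoted below.
# Assumes akka-cluster-tools is on the classpath.
akka.extensions = ["akka.cluster.client.ClusterClientReceptionist"]

akka.cluster.pub-sub {
   # was akka.contrib.cluster.pub-sub.gossip-interval
   gossip-interval = 5s
}

akka.cluster.client.receptionist {
   # was akka.contrib.cluster.receptionist
   name = receptionist
   number-of-contacts = 3
   response-tunnel-receive-timeout = 30s
}

akka.cluster.client {
   heartbeat-interval = 2s
   acceptable-heartbeat-pause = 10s
   # 2.4 names this setting buffer-size; the original file used "buffer"
   buffer-size = 0
}
```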

/Patrik

On Fri, 9 Dec 2016 at 22:58, Justin du coeur <[email protected]> wrote:

Hmm.  I'm not sufficiently expert in large-cluster behavior to guess about
the problem, but note that you should *never* use
auto-down-unreachable-after in production code. (I actually don't even
recommend it in test code.)  While I don't *think* it causes the problem
you're describing, it can cause much more severe "split-brain" issues that
can lead to data corruption.  You're going to need to come up with a more
nuanced approach to the problem of downing; I recommend reading the
documentation sections on Downing
<http://doc.akka.io/docs/akka/2.4.14/scala/cluster-usage.html#Downing>
and Split Brain
<http://doc.akka.io/docs/akka/akka-commercial-addons-1.0/scala/split-brain-resolver.html>
-- it's important to get this stuff right to have a stable environment.
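
As a minimal sketch of the first step, assuming nothing else in the
deployment relies on timer-based downing, the setting can simply be switched
off in application.conf (off is also the default). Unreachable members then
stay in the cluster until they are downed explicitly, so this only makes
sense together with a deliberate downing strategy as described in the
sections linked above.

```hocon
akka.cluster {
   # Disable timer-based auto-downing. Unreachable members are then removed
   # only by an explicit down decision, never by a timeout alone.
   auto-down-unreachable-after = off
}
```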

On Fri, Dec 9, 2016 at 3:44 PM, Tyler Brummett <[email protected]>
wrote:

Hey Akka experts, I need your help! Currently my company is using Akka as a
part of a partial CQRS pattern. We have service adapters that consume
source system events in the form of JMS messages, while producing commands
to be asynchronously distributed to our command service. Our command
service consumes all of these messages asynchronously based on a given
group ID, so that no two commands with the same group ID are being
processed at the same time.

We have designed an approach that allows us to have each deployable
component in its own cluster and use a ClusterClient to talk across
clusters. Below is another diagram illustrating the service architecture
with the Akka configuration reflecting separate clusters.

[diagram]
(see attached please)

Errors we are seeing on appbox01: UI sends commands to command service
11/11/2016 09:48:46,056  INFO
[AppClusterSystem-akka.actor.default-dispatcher-29] CommandHandlerActor -
received master ack.
11/11/2016 09:48:52,045  INFO
[AppClusterSystem-akka.actor.default-dispatcher-35] CommandHandlerActor
work timeout. For commandX
11/11/2016 09:48:52,046 ERROR [tomcat-http--33] AppController - X update
failed
com.company.appA.package.AkkaWorkFailedException: Timeout for X

Errors we are seeing on servicebox01: UI sends commands to command service
11/11/2016 09:48:46,715  WARN
[CommandClusterSystem-akka.actor.default-dispatcher-2]
ClusterStatusListenerActor - Problem has occurred associating local host:
servicebox01.company.com and remote host: appbox01.company.com
11/11/2016 09:48:46,716  WARN
[CommandClusterSystem-akka.actor.default-dispatcher-2]
ClusterStatusListenerActor - Problem has occurred associating local host:
servicebox01.company.com and remote host: appbox01.company.com
11/11/2016 09:48:46,716  WARN
[CommandClusterSystem-akka.actor.default-dispatcher-2] Remoting - Tried to
associate with unreachable remote address [akka.tcp://
[email protected]:12345]. Address is now gated for 5000
ms, all messages to this address will be delivered to dead letters. Reason:
[The remote system has quarantined this system. No further associations to
the remote system are possible until this system is restarted.]


We are interested in seeing this new implementation through and finding
solutions where we can decouple our services and apps from one another as
we move towards a micro-service architecture. So if you have
suggestions/solutions, we are all ears!

So the main question is: why are our nodes being quarantined? We have
restarted nodes and stabilized the environment over and over, but the
quarantine problem resurfaces after a few hours. Typically it's in a bad
state by the next day. As part of this post I have provided our typical
application.conf file for a given service, which corresponds with our new
"separate cluster" implementation (diagram). Hopefully someone out there
can help us shed some light on this problem. Please see the
application.conf below.

Thanks!

=====================
application.conf
=====================

# bulkhead workers
my-worker-exec-dispatcher {
   type = Dispatcher
   executor = "fork-join-executor"
   fork-join-executor {
      parallelism-min = 2
      parallelism-factor = 2.0
      parallelism-max = 10
   }
   throughput = 1
}

# dedicate resources to the master actor
my-master-dispatcher {
   type = Dispatcher
   executor = "fork-join-executor"
   fork-join-executor {
      parallelism-min = 2
      parallelism-factor = 2.0
      parallelism-max = 10
   }
   throughput = 20
}

akka {
   loggers = ["akka.event.slf4j.Slf4jLogger"]
   loglevel = "INFO"
   stdout-loglevel = "OFF"

   actor.provider = "akka.cluster.ClusterActorRefProvider"

   # Log the complete configuration at INFO level when the actor system is
   # started.
   # This is useful when you are uncertain of what configuration is used.
   log-config-on-start = off

   remote {
      log-remote-lifecycle-events = off

      # If this is "on", Akka will log all outbound messages at DEBUG level,
      # if off then they are not logged
      log-sent-messages = off
      # If this is "on", Akka will log all inbound messages at DEBUG level,
      # if off then they are not logged
      log-received-messages = off
      netty.tcp {
         # hostname is injected programmatically in AppConfiguration.
         port = ${akka.node.port}
         send-buffer-size = 10240000b
         receive-buffer-size = 10240000b
         maximum-frame-size = 5120000b
      }
   }

   contrib {
      cluster {
         pub-sub {
            # How often the DistributedPubSubMediator should send out gossip information
            gossip-interval = 5s
         }
      }
   }

   cluster {
      # seed-nodes is injected programmatically
      # seed-nodes = [${akka.seed.nodes}]
      # 30 minute auto down for a crashed master
      # a long network outage requires restarting the cluster after 30
      # minutes
      auto-down-unreachable-after = 1800s
      roles = [${akka.cluster.roles}]
   }

   actor {
      bounded-mailbox {
         mailbox-type = "akka.dispatch.BoundedMailbox"
         mailbox-capacity = 3000
         mailbox-push-timeout-time = 100ms
      }

      debug {
         # enable function of LoggingReceive, which is to log any received
         # message at DEBUG level
         receive = off
         # enable DEBUG logging of all AutoReceiveMessages (Kill, PoisonPill etc.)
         autoreceive = off
         # enable DEBUG logging of actor lifecycle changes
         lifecycle = off
         # enable DEBUG logging of all LoggingFSMs for events, transitions and
         # timers
         fsm = off
         # enable DEBUG logging of subscription changes on the eventStream
         event-stream = off
      }
   }
}

akka.extensions = ["akka.contrib.pattern.ClusterReceptionistExtension"]

akka.contrib.cluster.receptionist {
   name = receptionist
   number-of-contacts = 3
   response-tunnel-receive-timeout = 30s
}

akka.cluster.client {
   heartbeat-interval = 2s
   acceptable-heartbeat-pause = 10s
   buffer = 0
}

-- 
>>>>>>>>>> Read the docs: http://akka.io/docs/
>>>>>>>>>> Check the FAQ:
http://doc.akka.io/docs/akka/current/additional/faq.html
>>>>>>>>>> Search the archives: https://groups.google.com/group/akka-user
---
You received this message because you are subscribed to the Google Groups
"Akka User List" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/akka-user.
For more options, visit https://groups.google.com/d/optout.



