Hi,

We are using akka-remote and an external ZooKeeper instance to build a distributed system with partition logic.
We have 26 nodes, with one ActorSystem on each node. The partition size is 9. Requests can go to any of the 26 nodes. If the receiving node is not in the partition, the request is delegated to one of the partition members, and that member broadcasts the message to all members of the partition.

We observe that quarantine happens 1 or 2 times per day, which is a fairly high frequency. We subscribe to QuarantinedEvent and restart the JVM process when the event occurs. However, this can cause cascading quarantines in the system: usually we see 3 or more nodes get restarted. In addition, the quarantines do not actually happen during high-traffic periods, and there is no obvious pattern suggesting that high traffic causes them. The network is supposed to be very stable.

Could you please give me some suggestions on what might cause quarantine to happen so frequently?

BTW, we are on Akka 2.3.9 and are trying to update to 2.3.10, but we haven't tested on 2.3.10 yet.

Thanks,
Zhuchen
