Hi,

I am working with a relatively large codebase (a distributed Raft protocol stack and datastore built on top of basic Akka clustering) that uses the ask pattern for some of the key messages exchanged between nodes. We frequently hit AskTimeoutException (our timeouts are typically 30000 ms), particularly when the network partitions and then heals.

Basics of my environment
====================
Akka version: 2.4.7
Environment: OSGi
Clustering used?: Yes
Sharding used?: No
Cluster size: 3 nodes in 3 VMs
Any special features like Distributed Pub Sub used?: No
Ask timeouts: up to 30000 ms
Remoting: Netty TCP
Auto-down after: Disabled

Observations:
===========
I did the following checks to start basic troubleshooting of the above issue, beginning at the Akka cluster layer:

a) Checked the Cluster MBean on all cluster nodes two or three times to see whether the cluster heals properly after a network split and heal (we do NOT use the auto-down-after configuration)
--> The Akka layer heals properly

b) Checked whether there is heavy message exchange between cluster nodes by monitoring 'netstat -anp | grep 2550' soon after the network heals (watching SendQ and RecvQ for port 2550; we use the Netty transport, and a move to Artery is planned but not yet)
--> Connections are stable and messages are being exchanged, but not at a volume that could cause issues such as head-of-line blocking or similar side effects

c) Checked for Unreachable messages after the network is restored
--> No such conditions seen AFTER the network is restored

Since most of the basic checks do not indicate any issue at the Akka layer, I am now trying to troubleshoot the application / stack built on top of Akka. Some key areas I suspect are:

a) Message processing latency in the stack/applications on top of Akka
b) The above condition resulting in mailbox build-up

So I think I can effectively start with mailbox monitoring on the cluster nodes that receive the messages via remoting.

Now, specific clarifications:
=====================
a) I looked around for a few options like Kamon and Cinnamon, but getting them to work in my OSGi environment is a nightmare because of classloading complexity.
So I wanted to check whether a good old tool like the jmap histogram would be of use for monitoring mailbox build-up, given that I know the message class names.

b) Apart from mailbox build-up on remote nodes, is any other monitoring recommended?

c) Apart from the receive-side mailbox, should I monitor anything specifically on the sender side? I assume this is not required, because netstat on 2550 does not seem to indicate any issues around remoting per se.

Please let me know if I am missing any info that makes the observations and clarifications vague, so that I can get back to you on it.

Thanks in advance.

Regards,
Muthu
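
P.S. To make the jmap idea in (a) concrete, here is a minimal sketch of what I have in mind, not something I have validated on the cluster. It assumes the usual `jmap -histo` row layout (rank, instance count, bytes, class name); the helper name HistoFilter and the message class com.example.raft.AppendEntries are hypothetical placeholders, not names from our codebase.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical helper: filters `jmap -histo` output down to selected classes. */
final class HistoFilter {

    /**
     * Returns class name -> instance count for every watched class present
     * in the histogram. Data rows are assumed to look like:
     * "   2:   1200   28800  com.example.raft.AppendEntries"
     */
    static Map<String, Long> countInstances(List<String> histoLines, Set<String> watched) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : histoLines) {
            String[] cols = line.trim().split("\\s+");
            // Skip header, separator and "Total" rows: data rows start with "<rank>:".
            if (cols.length < 4 || !cols[0].endsWith(":")) {
                continue;
            }
            if (watched.contains(cols[3])) {
                counts.put(cols[3], Long.parseLong(cols[1]));
            }
        }
        return counts;
    }

    /** Reads histogram text from stdin; the class names to watch are the arguments. */
    public static void main(String[] args) throws IOException {
        Set<String> watched = new HashSet<>(Arrays.asList(args));
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            List<String> lines = in.lines().collect(Collectors.toList());
            countInstances(lines, watched)
                .forEach((cls, n) -> System.out.println(cls + " " + n));
        }
    }
}
```

After compiling with javac, I would run it periodically on each receiving node as 'jmap -histo:live <pid> | java HistoFilter com.example.raft.AppendEntries' and watch whether the count keeps growing after the network heals. Two caveats I am aware of: -histo:live forces a full GC on the target JVM, and the count covers every live instance of the class, not only those sitting in a mailbox, so I would treat it as a rough trend only.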
