Hi all, I'm currently using Akka to build a scalable system for distributed rule evaluation. The focus is on throughput and scalability with respect to the number of nodes, and it's amazing how quickly Akka lets you set up a working, scalable distributed system.
However, I have hit some strange behaviour in the latency of individual requests. As the graph below shows, the large majority of requests take around 2-3 ms, which is great. 13% of the requests, however, form a second peak around 44 ms.

<https://lh3.googleusercontent.com/-diTy0bf1O8k/VVDSHm_VCnI/AAAAAAAABh0/iGpPIc6hci8/s1600/summary.png>

Since this skews the final results of my performance tests, I'm trying to resolve the issue, but have not found a solution yet and wanted to ask whether any of you have an idea where to look.

Some more details:

- The requests measured in the graph are the end-to-end latencies of 6 sequential messages between 3 actors on 3 different JVMs located on a single physical node (my laptop). The same behaviour occurs when the JVMs are deployed on different physical machines.
- The three nodes are a client and 2 cooperating coordinator nodes, giving the message flow A -> B -> C -> B -> C -> B -> A.
- Measuring the latency of each individual message shows that the vast majority of the high end-to-end latencies are caused by the first message from B -> C, which then takes around 36 ms. In other words, the 13% is the result of roughly 13/6 = 2.2% of the individual messages showing a high latency.
- The messages are Scala case classes, serialized with the default serializer (no protobuf or JSON).
- Remoting is done over Netty/TCP, as shown in the examples on the website.
- I have not configured any specific dispatchers in the conf.
- The measurements in the graph above are taken sequentially: A makes a request, waits for B to respond after coordinating with C, and then sends the next request.

Over the last days I have investigated several possible causes, none of which turned out convincing:

- Garbage collection: there are far more high-latency requests than GCs shown in JVisualVM, and the GC pause times are well below 36 ms. Adjusting the initial or maximum heap size does not solve the issue either.
- (De)serialization: I measured the serialization and deserialization times in EndpointWriter and DefaultMessageDispatcher in Endpoint.scala of akka-remote, but they never exceeded 1 ms. The timestamps show that the latency occurs between serialization in EndpointWriter and deserialization in DefaultMessageDispatcher.
- TCP bundling (Nagle's algorithm): setting tcp-nodelay = on in the configuration does not fix the issue (and it is on by default anyway, no?).

Strangely enough, the percentage of high-latency requests drops when I increase the parallel load on the system: with 10 parallel clients as above it is only 2% of the requests, with 20 only 0.6%.

So, for those who made it to the end of this lengthy e-mail: does anyone have a clue where I can continue searching? What happens between sending and receiving an Akka message? Could this have something to do with thread scheduling, for example other threads getting priority over the thread that is responsible for actually sending or receiving the data?

Thanks in advance for any ideas.

Kind regards,
Maarten
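P.S. A quick sanity check on the 13% vs. 2.2% arithmetic: assuming slow hops occur independently (my own simplification), a round trip of 6 messages in which each message is slow with probability ~2.2% is slow end-to-end with probability 1 - (1 - 0.022)^6 ≈ 12.5%, which matches the observed 13% well. A minimal sketch:

```scala
object LatencyCheck extends App {
  // Numbers taken from my measurements; independence of hops is an assumption.
  val hops = 6
  val slowHopFraction = 0.022 // ~2.2% of individual messages are slow

  // Probability that a request of 6 sequential messages contains
  // at least one slow hop.
  val slowRequestFraction = 1.0 - math.pow(1.0 - slowHopFraction, hops)

  println(f"expected fraction of slow end-to-end requests: $slowRequestFraction%.3f")
  // ~0.125, consistent with the observed 13%
}
```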

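For reference, the remoting part of my application.conf is essentially the stock example from the docs (hostname and port below are placeholders), with tcp-nodelay set explicitly while testing the Nagle hypothesis:

```hocon
akka {
  actor {
    provider = "akka.remote.RemoteActorRefProvider"
  }
  remote {
    enabled-transports = ["akka.remote.netty.tcp"]
    netty.tcp {
      hostname = "127.0.0.1"
      port = 2552
      # Explicitly enabled while testing the Nagle hypothesis;
      # as far as I can tell this is already the default.
      tcp-nodelay = on
    }
  }
}
```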